Mathematics
Form and Function
Springer-Verlag
New York Berlin Heidelberg Tokyo
Saunders Mac Lane
Department of Mathematics
University of Chicago
Chicago, Illinois 60637
U.S.A.
All rights reserved. No part of this book may be translated or reproduced in any
form without written permission from Springer-Verlag, 175 Fifth Avenue, New
York, New York 10010, U.S.A.
This book records my efforts over the past four years to capture in words a
description of the form and function of Mathematics, as a background for
the Philosophy of Mathematics. My efforts have been encouraged by lec-
tures that I have given at Heidelberg under the auspices of the Alexander
von Humboldt Stiftung, at the University of Chicago, and at the University
of Minnesota, the latter under the auspices of the Institute for Mathematics
and Its Applications. Jean Benabou has carefully read the entire
manuscript and has offered incisive comments. George Glauberman, Car-
los Kenig, Christopher Mulvey, R. Narasimhan, and Dieter Puppe have
provided similar comments on chosen chapters. Fred Linton has pointed
out places requiring a more exact choice of wording. Many conversations
with George Mackey have given me important insights on the nature of
Mathematics. I have had similar help from Alfred Aeppli, John Gray, Jay
Goldman, Peter Johnstone, Bill Lawvere, and Roger Lyndon. Over the
years, I have profited from discussions of general issues with my colleagues
Felix Browder and Melvin Rothenberg. Ideas from Tammo Tom Dieck,
Albrecht Dold, Richard Lashof, and Ib Madsen have assisted in my study
of geometry. Jerry Bona and B. L. Foster have helped with my examination
of mechanics. My observations about logic have been subject to constructive
scrutiny by Gert Müller, Marian Boykan Pour-El, Ted Slaman,
R. Voreadou, Volker Weispfenning, and Hugh Woodin. I have profited from
discussions of philosophical issues with J. L. Corcoran, Philip Kitcher, Leo-
nard Linsky, Penelope Maddy, W. V. Quine, Michael Resnik, and Howard
Stein. Some of my earlier views on various issues have been constructively
examined by Joel Fingerman, Marvin J. Greenberg, Nicholas Goodman,
P. C. Kolaitis, J. R. Shoenfield, and David Stroh. I am grateful to all those
people, and to a number of others, even in the numerous cases where I
have not followed their advice.
Contents
Preface
Introduction
CHAPTER I
Origins of Formal Structure
1. The Natural Numbers
2. Infinite Sets
3. Permutations
4. Time and Order
5. Space and Motion
6. Symmetry
7. Transformation Groups
8. Groups
9. Boolean Algebra
10. Calculus, Continuity, and Topology
11. Human Activity and Ideas
12. Mathematical Activities
13. Axiomatic Structure
CHAPTER II
From Whole Numbers to Rational Numbers
1. Properties of Natural Numbers
2. The Peano Postulates
3. Natural Numbers Described by Recursion
4. Number Theory
5. Integers
6. Rational Numbers
7. Congruence
8. Cardinal Numbers
9. Ordinal Numbers
10. What Are Numbers?
CHAPTER III
Geometry
1. Spatial Activities
2. Proofs without Figures
3. The Parallel Axiom
4. Hyperbolic Geometry
5. Elliptic Geometry
6. Geometric Magnitude
7. Geometry by Motion
8. Orientation
9. Groups in Geometry
10. Geometry by Groups
11. Solid Geometry
12. Is Geometry a Science?
CHAPTER IV
Real Numbers
1. Measures of Magnitude
2. Magnitude as a Geometric Measure
3. Manipulations of Magnitudes
4. Comparison of Magnitudes
5. Axioms for the Reals
6. Arithmetic Construction of the Reals
7. Vector Geometry
8. Analytic Geometry
9. Trigonometry
10. Complex Numbers
11. Stereographic Projection and Infinity
12. Are Imaginary Numbers Real?
13. Abstract Algebra Revealed
14. The Quaternions - and Beyond
15. Summary
CHAPTER V
Functions, Transformations, and Groups
1. Types of Functions
2. Maps
3. What Is a Function?
4. Functions as Sets of Pairs
5. Transformation Groups
6. Groups
7. Galois Theory
8. Constructions of Groups
9. Simple Groups
10. Summary: Ideas of Image and Composition
CHAPTER VI
Concepts of Calculus
1. Origins
2. Integration
3. Derivatives
4. The Fundamental Theorem of the Integral Calculus
5. Kepler's Laws and Newton's Laws
6. Differential Equations
7. Foundations of Calculus
8. Approximations and Taylor's Series
9. Partial Derivatives
10. Differential Forms
11. Calculus Becomes Analysis
12. Interconnections of the Concepts
CHAPTER VII
Linear Algebra
1. Sources of Linearity
2. Transformations versus Matrices
3. Eigenvalues
4. Dual Spaces
5. Inner Product Spaces
6. Orthogonal Matrices
7. Adjoints
8. The Principal Axis Theorem
9. Bilinearity and Tensor Products
10. Collapse by Quotients
11. Exterior Algebra and Differential Forms
12. Similarity and Sums
13. Summary
CHAPTER VIII
Forms of Space
1. Curvature
2. Gaussian Curvature for Surfaces
3. Arc Length and Intrinsic Geometry
4. Many-Valued Functions and Riemann Surfaces
5. Examples of Manifolds
6. Intrinsic Surfaces and Topological Spaces
7. Manifolds
8. Smooth Manifolds
9. Paths and Quantities
10. Riemann Metrics
11. Sheaves
12. What Is Geometry?
CHAPTER IX
Mechanics
1. Kepler's Laws
2. Momentum, Work, and Energy
3. Lagrange's Equations
4. Velocities and Tangent Bundles
5. Mechanics in Mathematics
6. Hamilton's Principle
7. Hamilton's Equations
8. Tricks versus Ideas
9. The Principal Function
10. The Hamilton-Jacobi Equation
11. The Spinning Top
12. The Form of Mechanics
13. Quantum Mechanics
CHAPTER X
Complex Analysis and Topology
1. Functions of a Complex Variable
2. Pathological Functions
3. Complex Derivatives
4. Complex Integration
5. Paths in the Plane
6. The Cauchy Theorem
7. Uniform Convergence
8. Power Series
9. The Cauchy Integral Formula
10. Singularities
11. Riemann Surfaces
12. Germs and Sheaves
13. Analysis, Geometry, and Topology
CHAPTER XI
Sets, Logic, and Categories
1. The Hierarchy of Sets
2. Axiomatic Set Theory
3. The Propositional Calculus
4. First Order Language
5. The Predicate Calculus
6. Precision and Understanding
7. Gödel Incompleteness Theorems
8. Independence Results
9. Categories and Functions
10. Natural Transformations
11. Universals
12. Axioms on Functions
13. Intuitionistic Logic
CHAPTER XII
The Mathematical Network
1. The Formal
2. Ideas
3. The Network
4. Subjects, Specialties, and Subdivisions
5. Problems
6. Understanding Mathematics
7. Generalization and Abstraction
8. Novelty
9. Is Mathematics True?
10. Platonism
11. Preferred Directions for Research
12. Summary
Bibliography
Index
Introduction
These numbers are used to list in order the objects of some collection of
things, or simply to label these objects, or to count the collection, or to
(thereby) compare two collections. From these activities, several
Mathematical concepts arise together
At this point the word "set" simply means a collection of things: A group-
ing or assemblage S of objects (say, of physical objects or of symbols)
such as the collection of two turtle doves, three french hens, four colley
birds, or five gold rings, or the two collections
of three letters each, written with the conventional bracket notation for a
set or collection. At this stage, the word "collection" is appropriate,
because all that matters about a set (or collection) is that it is determined
by specifying its elements; one does not yet need more sophisticated
notions, such as sets whose elements are themselves sets, or sets of sets of
sets, or sets of subsets.
In these terms, one can give semi-final descriptions of the (at first)
highly informal operations of listing, labeling, counting, and comparing.
To "list" a collection such as {A,B,C} means to attach in regular order a
numeral to each object in the collection; one usually begins with the
numeral 1 and proceeds in order, say, as {A_1, B_2, C_3}. Note that the
numerals will be adequate for this process in all cases only if there is
always a next numeral; this is one origin of the idea that every natural
number n has an immediate successor s(n) = n + 1. To "label" means to
attach the same numerals to the objects of the collection, but irrespective
of their order, as in {A_2, B_3, C_1}. To "count" a collection means to
determine how many numerals (or which numerals) are needed to label all the
objects in the collection. In this connection, note that the count, done
properly, always comes out to the same answer. In particular, the
numerals needed do not depend on the order in which the objects of the
collections are counted: Whether it is {A_1, B_2, C_3}, {B_1, A_2, C_3} or
{C_1, B_2, A_3}, it always ends at the same 3. Comparing two collections,
such as {A,B,C} and {U, V, W} means matching each object of the first
collection with some object of the second, until both are exhausted, as in
{A|W, B|V, C|U}. Of course, it might happen that one collection is
exhausted before the other; the first is then "smaller" in the comparison.
The result of this comparison does not depend on the order in which
objects are matched: {A,B} in any order is smaller than {U, V, W}. There
are many pairs of collections to be compared, but it again turns out that it
is not necessary to compare each pair; it is enough to compare finite col-
lections with the standard initial segments of the positive natural
numbers:
In this context, one says that the collection {A,B,C} has the cardinal
number 3, in symbols
# {A,B,C} = 3. (4)
A ↦ 1, B ↦ 2, C ↦ 3. (7)
A ↦ U, B ↦ V, C ↦ W (8)
#{A,B,C} = #{U,V,W},
3 + 2 = #{A,B,C,U,V},
and similarly for other sums. The product 2·3 can be described "geometrically"
as the cardinal number of a 2 × 3 rectangular array
2·3 = #{ (A,U) (B,U) (C,U)
         (A,V) (B,V) (C,V) }.
Here the three columns are three disjoint sets, so the product can also be
described as an iterated sum
2·3 = 2 + 2 + 2.
Similarly, the exponential 2^3 can be described as an iterated product
2^3 = 2·2·2;
it can also be described as the cardinal number of the set of all functions
from a 3-element set {1,2,3} to a 2-element set {0,1}.
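The three descriptions above (sum as the count of a disjoint union, product as the count of an array of pairs, exponential as the count of functions) can be checked mechanically on small collections. The following is only a sketch in Python; the names `first`, `second`, and `cardinal` are illustrative assumptions and do not come from the text.

```python
from itertools import product

# Two small disjoint collections, chosen to match the sizes used in the text.
first = {"A", "B", "C"}
second = {"U", "V"}

def cardinal(collection):
    """Count a finite collection."""
    return len(collection)

# Sum: count the union of two disjoint collections.
sum_count = cardinal(first | second)

# Product: count the rectangular array of all ordered pairs.
pairs = set(product(second, first))
product_count = cardinal(pairs)

# Exponential: count all functions from {1,2,3} to {0,1}.
# A function f is recorded as the tuple of its values (f(1), f(2), f(3)).
functions = set(product({0, 1}, repeat=3))
power_count = cardinal(functions)

print(sum_count, product_count, power_count)  # 5 6 8
```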
puter or employ the familiar rules: The addition and multiplication tables
for the digits from 0 to 9, plus the rules for carry-over of tens. These rules
are "formal" in the basic sense of the word: They do not refer to the
meanings of the decimals or of the arithmetic operations (though they can
be rigorously deduced from these meanings). Instead they simply specify
what to do, and specify that correctly. Thus if one counts two disjoint col-
lections as having 5 and 17 members, respectively, and then adds the
decimals 5 and 17 according to the rules, the sum is always the count for
the combined collection, and similarly for the product. To be sure, items
can get lost from collections and calculators can make errors, but then
there are further rules to make checks, like the rule of "casting out 9's"
(replace each decimal by the sum of its digits, then add or multiply,
according to the case). For numbers written in bases other than ten,
there are corresponding rules for calculations and for checks (what does
one cast out?).
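The check by "casting out 9's" is itself a formal rule, and so can be written down without reference to meaning. The sketch below is one possible rendering in Python, not a prescribed procedure; the function names are assumptions made for illustration. It rests on the standard fact that a number and its digit sum leave the same remainder on division by one less than the base, so in base b one casts out (b - 1)'s.

```python
def digit_sum_mod(n, base=10):
    """Sum the digits of n in the given base, then cast out (base - 1)'s."""
    total = 0
    while n > 0:
        total += n % base
        n //= base
    return total % (base - 1)

def check_sum(a, b, claimed, base=10):
    """Necessary (but not sufficient) test that claimed equals a + b."""
    lhs = (digit_sum_mod(a, base) + digit_sum_mod(b, base)) % (base - 1)
    return lhs == digit_sum_mod(claimed, base)

def check_product(a, b, claimed, base=10):
    """Necessary (but not sufficient) test that claimed equals a * b."""
    lhs = (digit_sum_mod(a, base) * digit_sum_mod(b, base)) % (base - 1)
    return lhs == digit_sum_mod(claimed, base)

print(check_sum(5, 17, 22))      # True: the correct sum passes
print(check_sum(5, 17, 23))      # False: this error is caught
print(check_product(5, 17, 85))  # True
```

The check only detects some errors: a wrong answer that differs from the right one by a multiple of 9 (such as 31 in place of 22) still passes.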
This example gives a clear indication of what we intend to mean by for-
mal: A list of rules or of axioms or of methods of proof which can be
applied without attention to the "meaning" but which give results which
do have the correct interpretation.
2. Infinite Sets
The collection of all the natural numbers,
N = {0, 1, 2, 3, ...}, (1)
starts with 0 and has to each number a successor; hence it is infinite.
Historically, one started with 1 and not 0, but we need 0 as the cardinal
number of the empty set.
The infinite set N of all natural numbers includes many finite subsets
as well as infinite sets, such as the set P of all positive natural numbers
P = {1,2,3,4, ...},
the set E of all even positive numbers, and the set S of all positive multi-
ples of 6. These various infinite sets may be compared as follows:
P  { 1,  2,  3,  4, ... }
     ↕   ↕   ↕   ↕
E  { 2,  4,  6,  8, ... }      (2)
     ↕   ↕   ↕   ↕
S  { 6, 12, 18, 24, ... };
the result shows that there are just as many even positives as there are
positives all told; b(n) = 2n defines a bijection b: P → E. Similarly
c(2m) = 6m is a bijection c: E → S. In the comparisons (2), c(b(n)) = 6n
gives a "composite" bijection c·b: P → S.
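One can tabulate these bijections on an initial segment of P to see the comparisons (2) emerge. The following Python sketch is illustrative only; the prefix length 10 and the names `P_prefix`, `E_prefix`, `S_prefix` are assumptions, and a finite table of course does not prove anything about the infinite sets.

```python
def b(n):
    """b: P -> E, sending n to 2n."""
    return 2 * n

def c(e):
    """c: E -> S, sending 2m to 6m, that is, e to 3e."""
    return 3 * e

def composite(n):
    """The composite c.b: first apply b, then c."""
    return c(b(n))

P_prefix = list(range(1, 11))
E_prefix = [b(n) for n in P_prefix]
S_prefix = [composite(n) for n in P_prefix]

print(E_prefix[:4])  # [2, 4, 6, 8]
print(S_prefix[:4])  # [6, 12, 18, 24]
```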
A set X is called denumerable when there is a bijection f: N → X. Thus
the comparisons (2) indicate that P, E, and S are all denumerable; as a
matter of fact, any subset of N is either finite or denumerable.
Two sets X and Y have the same cardinal number when there is a bijection
f: X → Y. This definition includes the finite cardinals 0, 1, ... already
discussed in §1, and the cardinal number called ℵ₀ (aleph-naught) of N,
E, P, and all other denumerable sets. In this way, the elementary activity
of counting leads to infinite cardinal numbers, of which ℵ₀ is only the
first. We will later see that the set of all points on a line is infinite but not
denumerable.
One can also formally describe when a set is infinite: When its cardinal
number is not finite, or, equivalently, when it has a proper subset S for
which there is a bijection S → X.
Finitists hold that infinite sets (and geometrical infinities) are just con-
venient fictions, while only the finite is "real". This we must later con-
sider. For that matter, is a finite set real? On the fourth day of Christmas,
did my true love send me four colley birds or a set of four colley birds?
Where is the set?
3. Permutations
A finite set, counted in any order, leads to the same (finite) cardinal
number. The count is not changed by "permuting" the things counted.
But one may also count how many permutations there are. Thus the set
{1,2,3} has six permutations
This is usually written as a cycle (132), standing for 1 ↦ 3 ↦ 2 ↦ 1.
Any permutation of {1,2,3} can be viewed as a bijection
(1)
(4)
In this list, the composite of any two permutations still leaves the polyno-
mial (4) unchanged, so the composite is also in the list. Such a list of per-
mutations is called a permutation group. The combined list (2) and (3) is
also such a group.
t < t' and t' < t'' imply t < t'' (1)
for the "binary relation" <. Moreover, for any two distinct instants of
time, one must come before. In different language, for all t and t'' exactly
one of
0 < 1 < 2 < 3 < ··· (3)
The usual order of the positive and negative integers provides another
instance of these laws:
There are numerous other examples of these two formal laws. Hence it is
handy to have a name for this combined situation, as it might apply to
any set X (of instants of time or of integers or of rationals ... ).
A binary relation < on a set X specifies that x < y is true or false for
any two elements x,y in X; one might also say that the relation amounts
to specifying a set: The set of all those ordered pairs (x,y) with x < y. A
linearly ordered set is then a set X with a binary relation < for which the
laws (1) and (2) hold; in other words it is a set equipped with a transitive
and trichotomous relation <. One can then invent (or discover?) many
other examples of linearly ordered sets: Finite ones such as
1 < 2 < 3 < 4 or long infinite ones such as
0 < 1 < 2 < 3 < ... < ω < ω + 1 < ω + 2 < ..., (6)
where ω is the first thing beyond all the finite natural numbers. (This
linearly ordered set is actually the start of the infinite ordinal numbers.)
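For a finite set the two laws (transitivity and trichotomy) can be tested exhaustively. The sketch below is one way to do so in Python; the function names are assumptions, and the encoding of the elements n and ω + n as the pairs (0, n) and (1, n), compared lexicographically, is merely a convenient device for a finite piece of the long order (6).

```python
from itertools import product

def is_transitive(X, lt):
    """x < y and y < z imply x < z, for all x, y, z in X."""
    return all(lt(x, z)
               for x, y, z in product(X, repeat=3)
               if lt(x, y) and lt(y, z))

def is_trichotomous(X, lt):
    """Exactly one of x < y, x = y, y < x holds, for all x, y in X."""
    return all([lt(x, y), x == y, lt(y, x)].count(True) == 1
               for x, y in product(X, repeat=2))

def is_linear_order(X, lt):
    return is_transitive(X, lt) and is_trichotomous(X, lt)

# The standard order 1 < 2 < 3 < 4.
usual = is_linear_order([1, 2, 3, 4], lambda x, y: x < y)

# Proper divisibility is transitive but not trichotomous (2 and 3 are unrelated).
divides = is_linear_order([1, 2, 3, 4], lambda x, y: x != y and y % x == 0)

# A finite piece of 0 < 1 < 2 < ... < ω < ω + 1 < ..., encoded as pairs.
piece = [(0, n) for n in range(4)] + [(1, n) for n in range(3)]
long_order = is_linear_order(piece, lambda s, t: s < t)

print(usual, divides, long_order)  # True False True
```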
This definition is an easy first (of many) cases of a list of axioms
describing a common situation with many different examples. As in other
cases, the choice of axioms can vary. Thus, rather than using "before" and
"after", the passage of time can be described by the notion "not later
than", usually written t ≤ t'. This alternative can be formalized for any
linearly ordered set X. Define x ≤ y to mean x < y or x = y. This
binary relation on X is then
2<4<6<8<10< ...
When there is such an isomorphism f, X and Y are said to have the same
order type. (This is like the definition of "same cardinal number" except
that now one also keeps in mind the order of the elements being com-
pared.) One can then readily prove (say) that any linearly ordered set of 4
elements is order isomorphic to the standard such set: 1 < 2 < 3 < 4.
A general question is then at hand: Can one describe a particular
model of the axioms by giving enough additional axioms to determine the
model uniquely (i.e. uniquely up to an order isomorphism)? In the present
case, can one give properties of an ordered set X which imply the
existence of an order isomorphism X → N (or X → Q, the ordered set of
rationals, or X → R, the ordered set of reals)?
The answers are "yes". To get at the case of the reals R, one must for-
mulate the sense in which a real number (an instant of time) can be
approximated by rational numbers. For example, the real number π is
determined by the usual sequence of decimal approximations
3.14, 3.141, 3.1415, 3.14159, 3.141592, ....
Indeed, π is the "least upper bound" of this set of rational numbers.
Formally, in a linearly ordered set X an element b is an upper bound for a
subset S of X if s ≤ b for every s in S. Also, b is a least upper bound for
S if b is an upper bound for S and no b' with b' < b is an upper bound
for S. This implies that if S
has a least upper bound, that least upper bound is unique. (This is the
sense in which π, for example, is determined uniquely by its decimal
expansion). Also, the set X is unbounded if there is in X no upper bound
and no lower bound. (For example, the ordered set N has a lower bound
0, hence is not unbounded).
The crucial property of the ordered set of real numbers is completeness:
Every non-empty subset S with an upper bound has a least upper bound.
The additional fact that every real number can be approximated by
rationals can be made formal by stating that the set Q of rational
numbers is "dense" in R. Here a subset D of a linearly ordered set X is
said to be dense in X if, for all x < y in X there is always a d in D
between x and y, so that x < d < y. It is then clear that the ordered set
R is complete, unbounded, and has a denumerable dense subset. Also
one can prove that any linearly ordered set X with these three properties
is order isomorphic to R (see Hausdorff). In the proof one uses a charac-
terization of the order type of Q: It is denumerable, unbounded, and
dense (as a subset of itself).
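The notions of upper bound and of density can be tried out on exact rational numbers. The following Python sketch uses the standard `fractions` module; the helper names `is_upper_bound` and `between` are assumptions for illustration. Note that it treats only the five approximations listed above: for that finite set the least upper bound is simply its largest element, whereas the least upper bound of the whole infinite sequence is π, which is not rational.

```python
from fractions import Fraction

# The decimal approximations quoted in the text, as exact rationals.
approximations = [Fraction(s) for s in
                  ["3.14", "3.141", "3.1415", "3.14159", "3.141592"]]

def is_upper_bound(b, S):
    """b is an upper bound for S if s <= b for every s in S."""
    return all(s <= b for s in S)

def between(x, y):
    """A rational strictly between rationals x < y (Q is dense in itself)."""
    return (x + y) / 2

print(is_upper_bound(Fraction(22, 7), approximations))     # True
print(is_upper_bound(Fraction(355, 113), approximations))  # True, and smaller than 22/7
print(is_upper_bound(Fraction("3.1415"), approximations))  # False

d = between(approximations[0], approximations[1])
print(approximations[0] < d < approximations[1])           # True
```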
This result does provide a description of the order of the real numbers.
In Chapter IV we will combine this with a description of their algebraic
properties. These properties also arise from experience with the passage
of time. Once intervals of time are measured by a clock (or an hourglass)
one can add one interval to another, and regard each instant of time t as
the end of an interval (from some starting time). This addition is then an
operation which produces to each pair t, t' of instants their sum, t + t',
with properties such as t + t' = t' + t and
-just like those for the addition of natural numbers. Again, different
examples lead to the same formal law.
Moreover the intent is that this distance is the shortest from p to q. (The
straight line is the shortest distance between two points.) In particular, this
means that the distance from p to q is not lessened when it is measured
along two straight lines going through a third intermediate point r. This
amounts to the (Figure 1) triangle axiom: For all p, q and r in X,
ρ(p,q) ≤ ρ(p,r) + ρ(r,q). (4)
Figure 1
tl. This is called a rigid motion M; it assigns to every point p of the figure
concerned a new point Mp such that, for all p and q,
ρ(Mp,Mq) = ρ(p,q). (5)
We write C = N·M for the composite and observe at once that if M and
N are rigid motions, so is C. For parametrized motions the addition of
time intervals usually corresponds to composites, in that
(7)
The axioms for a metric space show that any rigid motion M keeps distinct
points distinct. Indeed, p ≠ q implies by axiom (2) that ρ(p,q) ≠ 0
and hence by the definition (5) of a motion that ρ(Mp,Mq) ≠ 0, hence
Mp ≠ Mq by axiom (2) again.
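For the Euclidean plane one can observe numerically that a rotation followed by a translation preserves distances, and that a composite of two such motions does so as well. The Python sketch below assumes the usual Euclidean distance in the plane and a particular family of motions; `make_motion`, `compose`, and the sample points are illustrative choices, and floating-point agreement is only evidence, not a proof.

```python
import math

def distance(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def make_motion(angle, shift):
    """Rotation about the origin by `angle`, followed by translation by `shift`."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    def M(p):
        x, y = p
        return (cos_a * x - sin_a * y + shift[0],
                sin_a * x + cos_a * y + shift[1])
    return M

def compose(N, M):
    """The composite N.M: first apply M, then N."""
    return lambda p: N(M(p))

def preserves_distances(M, points):
    return all(math.isclose(distance(M(p), M(q)), distance(p, q), abs_tol=1e-9)
               for p in points for q in points)

points = [(0.0, 0.0), (1.0, 0.0), (2.5, -3.0), (-1.0, 4.0)]
M = make_motion(0.7, (2.0, -1.0))
N = make_motion(-1.3, (0.5, 4.0))
C = compose(N, M)

print(preserves_distances(M, points), preserves_distances(C, points))  # True True
```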
In studying the symmetry of a figure F, we usually consider a motion M
of F "into" itself; that is, a motion M such that p in F moves to some
M(p) in F and such that every point q of F comes from some p in F, so
that q = M(p). By the above, the motion M is therefore a bijection of F
to F, and so has an inverse M^{-1} which is also a rigid motion of F to F.
However, the reader might wish to construct an infinite figure F (say
one in the plane) and a rigid motion M of F into F which is not onto F.
[Figure 2: two triangles, ABC and A'B'C']
6. Symmetry
Symmetrical objects are all about us. There are many (man-made) sym-
metrical figures (Figure 1). Each of the figures has vertical symmetry,
horizontal symmetry, and rotational symmetry. The vertical symmetry V
can be construed as a reflection of the figure in its vertical axis, and simi-
larly for the horizontal axis, H. The rotational symmetry can likewise be
regarded as a 180° rotation R of the figure about its center. If we think of
the figure as a metric space X, each of these symmetries is a rigid motion
M of X onto itself, and these four motions are the only such. This sug-
gests a definition of a symmetry of a figure F: A rigid motion of F onto
itself. In particular the different figures of (1) have by this definition the
same symmetry (later called the four-group).
By this definition, the composite of two symmetries of F is again a sym-
metry. Thus vertical reflection followed by another vertical reflection is
the identity (which thus must count as a symmetry). Again, vertical
reflection followed by horizontal reflection is the 180° rotation. This one
may check by actual experiments with a rectangular card-or one may
label the vertices of the rectangle by numbers 1, 2, 3, 4 so that V amounts
to the permutation (12)(34), H is (14)(23), and the composite H·V (first
apply V then H) is
1 ↦ 2 ↦ 3, 2 ↦ 1 ↦ 4, 3 ↦ 4 ↦ 1, 4 ↦ 3 ↦ 2; (1)
this is the permutation (13)(24) given by the 180° rotation. Thus the total
list of symmetries for Figure 1 is
[Figure 1]
[Figure 2]
[Figure 3]
[Figure 4: a line marked at the integers from -3 to 3]
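The computation of the composite H·V on the labelled vertices of the rectangle can be replayed mechanically. In the Python sketch below each permutation is stored as a dictionary from vertex to image; this representation and the names `compose`, `I`, `V`, `H`, `R` are assumptions made for illustration.

```python
def compose(s, t):
    """The composite s.t: first apply t, then s."""
    return {x: s[t[x]] for x in t}

I = {1: 1, 2: 2, 3: 3, 4: 4}    # identity
V = {1: 2, 2: 1, 3: 4, 4: 3}    # (12)(34), vertical reflection
H = {1: 4, 4: 1, 2: 3, 3: 2}    # (14)(23), horizontal reflection
R = compose(H, V)               # first V, then H

print(R == {1: 3, 2: 4, 3: 1, 4: 2})  # True: R is (13)(24)

# The four symmetries are closed under composition.
symmetries = [I, V, H, R]
closed = all(compose(s, t) in symmetries for s in symmetries for t in symmetries)
print(closed)  # True
```

Each of V, H, and R composed with itself gives the identity, which is the characteristic feature of the four-group.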
7. Transformation Groups
A permutation of a set, a symmetry of a figure, and a motion of
Euclidean space are all examples of "transformations". A transformation
T of a set X is a bijection T: X → X; that is, a one-to-one correspondence
x ↦ Tx on the elements x of X. Thus each transformation T has an inverse
T^{-1}: X → X; any two transformations S and T have a composite S·T: first
apply T and then S.
A transformation group G on a set X is a non-empty set G of transfor-
mations T on X which contains with each T its inverse and with any two
transformations S, T in G their composite. This implies that G always
contains the identity transformation I on X:
(1)
(2)
(3)
[Diagram: a square of maps involving f] (4)
(5)
to find the permutation, label each vertex by f, look to see where the vertex
goes, and read off its label (by f^{-1}).
This result does formalize the evident fact that the permutations of a
typical set {1,2,3,4} of 4 things represent also the permutations of any set
of four things. Generally, if sets Y and X have the same cardinal number,
by a bijection f: Y → X, then the correspondence # of (5) is a bijection
from the permutation group of X to that of Y. Note incidentally that #
goes in the direction opposite to f.
However, this notion of a map is a bit complicated. Moreover, it
doesn't directly handle all the desired comparisons. Thus in (6.1) the dumbbell
Y and the perimeter X of the rectangle clearly have the "same" symmetries,
but there is no evident way to get a map f: Y → X to make such a
comparison. Indeed, there is no such f, because the dumbbell Y has a
center point left fixed by all the motions and there is no such point on the
perimeter of the rectangle. The two transformation groups in this case
can at least be compared through some intermediary, mapping each (say)
into a common (containing) such rectangle.
To summarize: symmetry forces us to consider transformation groups,
and even forces thoughts as to more abstractions from this notion.
8. Groups
For any three transformations R, S, and T of a set X the iterated compos-
ite, by its definition, satisfies
for all r, s, t in G.
(ii) A rule determining an element e (the unit, often written as e = 1) of
G such that, for all t in G,
te = t. (2)
t t^{-1} = e. (3)
st = ts (4)
which might as well (for the sake of symmetry) be used as axioms in place
of (2) and (3). A group G may have subgroups S (a subset which is itself a
group under the same multiplication (and inverse)). If G is finite, its cardi-
nal number is called its order. One proves that the order of a subgroup is
always a divisor of the order of the group; this serves to understand and
explain some of the observations made above about the orders 8 and 4 of
subgroups of the symmetric group of four things. There are all manner of
constructions of particular groups. Thus to each positive n the cyclic group
(6)
of cyclic groups. Moreover, the orders m_1, ..., m_k of these factors can be
chosen so that each is a multiple of the next (and their product is the
order of G). We will be concerned with the origins of this theorem in
number theory (the multiplicative group of residues prime to m, modulo
m) and in topology (the homology group of a finite complex described in
terms of Betti numbers and torsion coefficients). We are also concerned
with the question of the proper generality of such a theorem (is it really a
theorem about finitely generated modules over a principal ideal ring
(Algebra, p. 384)?). We are concerned with the additional concepts which
such a theorem brings to attention: for example, the notion of direct
product of groups and its eventual conceptual generalization to products
of other types of objects (rings, spaces) and finally, to products of objects
in a category (Categories Work, p. 68 or Chapter XI below).
For non-abelian groups G there is no structure theorem as simple as (6).
For example, the symmetric group on 3 letters {1,2,3} has order 6 but it is
not cyclic nor is it a product of cyclic groups (though it does have sub-
groups of orders 2 and 3). For such non-abelian groups there are instead
much deeper structural results (Chapter V). One may then ask why the
very simple group axioms lead to such deep structure.
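The statements about the symmetric group on 3 letters are small enough to verify by enumeration. The Python sketch below lists the six permutations, computes the order of each element, and tests commutativity; the tuple representation (p sends i to p[i-1]) and the function names are assumptions for illustration. Since any product of cyclic groups is abelian, failure of commutativity already rules out such a product.

```python
from itertools import permutations

identity = (1, 2, 3)
elements = list(permutations(identity))  # a tuple p sends i to p[i - 1]

def compose(s, t):
    """The composite s.t: first apply t, then s."""
    return tuple(s[t[i] - 1] for i in range(3))

def order(p):
    """Least n >= 1 such that p composed with itself n times is the identity."""
    n, q = 1, p
    while q != identity:
        q = compose(p, q)
        n += 1
    return n

orders = sorted(order(p) for p in elements)
is_abelian = all(compose(s, t) == compose(t, s)
                 for s in elements for t in elements)

print(len(elements))  # 6
print(orders)         # [1, 2, 2, 2, 3, 3]: no element of order 6, so not cyclic
print(is_abelian)     # False
```

Every element order found (1, 2, or 3) divides the group order 6, in agreement with the divisor property of subgroups stated above.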
Return to the idea of comparing two groups. For the case of a bijection
f: Y → X which gives a map of one transformation group (H,Y) to (G,X)
we used #: G → H with
for all T in G, as in (7.5). Then for any composite S·T in G one gets
Thus arises the definition: for any two groups G and H a homomorphism
# or b: G → H is a function assigning to each s in G an element bs in H in
such a way that
But geometry is not the only source of the idea of isomorphism. The
familiar property of the logarithm (say to base 10),
that always te = t", our axiom (ii) has specified that the element e is
"given". Indeed it can be "given" as a function e: {*} ~ G mapping the
one point set {*} into the element e of the set G. Such a function is a nul-
lary operation (on the set G). Thus the group axioms provide three opera-
tions
9. Boolean Algebra
Another example of an algebra is provided by the operations such as the
intersection and the union of subsets S and T of a given set X. If we write
x ∈ S for "x is an element of S" and ⟺ for "if and only if", these
operations are specified by giving the elements of the resulting subset of X
as follows:
[Figure: the intersection S ∩ T and the union S ∪ T]
S ⊂ S ∪ T, T ⊂ S ∪ T, (10)
which state that it is the least upper bound of S and T in the partial order
given by inclusion. In an exactly dual way, the intersection S ∩ T is the
greatest lower bound of the subsets S and T. In other words, both these
Boolean operations can be described directly in terms of inclusion,
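That the union is the least upper bound, and the intersection the greatest lower bound, under inclusion can be confirmed by searching all subsets of a small set X. The following Python sketch does this by brute force; the set X = {1,2,3,4}, the subsets S and T, and the helper names are illustrative assumptions.

```python
from itertools import combinations

def subsets(X):
    """All subsets of a finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_least_upper_bound(U, S, T, X):
    """U contains S and T, and lies inside every subset of X containing both."""
    upper = [W for W in subsets(X) if S <= W and T <= W]
    return U in upper and all(U <= W for W in upper)

def is_greatest_lower_bound(L, S, T, X):
    """L lies in S and T, and contains every subset of X lying in both."""
    lower = [W for W in subsets(X) if W <= S and W <= T]
    return L in lower and all(W <= L for W in lower)

X = frozenset({1, 2, 3, 4})
S = frozenset({1, 2})
T = frozenset({2, 3})

print(is_least_upper_bound(S | T, S, T, X))     # True
print(is_greatest_lower_bound(S & T, S, T, X))  # True
print(is_least_upper_bound(X, S, T, X))         # False: an upper bound, but not the least
```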
long and painful historical development) comes down to make the familiar
(but meticulous) ε-δ definition of continuity: A function f: R → R
is continuous at a point a ∈ R if
For all real ε > 0 there is a real δ > 0 such that, for all x in R, (1)
If |x - a| < δ, then |f(x) - f(a)| < ε. (2)
If this statement holds for all points a ∈ R, the function f is called
continuous; the class of all such continuous functions is called C.
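To see the definition at work, take the particular function f(x) = x·x (an example chosen here, not one from the text). If |x - a| < δ ≤ 1 then |x + a| < 2|a| + 1, so |f(x) - f(a)| = |x - a|·|x + a| < δ(2|a| + 1), and δ = min(1, ε/(2|a| + 1)) will do. The Python sketch below spot-checks this choice of δ on sample points; the function names are assumptions, and sampling can only illustrate the statement, never prove it for all x.

```python
def f(x):
    return x * x

def delta_for(a, eps):
    """A delta that works for f(x) = x*x at the point a (see the estimate above)."""
    return min(1.0, eps / (2 * abs(a) + 1))

def holds_on_samples(a, eps, samples=1000):
    """Test |f(x) - f(a)| < eps at sample points x with |x - a| < delta."""
    d = delta_for(a, eps)
    for k in range(1, samples):
        offset = d * k / samples          # strictly less than delta
        for x in (a - offset, a + offset):
            if not abs(f(x) - f(a)) < eps:
                return False
    return True

print(holds_on_samples(3.0, 0.01))   # True
print(holds_on_samples(-2.5, 1e-6))  # True
```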
Note that the statement involves both propositional connectives ("if ...
then") and the so called "bounded" quantifiers (For all real numbers,
there exists a real number). Thus it is that careful formulations lead to the
use of concepts of formal logic.
Topological and metric spaces arise from analysis of this definition of
continuity. The inequalities used in the definition arise from ideas of
approximation (approximations of the value b = f(a) to within the accuracy
ε) and so implicitly involve the open interval I_ε(b) =
{y | |y - b| < ε} of center b and "radius" ε. In the familiar representation
of the function f by its graph (the set of points (x, f(x)) in the
plane), this open interval appears as an open horizontal strip of width 2ε
around y = f(a) (Figure 1). The definition is concerned with those
points x ∈ R for which f(x) lands in this interval I = I_ε(b); this set of
points is usually called the inverse image of I under the function f, in
symbols:
[Figure 1: graph of f, with the open horizontal strip of width 2ε around f(a) = b]
(3)
Figure 2
other forms with the same center, so we get the same ε-δ continuous
function by just choosing different δ's to a given ε.
What is the intrinsic formulation? In the alternative description of continuity
stated in the theorem above there appeared unions of open intervals
and unions of discs. In any metric space X the open disc with center a
and radius δ may be defined to be the set of all points x in X with
ρ(x,a) < δ. Now define an open set U in X to be any union (finite or
infinite) of open discs. Equivalently, a subset U is open in X if to each
point a ∈ U there is a δ > 0 such that ρ(x,a) < δ implies x ∈ U:
every point a in U is contained in an open disc centered at a and within
U. Now the continuity of f: X → Y is expressed in terms of open sets: f is
continuous if and only if the inverse image of any open set of Y is an
open set of X. This is the desired intrinsic formulation, independent of the
perhaps accidental choice of a metric. Specifically, the three different
metrics ρ = ρ_1, ρ_2, and ρ_3 described above for the plane all yield the
same open sets, because any union of circular open discs is also a union
of open squares or of open diamonds, and conversely. In this way, the
notion of "open set" is more intrinsic than that of distance.
This suggests that a space can be defined directly in terms of its open
subsets. A topological space is a set X in which certain of the subsets U are
distinguished and called the open sets, and in which these open sets are
required to satisfy the following three axioms:
1. The intersection of two open sets is open;
2. The union of any collection of open sets is open;
3. X itself and the empty subset ∅ ⊂ X are open.
A topology on a set X is then the specification of open subsets which
satisfy these axioms. Thus any metric space X determines a topology, in which
the open sets are the unions of discs open in the metric. There are also
topologies not defined by way of a metric. For example, there is a topol-
ogy on the set N of natural numbers for which the open subsets are the
empty subset and all those subsets S with finite complement (in N). There
are many other examples of topologies.
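For a finite set the three axioms can be checked mechanically. The following Python sketch is only an illustration: the function name `is_topology` and the two sample collections are assumptions made here, not part of the text. It relies on the fact that, for a finite collection, closure under pairwise unions already gives closure under arbitrary unions.

```python
from itertools import combinations

def is_topology(points, opens):
    """Check the three open-set axioms for a finite collection of subsets."""
    X = frozenset(points)
    T = {frozenset(U) for U in opens}
    # Axiom 3: X itself and the empty subset are open.
    if X not in T or frozenset() not in T:
        return False
    # Every distinguished set must be a subset of X.
    if any(not U <= X for U in T):
        return False
    # Axioms 1 and 2: for a finite collection, pairwise intersections
    # and pairwise unions suffice.
    return all((U & V) in T and (U | V) in T for U, V in combinations(T, 2))

# Hypothetical examples on the three-point set {1, 2, 3}.
good = [set(), {1}, {1, 2}, {1, 2, 3}]
bad = [set(), {1}, {2}, {1, 2, 3}]  # the union {1, 2} is missing
```

Here `is_topology({1, 2, 3}, good)` holds, while `bad` fails axiom 2.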
The definition of continuity (the inverse image of any open set is open)
now applies to any function $f: X \to Y$ between topological spaces
X and Y. The three axioms on open sets are enough to prove most of the
basic facts about continuous functions; for example, the fact that the
composite $x \mapsto g(f(x))$ of two continuous functions g and f is again
continuous.
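On finite spaces the definition of continuity by inverse images can be tested directly. The sketch below is an illustration under assumptions: the helper names `preimage` and `is_continuous`, and the three small spaces X, Y, Z with their open sets, are invented here for the example. Functions are represented as Python dictionaries.

```python
def preimage(f, domain, V):
    """Inverse image of the set V under the function f (a dict)."""
    return frozenset(x for x in domain if f[x] in V)

def is_continuous(f, domain, domain_opens, codomain_opens):
    """f is continuous when the inverse image of every open set is open."""
    opens = {frozenset(U) for U in domain_opens}
    return all(preimage(f, domain, V) in opens for V in codomain_opens)

# Hypothetical finite spaces (each list of open sets satisfies the axioms).
X, TX = {1, 2, 3}, [set(), {1}, {1, 2}, {1, 2, 3}]
Y, TY = {"a", "b"}, [set(), {"a"}, {"a", "b"}]
Z, TZ = {0, 1}, [set(), {0}, {0, 1}]

f = {1: "a", 2: "a", 3: "b"}      # continuous X -> Y
g = {"a": 0, "b": 1}              # continuous Y -> Z
gf = {x: g[f[x]] for x in X}      # the composite x |-> g(f(x))
h = {1: "b", 2: "a", 3: "a"}      # not continuous: preimage of {"a"} is {2, 3}
```

In this example f, g, and the composite gf all pass the test, while h does not.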
To describe continuity at a single point of a space, one may use the
notion of "neighborhood". A neighborhood of a point a in a topological
space X is any open set of X which contains a. One then says that a function
$f: X \to Y$ between topological spaces is continuous at one point
$a \in X$ if to each neighborhood V of $f(a)$ there is a neighborhood U of a
for which $f(U) \subset V$. This definition agrees with the previous notion of
continuity at a point for a metric and expresses the intuitive idea that
"nearby" points in U go into nearby points in V. Moreover, f is continuous
if and only if it is continuous in this sense at each point $a \in X$.
Extensive experience has shown that this description of a "topology" in
terms of open sets and neighborhoods is extraordinarily effective in for-
mulating all sorts of Mathematical facts in a geometric form. The concept
of "topology" has been appropriately abstracted from the many examples
of "continuity".
The notion of a topological space was first presented by F. Hausdorff
in a famous (and beautiful) book Mengenlehre. His definition was formulated
differently, in terms of selected neighborhoods, and included an
added axiom (the Hausdorff separation axiom): Two distinct points have
disjoint neighborhoods. A topological space with this property is called a
Hausdorff space.
We have now seen a number of Mathematical concepts which are
described as sets-with-structure. Thus a linearly ordered set is a set
equipped with a binary relation < having certain specified properties. A
group is a set equipped with a binary, a unary, and a nullary operation,
which together satisfy certain identities. A Boolean algebra is similarly a
set with appropriate operations. A topological space X is a set-with-
structure, where in this case the "structure" consists of a specified collec-
tion of the subsets of X, namely the collection of all open sets. This kind
of structure is quite different in style from the algebraic structures. There
are also structures of a mixed kind. For example, there are cases of
motions (e.g., translations or rotations) which deal with a set of motions
which is both a group and a space. This leads to the notion of a topological
group. Such a group is a set G which is both a group and a topological
space and in which the group operations (both the product $G \times G \to G$ and
the inverse $G \to G$) are continuous. It is this last condition which ties the
two structures together (to make the definition complete, one must know
how the topology on G induces, in a natural way, a topology on $G \times G$).
As in this case, most composite axiomatic structures (combinations of two
kinds of structure on the same set) involve one or more axioms expressing
the formal connection between the two structures, here between the
group structure and the topology.
34 I. Origins of Formal Structure
Table 1.1
$$ax + by + cz = 0$$
has infinitely many non-zero solutions, but all can be expressed as sums of
multiples of some two solutions, because, as we know, the set of all solutions
$(x,y,z)$ is a plane through the origin in 3-space and any vector lying
in that plane is the sum of multiples of two suitable such. Again the solutions
of the homogeneous linear second order differential equation
$d^2x/dt^2 = -k^2 x$ all have the form
$$x = A \cos kt + B \sin kt;$$
for some finite list of indices $i_1, \ldots, i_h$. In current terminology, this property
states that I is a compact subset (of R) and so leads to the idea of
compactness for topological spaces.
At the end of our study of structure, we will return to a more detailed
examination of these processes, internal to Mathematics, for the generation
of new notions. They play a role contrapuntal to the input of problems
from the sciences outside Mathematics. Both are accompanied by
the continued search for deeper properties of the notions already at hand.
$$m + 0 = m, \qquad m + n = n + m, \qquad (1)$$
$$k + (m + n) = (k + m) + n. \qquad (2)$$
These rules can be proved from the definitions of the operations. For
example, the commutative law (1) holds because, when two disjoint finite
sets are combined, the cardinal number of the combined set does not
depend on which of the two sets is taken first. On the other hand, the
rules are formal in the sense that they can be used directly without atten-
tion to their "meaning". For example, the associative law (2) tells me that
if I add a long column of figures in three successive groups, subsequently
combined, the final result will be the same, irrespective of the order in
which the three are combined. A similar rule will work for more than
three groups. Moreover, these (long-established) rules are inviolate: If it
doesn't turn out as they specify, I know that I have made a mistake somewhere.
This is the merit of a formal rule: Once firmly established, it can
be applied mechanically and is an infallible guide.
Multiplication has corresponding formal properties:
Again, this law can be used formally, without attention to its origin in the
definitions of addition and multiplication, as suggested in the following
display:
(6)
m n m n
$$n = a_k b^k + a_{k-1} b^{k-1} + \cdots + a_1 b + a_0 \qquad (7)$$
for some natural k and with coefficients $a_i$ all satisfying $0 \le a_i < b$. In
particular, if b = 10, this is the decimal expansion of n, and its properties
lead to the familiar formal rules for manipulating decimals.
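The expansion (7) can be computed by repeated division by the base b. The following Python sketch is one possible illustration; the function names `digits` and `from_digits` are assumptions introduced here.

```python
def digits(n, b):
    """Return [a_k, ..., a_1, a_0] with n = sum of a_i * b**i and 0 <= a_i < b."""
    if n < 0 or b < 2:
        raise ValueError("need n >= 0 and base b >= 2")
    if n == 0:
        return [0]
    out = []
    while n > 0:
        n, r = divmod(n, b)   # r is the next coefficient, 0 <= r < b
        out.append(r)
    return out[::-1]          # most significant coefficient first

def from_digits(ds, b):
    """Rebuild n from its coefficients (Horner's rule)."""
    n = 0
    for a in ds:
        n = n * b + a
    return n
```

For b = 10 the list returned by `digits` is the usual decimal expansion; for instance `digits(864, 10)` gives `[8, 6, 4]`.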
form (v). Using sets, one can also construct a model of these axioms
which is "non-standard" in the sense that the numbers are not exhausted
by taking successors (Chapter XI). However, this is not so for the set-
theoretic version with axiom (v'). As we will soon see, one can prove that
these Peano postulates do determine the natural numbers up to an iso-
morphism.
From the Peano postulates one can define all the familiar arithmetic
operations by recursion-specifying in succession the results of the opera-
tion. Thus addition (explicitly, the operation "add k") is defined by the
two recursion equations
$$k + 0 = k, \qquad k + sn = s(k + n); \qquad (1)$$
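The two recursion equations for "add k" translate directly into a recursive program. In the Python sketch below, ordinary Python integers stand in for the natural numbers and `s` for the successor; both choices are assumptions made for illustration, and the predecessor `n - 1` is used only to peel off one application of s.

```python
def s(n):
    """Successor operation (Python ints stand in for the natural numbers)."""
    return n + 1

def add(k, n):
    """Addition by recursion: k + 0 = k and k + s(n) = s(k + n)."""
    if n == 0:
        return k
    # n is the successor of n - 1, so apply the second recursion equation.
    return s(add(k, n - 1))
```

Thus `add(3, 4)` unwinds to `s(s(s(s(3))))`, which is 7.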
(3)
A proof of this theorem uses axiom (v') and must depend upon the
set-theoretic definition of "function" as a table of values (see Chapter V),
but the idea behind any such proof is a straightforward use of induction,
as follows. For each n let $\mathbf{n}$ denote the finite set $\mathbf{n} = \{0,1,2,3,\ldots,n\}$ and
consider the following property P of n: There is a unique function
$f_n: \mathbf{n} \to X$ which satisfies (4) for $m = 0,1,2,\ldots,n-1$. Clearly, P holds
for $n = 0$. If n has the property P, we have the (unique) function $f_n$
whose values may be listed as
$$\begin{array}{ccccc} 0 & 1 & 2 & \cdots & n \\ f_n 0 & f_n 1 & f_n 2 & \cdots & f_n n \end{array}$$
46 II. From Whole Numbers to Rational Numbers
4. Number Theory
Once the Peano postulates are at hand, they yield all manner of specific
results. Division is sometimes but not always possible, but if one tries to
divide m by n one obtains a quotient q and a remainder r, which may be
0 but in any event less than n, as in the equation $m = nq + r$ with
$0 \le r < n$. This result is known as the division algorithm. Those natural
numbers which have no divisors (except, of course, for themselves and 1)
are the primes; they appear in a curious irregular order:
Except for the prime 3 in the first sequence, all the primes must fall in the
last two sequences. It turns out that there are an infinite number of
primes in each of these two arithmetic sequences, and that they are, in a
sense, equally distributed between those two sequences. More generally,
Dirichlet's theorem asserts that any arithmetic sequence nd + r, for fixed
d and r and increasing n, will have an infinite number of primes, provided
only that d and r have no common factors except 1.
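Both the division algorithm and the statement of Dirichlet's theorem can be explored numerically. The Python sketch below is an illustration only: the names `is_prime` and `primes_in_sequence` are assumptions, and the moduli 4 and 6 are examples chosen here rather than the sequences discussed in the text. A finite count is of course evidence, not a proof.

```python
def is_prime(p):
    """Trial division; adequate for small numbers."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True

def primes_in_sequence(d, r, bound):
    """Primes of the form n*d + r (n = 0, 1, 2, ...) not exceeding bound."""
    return [m for m in range(r, bound + 1, d) if is_prime(m)]

# Division algorithm: m = n*q + r with 0 <= r < n.
q, r = divmod(17, 5)

ones = primes_in_sequence(4, 1, 50)     # d = 4, r = 1 have no common factor
threes = primes_in_sequence(4, 3, 50)   # d = 4, r = 3 have no common factor
shared = primes_in_sequence(6, 3, 100)  # d = 6, r = 3 share the factor 3
```

In the last case every term is a multiple of 3, so the only prime that can occur is 3 itself.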
Every number can be written as a sum of at most four squares or of at
most nine cubes. These results have relatively elementary proofs; by
much deeper analysis for Waring's problem, similar results hold for higher
powers. By trial, one can verify that each small even number can be writ-
ten as a sum of two primes. Goldbach (in 1742) conjectured that this was
always true. To date, no one has proved this to be so. The best results to
date are Vinogradoff's: Every sufficiently large odd number r is a sum of
three primes, and Chen's: Every sufficiently large even number is a sum
p + b, where p is a prime and b is either a prime or a product of two
primes.
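The verification "by trial" for small even numbers is easy to mechanize. The following Python sketch is an illustration, with the function names `is_prime` and `goldbach_pair` assumed here; it confirms particular cases only and says nothing about the conjecture in general.

```python
def is_prime(p):
    """Trial division; adequate for small numbers."""
    if p < 2:
        return False
    i = 2
    while i * i <= p:
        if p % i == 0:
            return False
        i += 1
    return True

def goldbach_pair(n):
    """Return primes (p, q) with p <= q and p + q = n, or None if none exist."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None
```

For example, `goldbach_pair(100)` finds the decomposition 3 + 97, and the search succeeds for every even number from 4 to 1000.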
Problems in Diophantine equations ask for solutions in integers and in
natural numbers. The equation $x^2 + y^2 = z^2$ has infinitely many (well-known)
solutions in non-zero integers x, y, and z, but the equation
$x^4 + y^4 = z^4$ has none. Fermat stated, and no one has yet proved, that
$x^n + y^n = z^n$ for $n > 2$ has none. That the number of such solutions is
finite has just recently been proved (the Mordell conjecture). Pell's equation
$x^2 - Dy^2 = 1$ has an infinite number of integer solutions, of
relevance to algebraic number theory.
This is but a small sample of the wealth of questions arising for the
natural numbers. All these results are ultimate consequences of the struc-
ture specified with such simplicity by the five Peano postulates.
5. Integers
To keep accounts of gains and losses, subtraction is needed. Within the
natural numbers, subtraction is not always possible, but it becomes possi-
ble when the set N of all natural numbers is expanded to the set Z of
integers. One can formally define the integers (and arithmetic operations
upon them) in several ways. Perhaps the simplest is that of adjoining to
N a new copy of the positive numbers, each prefixed by $-$, as
$-1, -2, -3, -4, \ldots$. Then addition of the old and new integers is
defined for natural numbers n and m in N in cases:
$$n + (-m) = \begin{cases} n - m, & \text{if } n > m, \\ -(m - n), & \text{if } n < m, \end{cases}$$
$$(m,n) + (m',n') = (m + m',\, n + n'), \qquad (m,n)(m',n') = (mm' + nn',\, mn' + nm').$$
But beware: the pairs (m,n) and (m+h,n+h) should count as the same,
hence one defines (m,n) = (r,s) if and only if m + s = n + r, and
verifies that this artificial equality satisfies the expected rules; in particu-
lar, that sums and products of equals are equal. The integers, defined to
be these pairs with this equality, do not literally contain the natural
numbers from which we started, but the meaning of subtraction suggests
that each n in N be identified with the pair $(n,0)$; this identification
preserves addition and multiplication. Stated more formally, this says that
the function $N \to Z$ given by $n \mapsto (n,0)$ carries sums to sums, products to
products, inequalities to inequalities, and distinct numbers to distinct
integers; it is thus a monomorphism of the structure described by
$+$, $\times$, and $<$.
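The construction by pairs can be tried out directly, reading a pair (m,n) of natural numbers as the difference m - n. The Python sketch below is an illustration; the names `z_add`, `z_mul`, `z_eq`, and `embed` are assumptions introduced here.

```python
def z_add(p, q):
    """(m,n) + (m',n') = (m + m', n + n')."""
    (m, n), (m2, n2) = p, q
    return (m + m2, n + n2)

def z_mul(p, q):
    """(m,n)(m',n') = (mm' + nn', mn' + nm')."""
    (m, n), (m2, n2) = p, q
    return (m * m2 + n * n2, m * n2 + n * m2)

def z_eq(p, q):
    """(m,n) = (r,s) if and only if m + s = n + r."""
    (m, n), (r, s) = p, q
    return m + s == n + r

def embed(n):
    """The identification of a natural number n with the pair (n, 0)."""
    return (n, 0)
```

For example, the pairs (2,5) and (1,3) stand for -3 and -2, and `z_mul((2, 5), (1, 3))` is the pair (17,11), which equals `embed(6)` in the sense of `z_eq`.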
These two constructions of the integers give essentially the same result.
Specifically, the map $n \mapsto (n,0)$, $-m \mapsto (0,m)$ is an isomorphism (for
6. Rational Numbers
To keep accounts, one often needs to divide numbers evenly into parts,
and this often cannot be accomplished with whole numbers. Fractions
provide the answer. They are introduced individually, as 1/2, 2/3, 1/5,
4/5, etc., and then manipulated in the evident way:
$$(m,n) + (m',n') = (mn' + nm',\, nn'), \qquad (m,n)(m',n') = (mm',\, nn') \qquad (2)$$
of the practical rules (1) and taking care to define the equality of $(m,n)$
and $(r,s)$ by $ms = nr$. When m in N+ is identified with the pair $(m,1)$ this
again defines a minimal expansion of the set N+ to a larger set in which
division is always possible and in which all the rules of arithmetic still
hold. As before, there is nothing unique about the formulation of this
construction. Instead, one might have used only those pairs (m,n) in which
m and n have no common factor (except 1); in that case, one must modify
addition and multiplication in (1) to reduce each answer to lowest terms.
With this inconvenience, the "artificial" definition of equality of pairs is
avoided. Again, what matters is only the resulting structure up to isomor-
phism.
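Both formulations of the positive rationals can be compared in a few lines. The Python sketch below is an illustration with assumed names (`q_add`, `q_mul`, `q_eq`, `lowest_terms`): the first three functions follow the rules (2) with equality defined by ms = nr, and the last reduces a pair to lowest terms as in the alternative construction.

```python
from math import gcd

def q_add(p, q):
    """(m,n) + (m',n') = (mn' + nm', nn')."""
    (m, n), (m2, n2) = p, q
    return (m * n2 + n * m2, n * n2)

def q_mul(p, q):
    """(m,n)(m',n') = (mm', nn')."""
    (m, n), (m2, n2) = p, q
    return (m * m2, n * n2)

def q_eq(p, q):
    """(m,n) = (r,s) if and only if ms = nr."""
    (m, n), (r, s) = p, q
    return m * s == n * r

def lowest_terms(p):
    """Divide out the common factor of numerator and denominator."""
    m, n = p
    g = gcd(m, n)
    return (m // g, n // g)
```

The pairs (2,4) and (1,2) are equal in the sense of `q_eq`, and sums formed from either representative are again equal; reducing to lowest terms picks out the single pair (5,6) for 1/2 + 1/3.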
The system Q of all rationals may then be obtained from Q+, the posi-
tive rationals, by simply adjoining zero and negative rationals. Alterna-
tively, one may construct Q directly from Z by using all pairs (a,b) of
integers a, b in Z, with the same addition and multiplication as in (2), and
the important proviso that the "denominator" b is never 0.
As in previous cases, what matters is not the explicit definition of a
rational, but the resulting structure.
7. Congruence
A typical clock runs up to the figure of 12 hours and then repeats, but one
can still do arithmetic on the limited list of hours: Seven hours after nine
o'clock is four o'clock. Similarly, in the decimal system there are only ten
digits 0, 1, 2, ..., 9; the usual rules for addition and multiplication, ignoring
the carryover to the tens' place, work perfectly well for the manipulation
of these digits by themselves:
$$6 + 7 = 3, \quad 8 + 7 = 5, \quad 8 \cdot 3 = 4, \quad 3 \cdot 9 = 7.$$
These rules ignore all the multiples of 10; in a sexagesimal system there
are similar rules which ignore all the multiples of sixty. "Casting out
nines" is a rule for checking arithmetic calculations. This rule for checking
a multiplication says: Add up the digits of each factor, multiply the result-
ing sums, and check this against the digit sum of the original purported
answer. Thus 32 times 27 calculates to 864. To check, 32 becomes 5, 27
becomes 9, 5 times 9 is 45, with digit sum 9. This checks with the digit
sum in the purported answer, which is 8 + 6 + 4 = 18 with digit sum 9.
What happens here is that 32 is replaced by 5, casting out the difference
which is 27, or three nines. The reason it works is that factors differing by
a multiple of nine will have a product differing (at most) by a multiple of
9. In brief, arithmetic operations are valid "casting out 9's" or "modulo
9".
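The rule of casting out nines is easily programmed. The Python sketch below is an illustration; the names `digit_sum`, `cast_out_nines`, and `check_product` are assumptions made here. Note that the check is a necessary condition only: a wrong answer that differs from the right one by a multiple of 9 passes undetected.

```python
def digit_sum(n):
    """Sum of the decimal digits of a natural number n."""
    return sum(int(c) for c in str(n))

def cast_out_nines(n):
    """Repeat the digit sum until a single digit remains."""
    while n >= 10:
        n = digit_sum(n)
    return n

def check_product(a, b, claimed):
    """Casting-out-nines check of the purported product a * b = claimed."""
    left = cast_out_nines(cast_out_nines(a) * cast_out_nines(b))
    return left == cast_out_nines(claimed)
```

With the example of the text, `check_product(32, 27, 864)` succeeds, while the wrong answer 854 is rejected; the wrong answer 873, which differs from 864 by 9, slips through.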
These examples each involve the use of a modulus: 12, 10, 60, or 9, as
the case may be. The general procedure is similar. For integers a, b, and
any natural number $m \ne 0$ as modulus, one writes $a \equiv b \pmod{m}$, or
says that a is congruent to b for the modulus m, when the difference
$a - b$ is a multiple of m. Then one readily proves the arithmetic rules: If
$a \equiv b$ and $c \equiv d$, both mod m, then
These formulas provide for a computation of any $\varphi(m)$ from the prime
decomposition of m. We cite them to emphasize that the formulation of
congruence arises both from practice (multiplying hours or digits) and
from number theory.
To say that calculations with congruences are calculations with the
remainders is a bit artificial. Thus modulo 5 one could replace the five
remainders 0, 1, 2, 3, 4 by the remainders $-2, -1, 0, 1, 2$ or by
$-4, -3, -2, -1, 0$. Here, as always, mathematicians strive for an invariant
formulation. Each remainder r stands for (and may be replaced by) the
"congruence class" $C_m r$ of all integers a with $a \equiv r \pmod{m}$. To add the
class $C_m r$ to the class $C_m s$ one may then take any representative a in $C_m r$,
any b in $C_m s$, add a and b, and take the class of this sum $a + b$ as the
sum $C_m r + C_m s$. One must then prove that the resulting sum of classes
doesn't depend on the representatives a and b chosen, but this fact is just
a restatement of the rule (1) for adding two congruences. With this fact
established, we see that the collection $Z_m$ of all these congruence classes
$C_m$ forms a system with binary operations of addition and multiplication,
and that the function $C_m$ from Z to $Z_m$, as in
8. Cardinal Numbers
The question arises: What, after all, is a natural number? One explanation
says that it is a cardinal number. For this purpose, as in §I.2, define
two sets S and S' to be equinumerous (or, cardinally equivalent), in symbols
$S \equiv S'$, when there is a bijection $b: S \to S'$; that is, a one-to-one
correspondence between S and S'. This relation between sets is reflexive
(because the identity function is a bijection), symmetric (because the
inverse of a bijection is again such), and transitive (because the composite
of two bijections is again such).
Arithmetic operations work appropriately under this equivalence relation.
To add two collections S and T, first make sure that they are disjoint
(have no objects in common), then take as sum the collection $S + T$ of
all objects in either S or T. This disjoint union is preserved by bijections:
if $b: S \to S'$ and $c: T \to T'$ are bijections, they determine together a bijection
$S + T \to S' + T'$. Therefore
On this basis, a finite cardinal number is just a finite set taken "modulo"
cardinal equivalence; that is, with cardinal equivalence taken as the
equality. In other words, a number is "represented" by a finite set, and
two sets count as the same if they are in bijection. Alternatively, if we
want not a "representative" of the number, but a single object, we may
define the cardinal number of S to be the set of all sets S' equinumerous
with S:
$$\operatorname{card} S = \{S' \mid S \equiv S'\} \qquad (4)$$
The arithmetic rules above for congruence then justify definitions of the
sum and the product of cardinal numbers by
In other words
(Call this the naive comprehension axiom.) One (standard) way to avoid
this trouble is to apply this comprehension axiom only to construct subsets
of some already given set W (as, for example, subsets of the set Z). This
"bounded" comprehension axiom then reads: Given a set W and a property
P of sets, one may form the set
Later (in Chapter XI) we will examine this axiom in the context of a
systematic axiomatization of "set" and "element of a set". For this pur-
pose, we also assign to each set U its power set: All subsets of U:
$$PU = \{S \mid S \subset U\}. \qquad (10)$$
For the present, to explicate cardinal numbers, start with some (infinite)
initial set $V_0$, called a type or a "universe"; for example, $V_0$ might be the
set of natural numbers or the set of all real numbers. The "next" type $V_1$
is to consist of all the subsets S of $V_0$, so can be described as $V_1 = PV_0$.
Now we can describe the cardinal number of such a set S as the set of all
equinumerous S' which are also subsets of $V_0$
This cardinal number is then a set in the next higher type $V_2 = PV_1$.
This approach makes the cardinal number depend on a choice (or hierar-
chy) of types, but the definition (11), in contrast to (4), uses only the
bounded comprehension axiom so that Russell's paradox no longer arises.
The use of successive "universes" $V_0$, $V_1$, $V_2$ is a highly simplified version
of the "type theory" invented by Russell to avoid his paradox. (More care
is needed in details; for example to arrange that a type is closed under
products.)
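The bounded definition of a cardinal number can be imitated on a small scale. The Python sketch below is a toy illustration under loud assumptions: the universe V0 is taken finite (the text asks for an infinite one), the names `power_set` and `card` are invented here, and "equinumerous" is tested for finite sets simply by comparing numbers of elements rather than by exhibiting a bijection.

```python
from itertools import combinations

def power_set(U):
    """All subsets of the finite set U, as frozensets."""
    items = list(U)
    return {frozenset(c)
            for k in range(len(items) + 1)
            for c in combinations(items, k)}

def card(S, V0):
    """The set of all subsets of V0 that are equinumerous with S.

    Assumption for this toy model: finite sets are equinumerous
    exactly when they have the same number of elements.
    """
    S = frozenset(S)
    return frozenset(T for T in power_set(V0) if len(T) == len(S))

V0 = {0, 1, 2, 3}      # a finite stand-in for the initial type
V1 = power_set(V0)     # the next type, V1 = P(V0)
two = card({0, 1}, V0)
```

Here `two` is a set of six two-element subsets of V0; it is a subset of V1 and hence an element of the next type. Equinumerous subsets such as {0, 1} and {2, 3} receive the same cardinal number.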
Any such approach defines the finite cardinal numbers in terms of sets
and bijections and derives their arithmetic properties (addition, multipli-
cation, exponentiation) from properties of sets (and bijections). It serves
also to define the cardinal number of an infinite set, and arithmetic opera-
tions upon such infinite sets.
In this way the natural question: What is infinity? receives a numerical
answer. There are different sizes of infinity, and they can be measured by
suitable "cardinal" numbers which are subject to arithmetic operations.
9. Ordinal Numbers
The cardinal number of a finite set S does not depend on the order in
which the elements of S are presented or counted. However, for other
purposes, we often list the elements of a set in a given order, as we count
first, second, third, ..., and not just one, two, three, .... This leads to the
notion of an ordinal number. Such a number is attached to a suitable
linearly ordered set $(P, <)$. If $P'$ is a second such set, an order-isomorphism,
as in §I.4, is a bijection $f: P \to P'$ which preserves the order,
in the sense that $p_1 < p_2$ in P implies $fp_1 < fp_2$ in $P'$; when there is
such an isomorphism, the ordered sets are said to be ordinally equivalent
(in symbols, $P \cong P'$). Then a finite ordinal number can be defined to be a
finite linearly ordered set P taken modulo ordinal equivalence-that is,
with ordinal equivalence as equality. Alternatively, as with congruence
classes of integers modulo m or as with cardinal numbers, we can define
finite ordinal numbers as equivalence classes in a type $V_1$
here the reversal of the factors follows tradition. The result is independent
of the choices of P and Q within the classes ord P and ord Q because of
the rule (3). The definitions of sum and exponents for ordinals are analogous.
For finite ordinals, these operations agree with the corresponding
arithmetic operations on finite cardinals, because the cardinal number of a
finite ordered set does not depend on the order.
This is not so for infinite sets. If P is an infinite ordered set, the set
ord P defined in (3) is usually called the order type of P. For example, the
denumerably infinite set N+ of positive integers with its usual order has
an order type quite different from that of the set N- of negative integers
(N+ has a first element, but N- has none). The order type of N- is not
regarded as an infinite ordinal number.
For ordinal numbers, the leading idea is that there is always a "next"
one: the first ordinal "beyond" a given set of ordinals. More generally,
this should mean that any set of ordinal numbers has a first element; this
leads to the definition: A (linearly) ordered set P is well-ordered when
every non-empty subset $S \subset P$ has a first element in the given order. For
the natural numbers N, the axiom of induction is equivalent to the assertion
that every non-void subset T of N has a first element; that is, to the
requirement that N in its given order is well-ordered.
Generally, an ordinal number is defined to be the order type in the
sense (3) of some well-ordered set. This definition provides, for example,
for many infinite but denumerable ordinal numbers. The ordinal number
of the (well-ordered) set N is usually written as $\omega$; then the arithmetic
operations apply to $\omega$ as well as to finite ordinals. For example,
$\omega + \omega = \omega 2$ is the order type of two copies of N, one following the
other. On the other hand, $2\omega$ is the order type of $\omega$ copies of 2, one following
the other; hence $2\omega = \omega$ and $2\omega \ne \omega 2$! The first few infinite ordinals
come in order as
where each ... stands for the whole string of natural numbers from 3 on.
This display suggests that the ordinal numbers are themselves well-
ordered, where ord P < ord Q means that P is order isomorphic to an
"initial segment" of Q. Moreover, each ordinal number is the order type
of the ordered set of all preceding ordinals. This motivates an alternative
description of an ordinal number as the ordered set of all the preceding
ordinals:
|, ||, |||, ||||, |||||, ...
10. What Are Numbers?
(b) Natural numbers are finite sets, with equinumerous sets regarded as
equal numbers, while successor means "adjoin one more element".
(c) Natural numbers are cardinal-equivalence classes of finite sets.
(d) Natural numbers are ordinal-equivalence classes of finite ordered sets,
while successor means "adjoin one new element, to come after all the
others".
(e) Natural numbers are finite sets of sets, linearly ordered by the
membership relation, as in (6).
In each of these cases, the natural numbers as described do satisfy the
Peano postulates. From this multiplicity of answers to our question, we
must conclude that there is no answer to the question: What is a natural
number? There are various alternative concrete descriptions, depending
on the sort of counting intended or on the prior assumptions. In each
case, the description provides "numbers" which do satisfy the Peano pos-
tulates. Hence we conclude that one does not define what a natural
number "is", by itself. Instead, one defines the system of all natural
numbers, with successor operation. Then N is any such system which
satisfies the Peano postulates. This means that there are many such
systems within set theory, but that they are all isomorphic, just as in the
case above of the decimal notations.
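The isomorphism between two such systems can be written out by recursion. The Python sketch below compares two concrete models chosen here for illustration: strings of tally marks, and finite sets of sets ordered by membership, where each number is the set of its predecessors. All function names are assumptions, and the second model is one common reading of description (e), not necessarily the exact form of the display (6).

```python
def tally_zero():
    return ""

def tally_succ(t):
    """Successor in the tally model: add one more stroke."""
    return t + "|"

def set_zero():
    return frozenset()

def set_succ(n):
    """Successor in the set model: adjoin n itself to the set n."""
    return n | frozenset([n])

def tally_to_set(t):
    """Defined by recursion: iso(0) = 0 and iso(succ t) = succ(iso t)."""
    n = set_zero()
    for _ in t:
        n = set_succ(n)
    return n

two = tally_to_set("||")
three = tally_to_set("|||")
```

The map carries zero to zero and successor to successor, sends distinct tallies to distinct sets, and turns "fewer strokes" into membership: `two in three` holds.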
Note that the postulates themselves are by no means unique; for exam-
ple, the Peano postulates may be replaced by the recursion theorem as an
axiom. Here, as in other axiomatic descriptions of mathematical objects,
there are a variety of choices for lists of axioms. Number theory, like
other subjects in Mathematics, is not the study of a unique model nor yet
the examination of a unique axiomatic system; it is rather a study of the
form exemplified by the various models and specified in the axioms.
To summarize: The natural numbers start out from elementary opera-
tions of counting, listing, and comparing; they then develop into effective
tools for calculation. The rules for calculation are formal and can be
organized as the consequences of simple systems of postulates. The consequences
of these postulates include the remarkably varied and rich properties
studied in number theory, properties by no means apparent in the
original processes of counting and listing. They are none the less implicit
in these elementary human activities. Number theory is inevitable.
But number theory is not self-contained. First, calculations with natural
numbers do not allow all subtractions or all divisions, and so require the
References. There are many sources for number theory; one good one is
Hardy and Wright [1983]. For a truth about natural numbers not
demonstrable from the Peano postulates, with induction on properties, see
the last article in Barwise: Handbook of Logic.
CHAPTER III
Geometry
1. Spatial Activities
Geometrical figures and facts arise from a wide array of activities and
observations. Some involve motion; thus in watching motion we see
objects falling in vertical lines, water waves expanding in circles from a
dropped stone, duneland grasses moved by the wind to describe semicircles
on the sand, tree branches oscillating back and forth in the wind, long
straight lines of ocean waves approaching a beach, and the like. There are
also motions which we initiate and then watch: the thrown ball falling,
the log rolling downhill, the circle turned into a wheel for a cart, and so
on. Some activity involves construction. Thus a post or a column will stay
upright in balance if it is set perpendicular to the floor as a vertical; a
three-legged stool is more likely to be steady than one constructed with
four legs; rods joined to form a triangle stay rigid, but rods joined in a
square may wobble. Also, pieces of a board cut apart will fit together
again to make up the same size board. Mapping a labyrinth or painting a
scene calls for reproducing a variety of shapes, each on a smaller scale.
Fitting an object inside another highlights differences in shapes. To check
ahead of time that a fit will be possible may require a measurement of
length or of circumference. Estimating separation at a distance requires
lines of sight, angles, and triangles. These are but a few of the many
activities that go into the formation of geometrical ideas.
62 III. Geometry
From these activities one may disengage various figures: Circles and
ovals; lines, horizontal, vertical, and perpendicular; triangles, equilateral
or otherwise; squares and rectangles; cubes, cylinders, as with logs or
rods, and spheres, as with oranges or balloons. One also discovers a
variety of facts about these figures: A triangle is rigid because three sides
(or two sides and the included angle) determine the triangle up to rigid
motion. A right triangle can come in various shapes, but in every case the
square constructed on the longest side is the sum of the squares on the
two shorter legs (a fact said to have been useful in re-establishing prop-
erty lines after the Nile had been in flood). Two different perpendiculars
to the same line would never meet. Two triangles with corresponding
angles equal have corresponding sides proportional, and so on.
After a considerable array of facts about geometrical figures are at
hand, some connections between these facts come to light. Once the area
of a triangle can be computed from the lengths of its sides, it is possible to
fit four congruent triangles together so as to prove the Pythagorean
theorem, by adding up the five areas in Figure 1. Or, the Pythagorean
theorem can be deduced from facts about ratios of sides in similar trian-
gles.
Explicitly (Figure 2), the perpendicular from the right angle divides the
hypotenuse c in Figure 2 into pieces h and k with $h/a = a/c$ and
$k/b = b/c$. Clearing of fractions, $hc = a^2$ and $kc = b^2$, so that
$(h + k)c = c^2 = a^2 + b^2$, q.e.d. Many more such logical relations between
geometrical facts gradually appear.
Figure 2
2. Proofs Without Figures 63
an "outside". One can also prove that any closed polygon (which does not
cross itself) divides the plane into two parts, an inside and an outside,
but the proof is quite involved because the polygon may be convoluted,
with many reentrant angles.
With these axioms, one can also define what is meant by a ray or "half
line" starting at a point A. If B is a second point and $l$ the line determined
by A and B, then the ray r from A containing B consists of all the points
between A and B, the point B, and all points D with B between A and D.
An angle $\angle rs$ between two rays r and s can then be defined as the figure
formed by two (distinct) rays r and s from the same point. In particular,
this defines the (three) angles of a triangle ABC. One also defines straight
angle (formed by two rays on the same line).
Group III (Congruence). Next one introduces two new undefined terms,
"Segment AB is congruent to segment A'B'", in symbols $AB \equiv A'B'$, and
"Angle rs is congruent to angle r's'", in symbols $\angle rs \equiv \angle r's'$. The
corresponding axioms require:
(1) Given a segment AB and a ray r from A', there is a unique point B'
on r with $AB \equiv A'B'$.
In simpler language, this states that we can "lay off" the length AB on a
given line from A' in a given direction.
(2) Congruence of segments is reflexive, symmetric, and transitive.
(3) $AB \equiv A'B'$ and $BC \equiv B'C'$ imply $AC \equiv A'C'$, provided B is
between A and C and B' between A' and C'.
This amounts to describing the addition of segments.
(4) Given an angle $\angle rs$ and a ray r' from A' on a line $l$, there is a unique
ray s' from A' on a given side of $l$ so that $\angle rs \equiv \angle r's'$.
This axiom specifies that one can "lay off" the angle $\angle rs$ from a given ray
r' at A' and on a given side of the line of r'.
(5) If two triangles ABC and A'B'C' have $AB \equiv A'B'$, $BC \equiv B'C'$ and
$\angle B \equiv \angle B'$, then also $\angle A \equiv \angle A'$ and $\angle C \equiv \angle C'$.
Given this much about the two triangles, one proves that $AC \equiv A'C'$ and
hence that the triangles are congruent. This is the familiar first congruence
theorem (side-angle-side, or SAS) of Euclidean geometry. It is convention-
ally "proved" by moving one triangle till its parts coincide with the other.
Though we regard such "motion" as lying in the practical origins of
geometry, its use in a formal axiomatic proof is not acceptable; hence this
axiom. From the congruence axioms, one can define right angles.
Two lines are defined to be parallel when they do not meet (have no
point in common). One then requires the following famous axiom:
Group IV (The Parallel Axiom). Given a point A not on the line m, there
is at most one line through A parallel to m.
Dedekind's Axiom. Let all the points on a line $l$ be divided into two non-empty
disjoint subsets S and T in such a way that no point of S is
between two points of T and no point of T is between two points of S.
Then there is a unique point O on $l$ with the following property: For
points A, B not O on $l$, O is between A and B if and only if A lies in S and
B in T, or A in T and B in S.
In other words, the point O divides the line into S and T. We will see in
Chapter IV that this axiom is modeled on a similar axiom for the real
numbers.
Lemma 1. If the interior angles α and β on the same side of the transversal
k to l and m have sum α + β = 180°, then the lines l and m are parallel.
PROOF. If l and m are not parallel, then by the definition of parallel these
lines must meet in some point P. By symmetry, we may take P to be on
the same side of k as the angles α and β. Then (and with covert refer-
ences to Figure 2, which cannot "really" exist) we lay off on m, on the ray
opposite BP, a segment BP' ≡ AP. Since ∠P'BA + β is a straight angle,
so equal to 180°, it follows that the angle P'BA must equal α. Therefore
the triangles P'BA and PAB have corresponding sides (P'B ≡ PA),
angles α, and sides BA ≡ AB congruent, so by the basic congruence
axiom (side-angle-side, or SAS), they are congruent triangles. Hence angle
P'AB of the first triangle is congruent to angle PBA (or β) of the second.
Since β + α = 180°, this in turn means that the angle P'AB + BAP =
180° is a straight angle, so that P'AP is a straight line. It must then be the
given line l, which is thus revealed to be a line meeting m in two different
points P' and P on opposite sides of k (note the essential use of "sides" of
a line, which can be defined by using Pasch's axiom). From this contradic-
tion, we deduce that l must have been parallel to m.
Theorem 2. Through a point A not on a line m there exists at least one line
l parallel to m.
PROOF. On the line m there is at least one point B. Join this point to A.
The segment AB makes with m two angles γ and β which together form a
straight angle, so γ + β = 180°. Lay off the angle γ at A, with one side
along the segment AB and the other on a ray AP along the (new) line l';
then ∠BAP, on the same side as β, is equal to γ (Figure 3). By the lemma,
l' must then be parallel to m, as desired.
Up to this point, we have not invoked the parallel axiom.
Figure 2
Figure 3
Figure 4
4. Hyperbolic Geometry
A set of axioms is said to be an independent set if no one of these axioms
can be deduced from the others. It is desirable and appropriate (though
not necessary) that the axioms for a basic structure, such as that of the
Euclidean plane, be independent. In particular, there is the question: Is
the parallel axiom independent, or can it be deduced from the others?
This question has had considerable historical importance. For example,
one might try to prove the parallel axiom by assuming the contrary (more
than one parallel to m through a point A) and deducing a contradiction.
There were several attempts to do this, most notably one in which Sac-
cheri in 1733 deduced a large number of consequences, some of them
perhaps bizarre-but none a contradiction. Nevertheless, he concluded
that Euclid's parallel postulate was "vindicated". Then in the 19th century
Bolyai, Lobachevsky, and Gauss took the opposite view, preparing to
develop systematically a non-Euclidean geometry (specifically, a hyper-
bolic geometry) on the assumption that there is more than one parallel,
and hence that the angle sum in a triangle is not 180°. When this is done
systematically, it turns out that the angle sum is always less than 180° and
that the difference between 180° and the sum is proportional to the area
of the triangle.
This striking development raised (at least) two questions: "Is the result-
ing geometry consistent?", and "Does it fit the real world?" To answer the
latter question, one must propose a specific "real world" interpretation of
the primitive concepts of the geometry-say, by taking a straight line to be
the path (in vacuum) of a ray of light, while an angle is the thing meas-
ured by a surveyor "turning off" with a transit the angle between two rays
of light. With this interpretation, it appears that Gauss (who was also
active as an astronomer) measured the angle sum for the triangle formed
by chosen "points" on the peaks of three convenient mountains in Ger-
many; the resulting angle sum was 180°, within the accuracy of the mea-
surements then made. While the result indicates that there is not a
flagrant deviation from Euclidean geometry on this interpretation, it does
not provide any clear decision between the reality of Euclidean and
hyperbolic geometry. It even suggests that there might never be such a
decision, in view of the inevitable margin of error in the measurements
made in any such interpretation. The terms involved in the interpretation
are also open to question; for example, in general relativity theory the
path of a light ray may not be "straight" in the intended sense. This ulti-
mately brings up another and more profitable thought: Any geometrical
axioms, Euclidean or non-Euclidean, offer a mathematical structure which
may be open to a variety of different interpretations to suit a variety of
geometrical (or even non-geometrical) circumstances.
There remains the question of the consistency of the assumptions of
hyperbolic geometry. By definition, these assumptions are consistent if
$$\operatorname{dis}(AB) = \log\left[\frac{AS}{AT} \Big/ \frac{BS}{BT}\right] \tag{1}$$
and observe that this means that as B approaches T along the pseudo-line,
with A fixed, the pseudo-distance AB will approach infinity. Then with
this "metric" (i.e. in this metric space), the length of the whole pseudo-line
will be infinite, as one might wish. All the congruence axioms for segments
thus hold. The angle between two pseudo-lines at a point of intersection
is then taken to be the Euclidean angle between the Euclidean circles.
One can then define congruent angles and lay off such angles as
required in the axiom. Moreover, drawing a triangle formed by three
pseudo-lines will readily show that the angle sum in such a (pseudo) triangle
will turn out to be less than 180°. All the axioms for hyperbolic
geometry hold in the interpretation. In particular, Figure 2 suggests that
through a pseudo-point A not on a pseudo-line m there will be many
pseudo-lines not meeting m, and hence parallel to m.
Figure 1
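The behavior of the pseudo-distance can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it takes the simplest pseudo-line, a diameter of the unit disc with endpoints S = -1 and T = +1, reads formula (1) as the logarithm of the ratio (AS/AT)/(BS/BT), and uses the absolute value of that logarithm so that the result is non-negative. The name `pseudo_distance` is ours.

```python
import math

S, T = -1.0, 1.0  # assumed endpoints of the pseudo-line (a diameter of the unit disc)

def pseudo_distance(a: float, b: float) -> float:
    """|log[(AS/AT) / (BS/BT)]| for points a, b strictly between S and T."""
    cross_ratio = (abs(a - S) / abs(a - T)) / (abs(b - S) / abs(b - T))
    return abs(math.log(cross_ratio))

# Distances add along the pseudo-line, as the congruence axiom (3) requires.
a, b, c = -0.5, 0.1, 0.7
print(pseudo_distance(a, b) + pseudo_distance(b, c), pseudo_distance(a, c))

# With A fixed, the pseudo-distance grows without bound as B approaches T.
for b_near_T in (0.9, 0.99, 0.999, 0.9999):
    print(b_near_T, pseudo_distance(0.0, b_near_T))
```

The first printed pair agrees up to rounding; the second loop shows the distance from the center increasing steadily as B nears the boundary point T.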
This Euclidean interpretation of hyperbolic geometry proves that hyperbolic
geometry is consistent if Euclidean geometry is. For suppose that a
contradiction could be deduced from the axioms of hyperbolic geometry.
This contradiction could then hold for the pseudo-points and the pseudo-
lines of this model, and so would apply to those points and circles of
Euclidean geometry-it would then be a contradiction in Euclidean
geometry. In this way, the interpretation provides a proof of relative con-
sistency. We will return later (Chapter XI) to the deeper question of abso-
lute consistency.
The intellectual origins of the structure of a hyperbolic non-Euclidean
geometry are multiple. On the one hand, this geometry arises from meas-
urements: Do the angles in a triangle of light rays really add up to 180°,
or to something else? On the other hand, this geometry also arises from a
formal study of the axioms for geometry. Is the parallel axiom necessary,
or can it be deduced from the other axioms? If not, what does this say
about geometries? The result is clearly the conclusion that there can be
different forms of geometrical theories. In the actual historical develop-
ment, this 19th century conclusion was a tremendous shock. The result
emphasizes the subtle nature of the primitive terms of a formal axiomatic
system: The straight line, assumed as a primitive term in the Hilbert
axiomatics, can indeed be variously interpreted-in particular, by objects,
such as our pseudo-lines, which are intuitively by no means straight.
The development of non-Euclidean geometry represents a major change
in the nature of Mathematics, from a science (of number and space) to a
study of form.
Figure 2
5. Elliptic Geometry
There is another non-Euclidean plane geometry, called elliptic geometry,
in which there are no parallel lines. The axioms needed to describe such a
structure must deviate from the parallel axiom and from other axioms of
Euclidean geometry, since the latter axioms by themselves (as in §3) sup-
pose the existence of some parallel lines. Instead of examining these
axioms, we will describe elliptic geometry by Euclidean models.
The proof of Lemma 3.1 already suggests what must happen in an
elliptic plane: Two perpendiculars at different points to a line k are likely
to meet on both sides of k (so something must break down in the
Euclidean description of the two "sides" of a line). More explicitly, if the
perpendiculars at two points A and B on k meet at some point P, then the
triangle APB has its base angles at A and B equal, hence is isosceles.
Then take another point C on k so that AB = BC (and B is between
A and C); then PB ≡ PB, AB ≡ BC, and the right angles at B are congruent,
so the triangle PBA is congruent to
PBC, and then the corresponding sides PA and PC are equal (Figure 1).
Therefore PC is a third perpendicular to k, and all three of these perpen-
diculars to k (at points A, B, and C) meet at the single point P-and there
will be many more such perpendiculars from P to various points along k.
This configuration seems implausible in our usual plane, but it suggests
a different interpretation. Let a "point" be a Euclidean point on some
fixed sphere, while "line" is a great circle on that sphere. Then the equa-
tor is a "line", and so is the Greenwich meridian-and the various meridi-
ans from the north pole N do realize on the sphere the curious behavior
suggested in Figure 2. In this interpretation one may define congruence
in terms of distance (the length of arc along a great circle) and angle mea-
sure (the usual angle between two great circles). The result is a geometry
in which the appropriate axioms hold. In particular, it is clear from exam-
ples that the sum of the three interior angles in a triangle for this
geometry is always greater than 180°, a result opposite to that in hyper-
bolic geometry. This model is sometimes called double elliptic plane
geometry, "double" because any two lines (two great circles) meet in two
points (diametrically opposite on the sphere).
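The claim about angle sums can be tried on examples. The sketch below is an illustration only: it places the sphere in coordinates (an assumption not needed by the text), measures the angle at a vertex as the angle between the tangent directions of the two great-circle arcs leaving it, and adds the three angles. All function names are ours.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(u):
    n = math.sqrt(dot(u, u))
    return tuple(a / n for a in u)

def tangent(p, q):
    """Unit direction at p (on the unit sphere) of the great-circle arc from p towards q."""
    k = dot(p, q)
    return unit(tuple(b - k * a for a, b in zip(p, q)))  # drop the part of q along p

def angle_at(p, q, r):
    """Angle in degrees at the vertex p between the arcs pq and pr."""
    c = dot(tangent(p, q), tangent(p, r))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def angle_sum(a, b, c):
    return angle_at(a, b, c) + angle_at(b, c, a) + angle_at(c, a, b)

N = (0.0, 0.0, 1.0)   # north pole
A = (1.0, 0.0, 0.0)   # a point on the equator
B = (0.0, 1.0, 0.0)   # a point on the equator, a quarter turn from A
print(angle_sum(N, A, B))   # three right angles

small = [unit(v) for v in ((1, 0, 0), (1, 0.1, 0), (1, 0, 0.1))]
print(angle_sum(*small))    # only slightly more than 180 degrees
```

The triangle cut out by two meridians and the equator has three right angles; a small triangle has a sum only slightly above 180°, in line with the excess growing with the size of the triangle.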
Figure 1
The axiom that two "lines" meet in a "point" can be rescued by con-
structing a single elliptic interpretation: Identify diametrically opposite
points in the previous model. Thus a "point" of this new model is a pair
of diametrically opposite points on the fixed sphere, a "line" is a great cir-
cle on the sphere, the "distance" from A to B is the shorter of the great-
circular arcs from A to B or its diametrical opposite, and so on. Put
differently, the geometry of great circles on the sphere is a non-Euclidean
geometry.
This is a special case of the geometry of geodesics on a curved surface
S. A curve γ on S is said to be a geodesic if, given two nearby points A
and B on γ, the length of γ from A to B is less than the length of any
other curve from A to B on the surface. (It is necessary to speak here of
"nearby" points; on the sphere, a great circle from the north pole N is no
longer a shortest distance when one travels along it beyond the south
pole.) Hyperbolic plane geometry can also be represented by such a
model. Take the tractrix, a curve in the xy plane given by the equations in
a parameter θ,
as in Figure 3. Rotate this curve about the y axis to form a surface. The
Figure 3
6. Geometric Magnitude
Hilbert's (and Euclid's) axioms for Euclidean geometry have been
expressed in terms of congruence, and not in terms of distance, as it is
usually measured by a real number. In other words, the geometric
approach has been formulated independently of any use of real numbers
(but does employ natural numbers, to state the Archimedean axiom). In
fact, the geometric approach can be used to give a geometric description
of magnitudes (i.e., of real numbers). We now sketch briefly how this
might be done.
Fix on a line k and choose on it a point O (as origin) and another point
U, calling the segment OU the unit magnitude. Then by a magnitude one
means any segment OD from the origin on this line k; one of the two rays
from O on k is chosen to be positive, and the segments OD on that ray are
called positive magnitudes (Figure 1). These magnitudes are ordered;
specifically, any negative magnitude is by definition less than any positive
magnitude; again, given two positive magnitudes OD' and OD, the first is
less than the second when D' lies between O and D on k. Integral magni-
tudes (OU_2, OU_3, ...) may be obtained by simply laying off the same seg-
ment OU repeatedly along the ray OU.
A segment AB anywhere in the plane is then measured by some (posi-
tive) magnitude: Simply lay off AB as a segment OD along the positive ray
on k from O, and use the corresponding magnitude OD as the measure.
These magnitudes can be added (in the evident way) and multiplied, by
using similar triangles (see §IV.3). A magnitude OD is rational if there are
integers m and n with m(OD) = n(OU). There are irrational magnitudes,
such as the hypotenuse of an isosceles right triangle with legs of unit
length. Any irrational magnitude can be approximated by rational ones.
Such a geometric theory of magnitude was present in Euclid's
geometry. The theory there avoided the choice of any unit of measure-
ment, and so operated wholly with ratios. Thus what we have called the
magnitude OD would appear in Euclid as a ratio OD/OU. Euclid then used
proportions, which are simply equalities of ratios; our approach would
Figure 1
and
In effect, this describes the ratio OD/OU in terms of the set of those frac-
tions n/m with n/m < OD/OU; similar ideas will appear in our descrip-
tion in Chapter IV of real numbers in terms of rationals. The essential
observation is that the axioms for plane geometry suffice to give a
geometric theory of real numbers.
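As a hedged illustration of describing a ratio by the fractions below it, take OD to be the hypotenuse of the isosceles right triangle with unit legs. By the Pythagorean theorem (an assumption of this sketch, not derived here), n/m < OD/OU exactly when n·n < 2·m·m, so membership in the lower set can be decided with integers alone. The function names are ours.

```python
from fractions import Fraction

def below_diagonal(n: int, m: int) -> bool:
    """True when n/m < OD/OU, for OD the hypotenuse of the unit isosceles right triangle.

    Integer test only: n/m < sqrt(2) exactly when n*n < 2*m*m (for n >= 0, m > 0).
    """
    return n * n < 2 * m * m

def best_lower_fraction(m: int) -> Fraction:
    """Largest fraction with denominator m that lies in the lower set."""
    n = 0
    while below_diagonal(n + 1, m):
        n += 1
    return Fraction(n, m)

for m in (1, 10, 100, 1000):
    print(m, best_lower_fraction(m))
```

Each larger denominator gives a closer rational approximation from below, which is the sense in which the set of fractions determines the irrational ratio.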
Angular magnitudes may also be compared and measured. Here an
angle, as in the axioms, is an angle rs between two rays r and s emanating
from a common origin O. Once a unit angle has been chosen, it can be
bisected repeatedly to yield binary fractions of this unit and then, by
approximations, real number measures of all angles. Once the perimeter
2π of the unit circle has been determined, one may take as unit an angle
of one radian, that is, the angle at the center of a circle of radius 1 which
is subtended by a circular arc of length 1. This assigns to a right angle the
familiar measure π/2, and gives measures ranging from 0 to π for all the
angles between two rays r and s. Angle measures greater than π apply
only to angles from a ray r to a ray s in (say) the counterclockwise
direction, and this uses the notion of an orientation, to be discussed
below in §8.
For Greek Mathematics, as this discussion indicates, magnitudes were
geometric rather than arithmetic; western Mathematics, as we will see in
Chapter IV, has reversed this emphasis. Oswald Spengler, in his book The
Decline of the West, has argued that this means that there are two wholly
different "Mathematics", as parts of two different cultures. Our position is
rather that congruence and geometric ratios on the one hand and Dede-
kind cuts on the other are just two different careful formalizations of the
same underlying idea of magnitude-and that this point exemplifies the
unity of idea behind the inevitably varied form.
7. Geometry by Motion
Geometry need not be static. Intuitively, it is concerned with the ways in
which "objects" can be moved around in an ambient "space". From this
point of view space is really there just as a receptacle for motion. For
example, the congruence theorems for triangles in plane geometry can be
viewed as descriptions of conditions when one triangle can be moved so
as to coincide with another. The motions involved include translations,
rotations, and reflections. All are familiar from early practical activities; all
are examples of "rigid motions".
A rigid motion of the plane is a bijection A ↦ A', B ↦ B' on the points
of the plane such that each segment AB is congruent to its image A'B'; in
other words, it is a transformation preserving distance (§I.5). From this it
follows that every angle is congruent to its image, for every angle ∠ABC is
part of a triangle ABC which is moved to an image triangle A'B'C' with
corresponding sides congruent, and hence, by a congruence theorem, with
corresponding angles congruent. Since a straight line is the shortest distance
between any two of its points, it also follows that a rigid motion must take
the points on a line l into points on some line l'; in other words, it moves l
to l'. Moreover, it preserves betweenness on l: If C is between A and B on l,
then the image point C' is between A' and B' on l'. A rigid motion also
takes each side of l into one of the sides of l'; indeed, D and E lie on the
same side of l when the segment DE does not meet l; consequently the
images D' and E' lie on the same side of the image line l'. All told, a rigid
motion of the plane is an "automorphism": An isomorphism A ↦ A',
l ↦ l' of the whole structure of the plane.
From the definition it follows at once that the composite of two rigid
motions is again rigid, and also that the inverse of any rigid motion is
itself a rigid motion. Hence the rigid motions of the plane form a group.
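The group property can be made concrete in a coordinate model. The sketch below is an illustration under an assumption the text does not make at this point: points of the plane are taken to be complex numbers, and a rigid motion is written as z ↦ a·z + b or z ↦ a·conj(z) + b with |a| = 1. The class `Motion` and its fields are our own names.

```python
import cmath
from dataclasses import dataclass

@dataclass(frozen=True)
class Motion:
    a: complex            # unit complex number (the turning part)
    b: complex            # translation part
    flip: bool = False    # True when the motion includes a reflection

    def __call__(self, z: complex) -> complex:
        return self.a * (z.conjugate() if self.flip else z) + self.b

    def __matmul__(self, other: "Motion") -> "Motion":
        """Composite self @ other: apply `other` first, then `self`."""
        if self.flip:
            return Motion(self.a * other.a.conjugate(),
                          self.a * other.b.conjugate() + self.b,
                          not other.flip)
        return Motion(self.a * other.a, self.a * other.b + self.b, other.flip)

    def inverse(self) -> "Motion":
        if self.flip:
            # w = a*conj(z) + b  gives  z = a*conj(w) - a*conj(b), since |a| = 1
            return Motion(self.a, -self.a * self.b.conjugate(), True)
        return Motion(self.a.conjugate(), -self.a.conjugate() * self.b, False)

R = Motion(cmath.exp(0.7j), 0)     # rotation about the origin
T = Motion(1, 2 - 1j)              # translation
L = Motion(1, 0, flip=True)        # reflection in the real axis
M = T @ R @ L                      # a composite is again a motion
p, q = 0.3 + 1.2j, -2.0 + 0.5j
print(abs(p - q), abs(M(p) - M(q)))    # the two distances agree
print(abs((M.inverse() @ M)(p) - p))   # the inverse undoes M, up to rounding
```

Composition and inversion stay inside the same family of maps, which is the closure that makes the rigid motions a group.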
Theorem 1. Any rigid motion which leaves each of three non-collinear points
fixed is the identity.
Indeed, if the rigid motion M leaves two distinct points O and A fixed, it
must leave the whole line l through O and A fixed. Since a third point not
on the line is fixed, the motion carries each side of the line l into itself.
Since any other point B on one side of the line is determined by the dis-
tances OB and AB in the triangle OAB, this point B must also be fixed.
The several familiar kinds of rigid motions can be described and
analysed directly from the axioms, although they are often described by
equations in cartesian coordinates.
Intuitively, a translation is a motion which moves every point in the
same "direction" by the same amount. Formally, a translation is a motion
such that, if A ↦ A', B ↦ B', then AA' is parallel to BB' and A'B' is
parallel to AB. Given A ↦ A' and B, these conditions fix B' as the fourth
vertex of the parallelogram with sides AA' and AB, as in Figure 1. It fol-
lows that, given points A and A', there is a unique translation T which
moves A to A'. If C is a third point in the plane which is similarly
translated to a point C', consideration of the congruent triangles
ACB ≡ A'C'B' yields the congruence BC ≡ B'C'. This last congruence
means that the translation T sending B to B' and C to C' does preserve
distance. Hence T is a rigid motion, according to our definition of such
motions.
Figure 1
The identity is a translation, the inverse of a translation is a translation,
and the composite of two translations is again a translation. Hence all the
translations of the plane form a group under composition, called the
translation group H of the plane. It is an abelian group. If each translation
T is represented by a vector (from a chosen point O, the origin, to the
image TO = O' of that point), the translation group H is just the group of
these vectors under the operation called vector addition. Once O is chosen,
the possible translations of the plane are determined by the points O' of
the plane; that is, by the vectors OO' in the plane.
The intuitive idea of a rotation is direct: Take a circular disc, fix the
center O and spin the disc. This idea leads to a formal definition: A rota-
tion R of the plane is a rigid motion which leaves exactly one point fixed;
it is called a rotation "about" that point. For completeness, the identity
transformation is also counted as a rotation (about every point in the
plane). From this definition it follows that the inverse of a rotation is a
rotation about the same point O. All the rotations about a point O do
form a group, but to show this one needs
Theorem 2. If the distinct segments OA and OA' are congruent, with O ≠ A,
there is exactly one rotation R about O taking A to A'.
To construct such an R, first assume that O, A and A' are not collinear.
We need to construct the image X' of each point X ≠ O, A. Since the
desired rotation must preserve angles, the angle X'OA' must be chosen
equal to the given angle XOA; this is surely possible, since the congruence
axiom III.4 of §2 specifies that a given angle between rays can be "laid
off" on a given side of another ray. It remains to choose which side of
OA'. We know in general that all the points on one side of OA must go
into points on one side of OA'; after inspecting Figure 2 we propose the
rule (which could be formulated without "looking" at a figure) that the
side of OA containing A' rotate to the side of OA' not containing A. In
more detail:
(i) If X and A' are on the same side of OA, put the image point X', with
OX ≡ OX', on the side of OA' opposite A;
(ii) If Y and A' are on opposite sides of OA, put the image point Y', with
OY ≡ OY', on the side of OA' containing A.
Figure 2
This prescription covers all cases. From the figure, it then appears that
∠XOY ≡ ∠X'OY'; from the axioms of congruence one can prove that this
is always the case, so that the rotation X ↦ X', Y ↦ Y' does carry each
angle at O into a congruent angle and each segment from O into a
congruent segment. From this it then follows by the congruence axioms
that this rotation R carries every segment XY into a congruent segment
X'Y'. Therefore R is a rigid motion leaving only O fixed, as desired.
It remains to prove that this transformation R is the only rotation about
O taking A to A'. But the only alternative would be to make the opposite
choices in rules (i) and (ii) above; in this case one sees that the ray OB
bisecting the angle AOA' (see Figure 2) must go into itself, so that B = B'
would be an added fixed point, contrary to the definition of a rotation.
(Incidentally, the opposite choice of the rules (i) and (ii) would construct a
different rigid motion; namely, the motion reflecting the whole plane in
the angle bisector OB.)
In the excluded case when A and A' are on the same line through O,
with A ≠ A', the image of each point X may be constructed by prolong-
ing XO over O to the point X' opposite X, with OX ≡ OX'. This motion
X ↦ X' is then a rotation about O by the angle π, sometimes called a
"half-turn". In each case, the rotation R constructed is the only rotation
with A ↦ A'. From this uniqueness it follows that the composite of two
rotations about O, if not the identity, leaves only O fixed, hence is a rota-
tion. Therefore all these rotations about O form a group.
It is remarkable that the simple idea "Hold O fixed and swing A to A'"
leads to such a subtle proof, using sides of lines. This subtlety reflects the
(1)
is the translation T_1 taking O to the point RO'. This can be proved for-
mally from the axioms, or checked visually by showing that the compos-
ites RT and T_1 R have the same effect on the points of the right angle
AOB in Figure 3. Similarly (Figure 3 again), if T translates O to O' and A
to A', while R rotates OA to OA'', then the composite
(2)
is the rotation about O' taking O'A' to O'(TA''). Recall here that TRT^{-1} is
called a conjugate of R, so this equation states that translation T conju-
gates a rotation about O into a rotation (through the same angle) about O'.
Similarly, equation (1) states that the conjugate of a translation by a rota-
tion is another translation.
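Both conjugation statements can be checked numerically. The sketch below is an illustration under assumptions of our own: the plane is modeled by complex numbers with O at 0, the statement about translations is read as R T = T_1 R (equivalently R T R^{-1} = T_1), and the angle `theta` and the point `O_prime` are arbitrary sample values.

```python
import cmath

theta = 0.9                       # assumed rotation angle
r = cmath.exp(1j * theta)
O_prime = 2.0 + 1.0j              # assumed image O' = T(O) of the origin O = 0

def R(z): return r * z            # rotation about O
def R_inv(z): return z / r
def T(z): return z + O_prime      # translation taking O to O'
def T_inv(z): return z - O_prime

def conj_T_by_R(z): return R(T(R_inv(z)))    # R T R^{-1}
def conj_R_by_T(z): return T(R(T_inv(z)))    # T R T^{-1}

z = -0.4 + 0.7j
# R T R^{-1} moves every point by the same vector R(O'), so it is a translation.
print(conj_T_by_R(z) - z, R(O_prime))
# T R T^{-1} fixes O' and turns each vector from O' through the same angle theta.
print(conj_R_by_T(O_prime), (conj_R_by_T(z) - O_prime) / (z - O_prime), r)
```

In the first line the two printed numbers agree; in the second, the fixed point is O' and the turning factor equals that of R.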
These equations enable us to prove a familiar fact:
Figure 3
Figure 4
Figure 5
Theorem 4. Let l be any line in the plane. Any rigid motion of the plane is
either a proper rigid motion P or a composite P·L where L is the reflection
in the line l.
PROOF. The motion is determined by what happens to each of three non-
collinear points; that is, by what happens to a triangle. (Hence the impor-
tance of triangles in Euclidean geometry.) So consider the triangle OAB
with side OA along the given line l and its image O'A'B' under the motion
(Figure 6). Since OA ≡ O'A', there is a proper motion (rotation R fol-
lowed by translation T) which takes OA to O'A'; one can then place the
image B' on the desired side of l, by using first a reflection L in l, if
necessary. Hence the motion has a unique representation as either T·R or
T·R·L, with R a rotation about O.
These theorems serve to show that groups (transformations and their
composites) are firmly embedded (though not explicit) in classical
Euclidean geometry.
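The proof of Theorem 4 is effectively a procedure, and it can be sketched in code. The following is an illustration under our own assumptions: the plane is modeled by complex numbers, l is the real axis, and the triangle is O = 0, A = 1, B = i. The helper names `decompose` and `rebuild` are hypothetical.

```python
import cmath

def decompose(motion):
    """Return (b, a, flip) so that motion(z) = a*z + b, or a*conj(z) + b when flip is True.

    Only the images of the triangle O = 0, A = 1, B = i are used; l is the real axis.
    """
    O_img, A_img, B_img = motion(0j), motion(1 + 0j), motion(1j)
    b = O_img                     # the translation T takes O to O'
    a = A_img - O_img             # the rotation R about O turns OA into the direction of O'A'
    # If T.R alone puts B on the wrong side, a reflection L in l is needed first.
    flip = abs(B_img - (a * 1j + b)) > 1e-9
    return b, a, flip

def rebuild(b, a, flip):
    """The composite T.R (flip False) or T.R.L (flip True) as a function."""
    return lambda z: a * (z.conjugate() if flip else z) + b

improper = lambda z: cmath.exp(2j) * z.conjugate() + (3 - 1j)
proper = lambda z: cmath.exp(-0.5j) * z + (1 + 4j)

for motion in (improper, proper):
    b, a, flip = decompose(motion)
    w = 0.7 - 2.2j
    print(flip, abs(rebuild(b, a, flip)(w) - motion(w)))
```

For each sample motion the rebuilt composite agrees with the original at a point outside the triangle, as the theorem predicts.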
Figure 6
8. Orientation
Hitherto "angle" has meant the angle between two rays r and s from a
point O. For trigonometry and elsewhere we also need to use the angle
from the ray r to the ray s. But this is ambiguous (Figure 1) unless we
specify that we mean the "counterclockwise" angle from r to s. But this is
meaningless for the plane as it has been axiomatized, because the counter-
clockwise sense, viewed from in front of the plane, will be clockwise when
viewed from behind the plane. Thus we must make a deliberate choice of
one of the two possible "senses" for angles. Such a choice is called an
orientation of the plane, and the choice of an orientation is an additional
structure on the Euclidean plane.
Practically speaking, someone (the first clockmaker?) chooses a direc-
tion of rotation for his clock; then one can carry this clock (or copies, such
as wrist watches) around so as to determine the same clockwise sense for
all other clocks. Here to "carry around" means a proper rigid motion; this
idea we can now formalize (in the plane).
To fix the ideas, consider, instead of clock-hands, ordered right angles
from OA to OB, with legs say of unit length, as in Figure 2. Here
∠AOB and ∠A'O'B' have the same sense, while ∠A''O''B'' has the oppo-
site sense. Formally, two such ordered right angles have the same sense if
and only if the first can be carried into the second by a proper rigid
motion (a rotation followed by a translation). Since there is exactly one
such proper motion carrying OA into the congruent segment O'A', there
are exactly two possible senses, as given by the opposite perpendiculars
O'B' and O'B''' in the middle of Figure 2. Put differently, the ordered
right angles with unit legs fall into two distinct classes (or "orbits") under
the action of the group of all proper rigid motions. Choosing one of these
classes is the choice of a "sense" or an orientation of the plane; for
instance, the chosen sense may be called the counterclockwise one. The
choice is usually made by picking some one right angle, from OA to OB, to
be counterclockwise.
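The two classes of ordered right angles can be seen in a coordinate model. The sketch below is an illustration only: it uses cartesian coordinates and the sign of a determinant as the "sense", neither of which belongs to the axiomatic treatment, and all names are ours.

```python
import math

def sense(O, A, B):
    """+1 or -1: sign of the determinant of the legs OA, OB of an ordered angle."""
    det = (A[0] - O[0]) * (B[1] - O[1]) - (A[1] - O[1]) * (B[0] - O[0])
    return 1 if det > 0 else -1

def proper_motion(theta, tx, ty):
    """Rotation through theta about the origin followed by a translation."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda P: (c * P[0] - s * P[1] + tx, s * P[0] + c * P[1] + ty)

def reflect_x(P):
    """Reflection in the x-axis."""
    return (P[0], -P[1])

O, A, B = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)   # an ordered right angle with unit legs
M = proper_motion(2.3, -4.0, 1.5)
print(sense(O, A, B), sense(M(O), M(A), M(B)))            # same sense
print(sense(reflect_x(O), reflect_x(A), reflect_x(B)))    # opposite sense
```

A proper motion leaves the sign unchanged, while a reflection reverses it, so the sign labels the two orbits.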
Once the orientation is made by this choice of ∠AOB, one can introduce
the usual four "quadrants" I-IV for this right angle and then cartesian
coordinates, since we now have a "positive" direction OA or OB on each
axis of Figure 3. We can also consider and measure ordered angles (say,
the angle from the ray OA to the ray Ot in Figure 3) on the familiar radian
scale from 0 to 2π, so that these angles are real numbers taken modulo 2π.
We can also speak of the "left" side of the ray OA; it comprises all the
rays from O in quadrants I and II. We will write sLr for rays r and s from
O to say that s is on the left side of r.
Figure 2
Figure 3
A plane with orientation is really not the same object as one without.
The plane with an orientation has more structure-namely, the choice of
the orientation. At the same time, it has less symmetry; the automorphism
group of the oriented plane is the group of all proper rigid motions (i.e.,
no reflections), while that of the unoriented plane is the group of all rigid
motions, including the reflections. This is a first example of a striking
observation: A geometrical "thing" (the plane) can be formalized in
different ways to give different mathematical objects. In the oriented
plane, one has angles up to 2π and coordinates as well, but these are not
present in the unoriented plane described by the Euclidean axioms.
The added structure of the oriented plane has been described as a
"choice of an orientation". It can be formulated in other ways, so as to
resemble an "axiomatic" structure. For example, consider the relation sLr
for "s is on the left side of r", where r and s are rays from a common
point O. If -r denotes the ray opposite r, but along the same line through
O, this relation has the following properties:
9. Groups in Geometry
Rigid motions not only carry points into points, but they also carry seg-
ments into (congruent) segments, angles into (congruent) angles, and so
on; one says that the group E of all rigid motions "acts" on the set of seg-
ments or on the set of angles, and that the "orbit" of an angle is the set of
all congruent angles. This example motivates a general definition. A group
G is said to act on a set X when for each g in G and each x in X there is
given an element gx in X (the result of g acting on x) in such a way that
$$(g_1 g_2)x = g_1(g_2 x), \qquad 1x = x \tag{1}$$
for all g_1 and g_2 in G, for the identity 1 of G, and for all x in X. The orbit
of an element x in X under this action is the set Gx of all gx for g in G,
and the action is transitive when Gx = X for all x; that is, when to each
x and y in X there is at least one g with gx = y. In any case, the ele-
ments f in G which leave a given point x fixed form a subgroup F_x of the
group G, called the subgroup fixing x or the isotropy group of x.
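A small finite example may make these definitions concrete. The sketch below is an illustration of our own choosing, not an example from the text: the symmetries of a square act on its four vertices, each symmetry being stored as a tuple that sends the index i to the entry at position i.

```python
from itertools import product

def compose(g, h):
    """The permutation "g after h", so that compose(g, h)[x] == g[h[x]]."""
    return tuple(g[h[i]] for i in range(len(h)))

def generate(generators):
    """Close a list of permutations under composition (the finite group they generate)."""
    n = len(generators[0])
    group = {tuple(range(n))}
    frontier = set(generators)
    while frontier:
        group |= frontier
        frontier = {compose(g, h) for g, h in product(group, repeat=2)} - group
    return group

rotation = (1, 2, 3, 0)      # quarter turn of a square with vertices 0, 1, 2, 3
reflection = (0, 3, 2, 1)    # reflection in the diagonal through vertices 0 and 2
G = generate([rotation, reflection])

def orbit(x):
    return {g[x] for g in G}

def isotropy(x):
    return {g for g in G if g[x] == x}

print(len(G), orbit(0), isotropy(0))
print(len(G) == len(orbit(0)) * len(isotropy(0)))
```

Here the action is transitive, the isotropy group of vertex 0 consists of the identity and one reflection, and the final count is the familiar relation between the size of the group, the orbit, and the isotropy group.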
For example, when the group E_0 of proper rigid motions acts on the
points of the plane, the isotropy group F_x of each point x is the group of
rotations about x. Any two such groups F_x and F_y are isomorphic. All the
proper motions carrying x to y can be written as products TR for R in F_x
and T any motion (say a translation) carrying x to y. All these motions
constitute a so-called "coset"
map x to the same point on the orbit, so that the set of cosets corresponds
to the set of points in the orbit, as in the bijection
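$$\begin{array}{ccccc} F_x & hF_x & lF_x & mF_x & \cdots \\ \updownarrow & \updownarrow & \updownarrow & \updownarrow & \\ x & hx & lx & mx & \cdots \end{array} \tag{4}$$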
where E_0 is the group of proper rigid motions and H the group of transla-
tions. The plane has just two orientations, call them i and t. The
whole group E acts on the set of these orientations, in that each rigid
motion M induces a permutation a(M) of the set {i, t}; thus when
M ∈ E_0 is a proper rigid motion, the orientation is unchanged, so a(M)
is the identity permutation, while a reflection L interchanges the two
orientations. For the composite M·M' of two motions the effect is
HR = {all TR | T ∈ H}.
The cosets correspond to the rotations R, so that the cosets may be said to
form a group E_0/H isomorphic to the group of rotations about O.
These observations, and many like them, indicate the very close relation
between Euclidean geometry and group theory-so close that one might
say that groups were implicit (though never explicit) in traditional
geometry. For these reasons, it is clear that the basic ideas of group theory
belong early in the conceptual order of mathematical structures. Histori-
cally, groups did not appear until the 19th century, implicitly with Gauss
and others and explicitly with Galois. When they were fully recognized,
they were applied promptly to geometry, by Klein, Lie, and others. In
particular, hyperbolic and elliptic geometries also involve appropriate
groups of motions-as does solid geometry.
Figure 1
posite of two reflections, the points A may be identified with those
involutions which can be written as the composite of two involutions, while
the lines k are identified with those involutions (to wit, the $L_k$) which
are not such a composite. The point A lies on the line k if and only if
$L_k \cdot R_A = R_A \cdot L_k$. If the motion M takes the point A to the
point B, then the half-turn $R_B$ is clearly $M R_A M^{-1}$, so the motion M
conjugates the involution $R_A$ to the involution $R_B$. One may similarly
describe the effects of any motion on a line, represented as an involution.
This start can be continued to give a complete list of axioms on the
group E sufficient to describe E as (generated by the involutions
representing) the points and lines of the Euclidean plane. Details are
given in Bachmann (1973) and in Guggenheimer (1967). The reader may
wish to formulate for himself some geometric facts in group-theoretic
form. For instance, AB = BC states that B is the midpoint of AC!
AB (Figure 1). This gives a "spherical triangle" ABC. One may show that
S·R leaves C fixed and hence is a rotation (through what angle?) about
CO. This curious fact about rotations is one of the results of solid
geometry much used in the study of rotations in mechanics. However,
even this apparently "solid" geometrical result has its predecessor for the
plane: There is a similar description of the composite of two rotations
(about different points) in the plane.
The orientation of 3-dimensional space also involves no essentially new
ideas. To describe it, consider three perpendicular rays x, y, z from a point
O (they are, in effect, coordinate axes). Then the ordered triple (x,y,z) is
defined to have the same orientation as a similar triple (x',y',z') if and
only if there is a rotation leaving O fixed and carrying x to x', y to y', and
z to z'. This definition provides for exactly two orientations at O (just as in
the plane). For, one can always rotate x to x' (about an axis perpendicu-
lar to the plane determined by x and x '). The new position of y can then
be rotated (about the axis x') to the position y' (Figure 2). The resulting
position of z must then be perpendicular to x' and y' and hence must
coincide either with z' or with its opposite-therefore there are just two
orientations at O. Note that the triple (x,y,z) has the same orientation as its
cyclic permutation (y,z,x)-just rotate x to y (about z) and then the new y
(about the old y) to z. In other words, the orientation of (x,y,z) is
Figure 1
Figure 2
12. Is Geometry a Science?
geometries of plane or of space are indeed possible and useful, and they
can be reduced to axiomatic form. Hence geometry is a variety of intellec-
tual structures, closely related to each other and to the original experi-
ences of space and motion.
There arise from this study other structures which are less geometric-
distance and angle as geometrical magnitudes, algebraic manipulations of
these magnitudes, and thus real numbers, developed geometrically. There
also arise structures which are not geometric at all-groups are implicitly
present in the transformations of geometry. Logic is (historically) first
fully exemplified in the deductive structure of Euclidean geometry. Con-
tinuity and topology are hidden there as well.
The development of geometry also turns up a number of general ideas.
The very formulation of axiomatic geometry requires ideas of line, plane,
angle, and triangle. Later developments turn up more subtle ideas, such as
that of orientation or of composition of motions. Geometry is indeed an
elaborate web of perceptions, deductions, figures, and ideas.
CHAPTER IV
Real Numbers
1. Measures of Magnitude
This chapter will explore the origins and development of the system of
real numbers. These numbers form the most central structure in all of
Mathematics. Like other structures it arises from more primitive human
activities-in this case, from the measurement and comparison of magni-
tudes of various kinds.
Comparison of magnitudes, in its simplest qualitative form, may just
assert that "A is bigger than B" without specifying by how much A is
bigger. The idea of such qualitative comparisons leads to the formal
notion of a linear order, as already discussed in §I.4. Now we are con-
cerned with the associated quantitative question "A is how much bigger
than B?". Such questions arise in a number of different regards: "A is how
much further away" or "how much heavier", or "how much longer", or
"how much later". Such comparisons are not limited to just two objects
A and B, but may well intercompare the sizes of three, four, or many
objects. This makes it effective not just to compare two objects, but to
locate all the relevant objects on some one scale of sizes. Once a unit of
size is chosen, the scale becomes a scale of numbers. It is a familiar but
nonetheless remarkable fact that one single scale of numbers will be
applicable to each of many types of quantitative comparisons: To dis-
tance, to weight, to length, to width, to temperature, to time, to height,
and so on. Once a unit is chosen, each of these magnitudes exemplifies
one and the same scale: That of real numbers, considered as a scale laid
out as the points of a line with chosen origin and unit point; and so
emphasizing the interpretation of the scale by distances:
[Number line with marked points -2, -1, 0, 1, 2, 3]
The ubiquity of this scale may account in part for the prominence of the
real numbers in Mathematics. This ubiquity can also be read as a state-
ment about the nature of the world: All sorts of physical magnitudes can
be reduced to one scale-a situation which has sometimes misled social
scientists to use fake magnitudes.
The importance of this scale depends also on the fact that the scale is
complete. It is not limited just to the whole number points and not even
just to the rational points. All the points at irrational distances from 0 are
to be included, as, for example, $\sqrt{2}$. By the Pythagorean theorem, this
number, considered as a length, is the diagonal of a unit square; it cannot
be expressed as a rational distance as $\sqrt{2} = m/n$. (Proof: reduce to
lowest terms, then one of m or n is odd, but $(\sqrt{2})^2 = 2 = m^2/n^2$ gives
$2n^2 = m^2$, so m and then also n must be even.) There are many other
such irrationals which are algebraic (roots of a polynomial equation with
integer coefficients, as in $x^2 - 2 = 0$). In addition, there are numbers on
the scale which are transcendental (i.e., not algebraic). The first example is
$\pi$, the ratio of circumference to diameter in any circle, but the proof that
$\pi$ is transcendental is not easy. Another transcendental number is the base
e of natural logarithms.
There are many more transcendentals. Completeness (to be formulated
below) is the assurance that they are all there. They are not all in use at
any one time, but their presence for potential use is what makes the scale
effective.
Any real number t can be written modulo $2\pi$ as $t = t_0 + 2\pi k$ for some
integer k, and so determines the angle $\theta = \theta_t$ with measure $\theta_t = t_0$. The
function $t \mapsto \theta_t$ then "wraps" the whole real line around the circle $S^1$,
sending t to the point P; i.e., to the angle $\theta_t = \angle ACP$. Moreover this
wrapping function is periodic:
(2)
Figure 1
Figure 2
3. Manipulations of Magnitudes
When two objects are put together, side by side or end to end, the magni-
tude of this combination is just the sum of the two separate magnitudes.
This operation, called addition, arises for all sorts of magnitudes-for dis-
tance, weight, time, height, area, and the like. Geometrically, the addition
of two segments consists in laying off one segment after the other along
the line. This geometric operation corresponds exactly to the arithmetic
operation of adding numbers.
A second operation on magnitudes, that of multiplication, is suggested
both by the multiplication of numbers and by geometric formulas; thus
the area of a rectangle is obtained by multiplying its base by its height. A
complete geometric description of the multiplication of segments requires
more than one line in the plane. Thus to multiply two positive linear mag-
nitudes x and y one may represent them on two intersecting linear scales:
Segment OA and then AB on the first line with measures 1 and x, respectively,
and then OA' with measure y on a second ray from O. Then
drawing (Figure 1) BB' parallel to AA' constructs similar triangles OAA'
and OBB', while the proportionality theorem for similar triangles makes
$OA'/OA = A'B'/AB$ and hence shows that A'B' represents the magnitude
xy. This may be regarded as the geometric definition of multiplication
of magnitudes.
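The construction just described can be checked numerically. This is a sketch, assuming coordinates with O at the origin, the first line along the x-axis, and the second ray at an arbitrarily chosen angle; the function name `multiply_segments` and the parameter `angle` are illustrative only.

```python
import math

def multiply_segments(x, y, angle=math.pi / 3):
    """Length of A'B' in the similar-triangle construction.

    OA = 1 and AB = x lie on the first line; OA' = y lies on a second
    ray from O at the assumed angle (which must not be a multiple of pi).
    """
    A = (1.0, 0.0)
    B = (1.0 + x, 0.0)
    A1 = (y * math.cos(angle), y * math.sin(angle))   # the point A'
    d = (A1[0] - A[0], A1[1] - A[1])                  # direction of AA'
    # B' lies on the ray OA' and on the parallel to AA' through B:
    # solve B + u*d = s*A' for (u, s) by Cramer's rule.
    det = d[0] * (-A1[1]) - (-A1[0]) * d[1]
    s = (d[0] * (-B[1]) - d[1] * (-B[0])) / det
    B1 = (s * A1[0], s * A1[1])                       # the point B'
    return math.dist(A1, B1)
```

For instance, `multiply_segments(2, 3)` returns the product 6 up to rounding, whatever admissible angle is chosen for the second ray.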
Other types of magnitudes such as weight or time may more easily be
multiplied first by whole numbers-thus to multiply the weight of a given
item by three, take the combined weight of three such items. By division,
this extends first to multiplication by rational numbers and then by con-
tinuity to the multiplication of a weight by any number, rational or irra-
tional. It is again remarkable that one gets the same operation of multipli-
cation for all these types of magnitudes.
To summarize: The "practical" operations of addition and multiplica-
tion on various types of magnitudes lead to the algebraic operations of
sum and product for the real numbers on the linear scale. The various
rules for these manipulations of numbers were well known before they
Figure 1
4. Comparison of Magnitudes
Many practical observations about magnitudes amount to the determina-
tion of which of two magnitudes is the greater. In geometry, the notion of
betweenness provides for such a comparison of the magnitudes of two
segments. On a line, the directed segment AB is less than the directed segment
AC if and only if B lies between A and C. If the real numbers b and
(Note that here too there is just one axiom connecting order and addition,
and just one axiom connecting order and multiplication.) Observe also
that the rational numbers (as well as the reals) form an ordered field in
this sense.
These axioms provide for all the usual manipulations of inequalities. In
particular, a real number e is positive if and only if e > 0, while the absolute
value $|b|$ of a number b is b or -b according as $b \ge 0$ or $b < 0$.
Then one proves rules such as
$b < m/n < c$. (3)
The idea of the proof is that c - b > 0, so must exceed the reciprocal
1/n of some integer n. This can be formalized as follows. Since
c - b > 0, the Archimedean law provides a natural number n with
n(c-b) > 1; that is, with c > b + 1/n. By the same law and mathematical
induction, there is also a smallest natural number m with m·1 > nb.
This means that b < m/n. Since m is the smallest such natural number,
we must also have $m - 1 \le nb$; that is, $m/n \le b + 1/n$ and hence
m/n < c. Thus m/n is between b and c, just as required.
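The proof is constructive, and its two steps can be followed literally. The sketch below assumes that b and c are given as exact rationals (or values that `Fraction` accepts); the name `rational_between` is illustrative, and m is allowed to be any integer so that negative b are covered as well.

```python
from fractions import Fraction
import math

def rational_between(b, c):
    """Return m/n with b < m/n < c, following the Archimedean argument."""
    b, c = Fraction(b), Fraction(c)
    if not b < c:
        raise ValueError("need b < c")
    n = 1
    while n * (c - b) <= 1:        # Archimedean law: some n has n(c - b) > 1
        n += 1
    m = math.floor(n * b) + 1      # smallest integer m with m > n*b
    return Fraction(m, n)
```

With b = 1/3 and c = 1/2 the search stops at n = 7 and gives m = 3, so the rational found is 3/7.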
A similar argument will show that the interval from b to c contains a
rational with denominator a power of 2. This suggests again that every
such number, such as b, can be approximated by rationals with denomi-
nators powers of 2.
Now that the notions of "order" and "absolute value" are formalized,
one may also give a formal definition of the idea that a sequence converges
to a limit. To say that a sequence $\{a_n\} = a_1, a_2, \ldots$ of real
numbers $a_n$ converges to a real number b as limit should mean that the
successive terms $a_n$ get close to b, ultimately closer than any preassigned
measure. Here "ultimately" is to mean "after some index n", while the
preassigned measure is to be a positive (but "small") real number, usually
written $\varepsilon > 0$. The idea "ultimately closer than this $\varepsilon$" is better expressed
in the opposite order, that given such an $\varepsilon$ one can find an index n beyond
which $a_n$ is indeed that close to b. In formal language, this becomes the
find an index n) have been replaced by the use of logical quantifiers (For
all $\varepsilon > 0$ there exists a k ...). The particular choice of k is not relevant,
for once k works for an $\varepsilon$, so will any larger natural number k'. The informal
idea that $a_n$ gets close to b seems to have disappeared, but it is
covered in the phrase "for all positive $\varepsilon$", and hence in particular for a
small $\varepsilon$. Indeed, once the statement is true for one $\varepsilon$, it automatically holds
for any larger $\varepsilon_2$, and with the same k. For instance, by the Archimedean
law there is to each positive $\varepsilon_2$ a natural number m > 0 with $1/m < \varepsilon_2$.
Hence for convergence it is enough to require that to each natural
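The quantifier pattern "for all positive measures there exists a k" can be illustrated on one explicit sequence. This is a sketch only: the sequence n/(n+1), its limit 1, and the names `a` and `index_for` are assumptions chosen for the illustration, and the definition is read in the usual way, that the terms beyond the index k are all within the given measure of the limit.

```python
from fractions import Fraction
import math

def a(n):
    """An assumed example sequence, a_n = n/(n+1), with limit 1."""
    return Fraction(n, n + 1)

def index_for(eps):
    """Return a k such that |a_n - 1| < eps for all n > k.

    Here |a_n - 1| = 1/(n+1), so any k >= 1/eps will do.
    """
    return math.ceil(1 / Fraction(eps))
```

Testing only the measures 1/m for natural numbers m, as suggested above, already exercises the whole definition for this sequence.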
$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{1\cdot 2}\,x^2 + \cdots + x^m,$$
but when m is a fraction this becomes an infinite series in powers of x
$$(1 + x)^m = 1 + mx + \frac{m(m-1)}{1\cdot 2}\,x^2 + \cdots + \frac{m(m-1)\cdots(m-n+1)}{1\cdot 2\cdots n}\,x^n + \cdots$$
The formula suggests an infinite number of additions. This infinite operation
is not literally possible; instead one approximates by finite sums.
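The approximation by finite sums can be made concrete. The following sketch assumes floating-point arithmetic and a value of x small enough in absolute value for the partial sums to settle; the function name `binomial_series` is illustrative.

```python
def binomial_series(m, x, terms):
    """Sum of the first `terms` terms of the binomial series for (1 + x)**m."""
    total = 0.0
    coeff = 1.0                      # coefficient of x**0
    for n in range(terms):
        total += coeff * x ** n
        coeff *= (m - n) / (n + 1)   # next coefficient m(m-1)...(m-n)/(n+1)!
    return total
```

For a natural number m the coefficients vanish after the term in x^m, so the sum is finite; for m = 1/2 the partial sums approximate a square root, as in `binomial_series(0.5, 0.2, 40)` compared with the square root of 1.2.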
The study of convergent series is essentially equivalent to that of con-
vergent sequences. An infinite series is a formal infinite sum
$$c_1 + c_2 + c_3 + \cdots$$
of real numbers $c_i$, and is said to converge to the limit b if and only if the
sequence $s_n = c_1 + c_2 + \cdots + c_n$ of "partial sums" of the series converges
to b. One could, vice versa, define convergent sequences in terms of
convergent series, and the understanding of either notion is improved at
the hand of examples of series which do not converge-as for instance
with the harmonic series 1 + 1/2 + 1/3 + 1/4 + .... Infinite series also
occur essentially in complex analysis (§X.7), to expand known functions
and to define new ones.
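That the harmonic series does not converge can be made plausible by computing partial sums exactly. The sketch below uses the familiar grouping estimate, that the partial sum with 2^k terms is at least 1 + k/2; the function name `harmonic` is illustrative.

```python
from fractions import Fraction

def harmonic(n):
    """Exact partial sum 1 + 1/2 + ... + 1/n of the harmonic series."""
    return sum(Fraction(1, k) for k in range(1, n + 1))
```

Since 1 + k/2 exceeds any preassigned bound for large k, the partial sums cannot approach a limit.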
with real number coefficients and with only a finite number of negative
powers of t, but with no convergence required. Add two such "formal"
series by adding the corresponding coefficients, and multiply two such
series in a purely formal way; for example
By suitable calculations, one may verify that these elements form a field,
called the field of formal power series $R((t))$. One may order this field by
specifying that s > 0 if and only if its first non-vanishing coefficient $a_{-n}$
is positive. Then also s > 0 if and only if $s^{-1}$ is positive. This means, in
particular, that t > 0 and also that $t^{-1} > 0$; however, no integral multiple
of t will be as large as 1, so that t is a sort of infinitesimal, while $t^{-1}$ is
"infinitely large" compared to 1. Moreover, the infinite series s of elements
in this field is convergent in the sense of our definition of
convergence, using among the $\varepsilon > 0$ all the powers $\varepsilon = t^n$ for n > 0.
this property is called a complete ordered field. The real numbers form
such a field. The formal power series do not.
This completeness axiom implies the Archimedean law. For suppose to
the contrary that there were positive reals a > 0 and b > 0 such that no
multiple na exceeded b. Then the set S of all multiples na, for n a natural
number, has an upper bound, namely b. By the completeness axiom, S
then has some least upper bound, call it b*; thus $b^* \ge na$ for all n. This
also means that $b^* \ge (n+1)a$ for all n, and so that $b^* - a \ge na$ for all n.
Thus b* - a is less than b* and is also an upper bound for S, a contradiction
to the choice of b* as a least upper bound for S.
The force of the completeness axiom is to insure that all the real
numbers that ought to be there are there. For example, the irrational $\sqrt{2}$
must be there, as the least upper bound of the set 1, 1.4, 1.41, 1.414, ... of
rationals (those approximating $\sqrt{2}$). Similarly $\pi$ must be there, as the least
upper bound of the set of decimals 3, 3.1, 3.14, 3.141, 3.1415, 3.14159, ....
Indeed, since there is a rational between any two reals, any real can be
expressed as the least upper bound of a set of rationals. The completeness
axiom is also used in more sophisticated ways, for example in the proof of
Rolle's theorem and of the mean value theorem of the calculus (Chapter
V).
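The set of decimal truncations approximating the square root of 2 can be generated exactly, which shows concretely a bounded increasing set of rationals with no rational least upper bound. This is a sketch; the function name `truncation` is illustrative, and `math.isqrt` (integer square root, Python 3.8 or later) does the work.

```python
from fractions import Fraction
from math import isqrt

def truncation(k):
    """Largest decimal with k digits after the point whose square is at most 2."""
    return Fraction(isqrt(2 * 10 ** (2 * k)), 10 ** k)
```

Each truncation has square less than 2, while adding one unit in the last decimal place always gives a square greater than 2, so 2 bounds the squares of the set from above as closely as one wishes.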
There are other equivalent forms of the completeness axiom. Instead of
requiring that every bounded set of reals have a least upper bound, one
may require any one of the following:
Dedekind Cut Axiom. If the set R of reals is the union of two disjoint
non-empty subsets L and U such that $x \in L$ and $y \in U$ imply x < y, then
there is a real number r such that $x \le r$ for all $x \in L$ and $r \le y$ for all
$y \in U$.
Suppose that R' is any complete ordered field, with unit element 1' for
multiplication. Then 0 < 1', for otherwise 1' < 0, which would give
This set L has the special property that x < y and y E L imply x E L, so it
is the lower half of a "cut" in the rationals. Since it is bounded, it has an
(ordinary) real number r as least upper bound, and r in turn determines
L. The one-to-one correspondence $r \mapsto r'$ then maps the ordinary real
numbers on the given complete ordered field R'. One can show that it
preserves sums, products and order, hence it is the desired order isomorphism
$R \cong R'$.
Note that this argument makes essential use of sets, such as the sets L
of rationals. In this it is like the use of sets in the induction axioms to
prove that the Peano postulates uniquely determine the natural numbers.
We now have two different axiomatic descriptions of the reals: Here,
the real field, described as a complete ordered field, in §I.4 the real continuum,
described as an unbounded ordered set with a denumerable dense
subset. Each description determines the set of reals uniquely, up to an iso-
morphism of the structure concerned. However the structures are drasti-
cally different. The real continuum has only the order structure, and there
are many automorphisms of this structure. The real field has both order
and algebraic structure, and its only automorphism (by a proof like that
just above) is the identity automorphism. These differing structures on the
same "thing" (here the reals) are much like the differing structures on the
plane (without and with orientations, as in §III.8). These differing struc-
tures furthermore reflect practical differences. Thus the ordered contin-
uum handles comparisons of many items where one knows only which of
two items is the larger, with no measure of "how much" larger.
In either structure, the real numbers form a "continuous" scale. Physi-
cists (and others) sometimes suggest that an "atomic" or "discrete" scale
would be more "real"; finitists propose finite scales of magnitude. These
proposals turn out to be hard to execute-the continuous scale of reals
works smoothly!
6. Arithmetic Construction of the Reals
The axiomatic approach assumes that the real numbers are already there,
as geometrical or other magnitudes, and describes them-uniquely up to
an isomorphism-by axioms. There is a wholly different and deliberately
arithmetic approach, in which one starts from the natural numbers and
constructs the reals as sets of natural numbers. Since we already have con-
structed the rationals, it will be sufficient (and more appropriate) to con-
struct the real numbers directly from the rationals and sets of rationals in
such a way that the resulting reals do satisfy the axioms.
This construction can be done in several ways, in parallel to the
different forms of the completeness axiom. The construction by Dedekind
cuts in the rationals is perhaps the most direct. Define a Dedekind cut in
the field Q of rational numbers to be a pair of disjoint non-empty subsets
(L, U) with union Q such that $x \in L$ and $y \in U$ imply x < y; to get
uniqueness, require that the "lower set" L have no maximal element. Any
such cut is determined by the set L alone, because it is a bounded non-empty
set L of rationals, with no maximal element, which does not contain
all rationals but has $x' \in L$ whenever $x' < x \in L$. Call such a set L a
real number. To add two such real numbers L and L', take the sum L" to
be the set of all sums x + x' of rationals $x \in L$ and $x' \in L'$. Multiplication
is more delicate, because the product of two negative rational
numbers is positive. If both the real numbers L and L' contain positive
rationals, take their product LL' to be the set of all products xx' of
rationals $x \in L$ and $x' \in L'$, with at least one of x, x' positive. The remaining
cases of multiplication can then be handled by replacing the set L by
its negative, defined as the set of all rational y with x + y < 0 for every
$x \in L$. Finally, define the order by specifying that $L \le L'$ if and only if
$L \subset L'$. A systematic proof (cf. Landau [1951]) then shows that the real
numbers L so constructed do form a complete ordered field R. Moreover,
the rationals Q can be embedded in R by the monomorphism (of ordered
fields) which sends each rational x into the set L of all rationals y with
y < x. (This may recall the uniqueness proof of §5.)
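A lower set L is an infinite set of rationals, so a program can hold it only through a membership test. The sketch below makes that assumption explicit: the class `Cut`, the functions `from_rational` and `approximate`, and the example `sqrt2` are all illustrative names, and only the embedding of the rationals and the location of a cut by bisection are shown, not the full arithmetic.

```python
from fractions import Fraction

class Cut:
    """A real number given by the membership test of its lower set L."""
    def __init__(self, contains):
        self.contains = contains

def from_rational(x):
    """Embed the rational x as the set of all rationals y with y < x."""
    x = Fraction(x)
    return Cut(lambda y: y < x)

# The cut for the square root of 2: negative rationals and those with y*y < 2.
sqrt2 = Cut(lambda y: y < 0 or y * y < 2)

def approximate(cut, lo, hi, steps=40):
    """Bisect between a rational lo in L and a rational hi not in L."""
    lo, hi = Fraction(lo), Fraction(hi)
    for _ in range(steps):
        mid = (lo + hi) / 2
        if cut.contains(mid):
            lo = mid
        else:
            hi = mid
    return lo, hi
```

Note that `from_rational(x)` does not contain x itself, in keeping with the requirement that the lower set have no maximal element.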
There is an alternative construction of the reals as Cauchy sequences of
rationals. Since different such Cauchy sequences may have "the same"
limit, one must for this construction consider two Cauchy sequences $\{a_n\}$
and $\{b_n\}$ as equivalent when the sequence $a_n - b_n$ converges to 0. Under
suitable operations of addition and multiplication, and with a suitable
linear order, these equivalence classes of Cauchy sequences form a com-
plete ordered field. By the uniqueness of such fields, this field is iso-
morphic to the field of reals constructed by Dedekind cuts. However, in
other foundations of Mathematics in terms of elementary topoi the
Dedekind and Cauchy sequence constructions may yield differing results
(see Johnstone [1977]).
(This latter is essentially the formal product of the two intended infinite
series.)
This construction really amounts to the use of the sequence
$s_n = c_1 + \cdots + c_n$ which is both monotone and bounded:
7. Vector Geometry
The real numbers provide a one-dimensional geometry-a line with a
chosen origin and unit. The higher dimensional analog is vector geometry.
Thus in the Euclidean plane, choose an origin O and a segment OU
representing the unit of distance. Then each point P in the plane is
represented by the directed segment OP, called a vector v. Vectors OP and
OQ can be added by the usual parallelogram law: Form the parallelogram
with edges OP and OQ, and take the diagonal vector OR as the sum (Fig-
ure 1). Under this addition, the vectors form an abelian group; it is just
the group of all translations of the plane (§III.7). Using the chosen unit
segment OU, each real number r can be laid off along the line OU. The
geometric construction of multiplication by r (Figure 3.1) then defines an
operation of multiplying the vector v = OR by the real number (scalar) r
to give the vector rv along the line of OR. If r > 0 it is in the direction of
v; if r < 0, it is in the opposite direction, in both cases with $|r|$ times
Figure 1
for all vectors v and w and all scalars $r, s \in R$. In view of these properties
of addition and of scalar multiplication, we say that the vectors form a
vector space over the real numbers.
Many geometric facts can be established readily by the use of vector
algebra. For example, this algebra gives an easy proof of the theorem that
the three medians of a triangle meet in a point.
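The theorem on medians can be checked by the vector algebra itself. This is a sketch with exact rational coordinates; the helper names `add`, `scale` and `point_on_median` are illustrative, and the triangle used in the usage note is an arbitrary choice.

```python
from fractions import Fraction

def add(v, w):
    """Sum of two plane vectors."""
    return (v[0] + w[0], v[1] + w[1])

def scale(r, v):
    """Multiple of the vector v by the scalar r."""
    return (r * v[0], r * v[1])

def point_on_median(vertex, p, q, t):
    """Point (1 - t)*vertex + t*midpoint(p, q) on the median from vertex."""
    midpoint = scale(Fraction(1, 2), add(p, q))
    return add(scale(1 - t, vertex), scale(t, midpoint))
```

Taking t = 2/3 on each median of the triangle with vertices a, b, c gives the same vector (a + b + c)/3 in all three cases; for a = (0,0), b = (6,0), c = (0,9) the common point is (2,3).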
Vectors may also be introduced in 3-dimensional spaces (and in higher
dimensional spaces); moreover, exactly the same formulas (1) and (2)
apply, whatever the dimension. More generally, a vector space V over a
field F is an abelian group (under addition) with another operation
$F \times V \to V$, written $(r, v) \mapsto rv$ for elements $r \in F$, $v \in V$, which satisfies
with addition the four laws (1) and (2) above. One can also describe formally
the "dimension" of such a space. In the plane, if $u_1$ and $u_2$ are two
non-collinear vectors, every vector v in the plane can be written as a
"linear combination" $v = x_1 u_1 + x_2 u_2$ with unique choices for the
scalars $x_1$ and $x_2$. In three dimensions, we need three such vectors $u_1$, $u_2$
and $u_3$, along three "axes". More generally, vectors $u_1, \ldots, u_n$ in V are
said to form a basis for the vector space V if every vector v in V has a
unique expression
$$v = x_1 u_1 + \cdots + x_n u_n;$$
then the $x_i$ are the coordinates of v relative to the basis $u_i$. A first theorem
of vector geometry (= linear algebra) states that any two (finite) bases for
a given vector space must have the same number of elements. This
number n is the dimension of the space.
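In the plane the coordinates relative to a basis can be computed by Cramer's rule. The sketch below assumes vectors given as pairs of integers or rationals; the function name `coordinates` is illustrative.

```python
from fractions import Fraction

def coordinates(v, u1, u2):
    """Return (x1, x2) with v = x1*u1 + x2*u2, for non-collinear u1, u2."""
    det = u1[0] * u2[1] - u1[1] * u2[0]
    if det == 0:
        raise ValueError("u1 and u2 are collinear, so they form no basis")
    x1 = Fraction(v[0] * u2[1] - v[1] * u2[0]) / det
    x2 = Fraction(u1[0] * v[1] - u1[1] * v[0]) / det
    return x1, x2
```

For the basis (1,1), (1,-1) the vector (3,1) has coordinates 2 and 1, since 2(1,1) + 1(1,-1) = (3,1); for collinear vectors the function refuses, because the expression is then not unique.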
Vectors also provide a convenient description for certain functions
transforming the plane into itself (and leaving the origin fixed):
Transformations such as expansions from the origin ($v \mapsto rv$) for a fixed
r, compressions on a line through the origin, and shears. All of these
transformations preserve the vector operations. More generally, a linear
transformation on V is a function $t: V \to V$ which preserves both addition
of vectors and multiplication of vectors by scalars. This means that the
equations
$$t(v + w) = t(v) + t(w), \qquad t(rv) = r\,t(v)$$
hold for all vectors v and w and all scalars r. For example, a rotation
about the chosen origin 0 is a linear transformation (it maps the parallelo-
(4)
8. Analytic Geometry
Another and earlier reduction of plane geometry to algebra is provided by
the familiar method of cartesian coordinates. Given an orientation, a
choice of origin and unit, and two perpendicular coordinate axes, each
point P is represented by its coordinates, a pair (x,y) of real numbers.
Each line in the plane may be described as the set of those points whose
coordinates satisfy a linear equation, while the distance between two
points is given by the familiar formula in coordinates, derived from the
Pythagorean theorem. In this way, all sorts of geometric facts about the
plane are handled by algebraic machinery; in effect, the plane is reduced
to the cartesian product $R \times R = R^2$ of two copies of the real line R.
However, this reduction does depend on the choice of origin and of axes,
and one must betimes verify that truly geometric facts are independent of
the choice of coordinates.
Geometry and coordinates arise first in dimensions 2 and 3. The need
for higher dimensional geometry is motivated by phenomena which need
9. Trigonometry
Trigonometry is essentially a procedure for turning angular measures into
linear measures. This appears directly in the definition of the two basic
trigonometric functions $\sin\theta$ and $\cos\theta$ of the angle $\theta$. In the oriented
plane take a point P on the unit circle with center at the origin O so that
the segment OP makes the given angle $\theta$ with the x coordinate axis. Then
$\cos\theta$ and $\sin\theta$ are defined to be the x and y coordinates of P; since the
circle has radius 1, this immediately gives the identity
$$\sin^2\theta + \cos^2\theta = 1. \qquad (1)$$
This defines $\sin\theta$ and $\cos\theta$ for angles $\theta$ of all sizes (not just for an angle $\theta$
in the first quadrant, as displayed in Figure 1).
When angles are measured (in radians) by numbers t we can also think
of the sine and the cosine as functions of the number t, so that
Thus there are really two legally different functions: The Sine of a
number, here with capital S, and the sine of an angle, with lower case s.
This pedantic (but real!) difference is usually ignored. It implicitly
Figure 1
involves the wrapping function $\theta_t$ of §2 above. Recall that this function
sends each real t to the point $P = (\cos\theta_t, \sin\theta_t)$ on the circle such that
the length of the counterclockwise circular arc from (1,0) to P is congruent
to t, modulo $2\pi$. Then the definition (2) reads
In other words, Sin and Cos are periodic functions, of period $2\pi$, as in the
familiar graph of Figure 2. Now that this has been stated, we drop the S
in Sin and the function $\theta_t$; they would just get in the way of trigonometric
manipulations, but we emphasize that the wrapping function accounts for
radian measure and the periodicity (4) of the trigonometric functions.
Indeed, the whole study of periodic functions, as carried on in Fourier
analysis, concerns the expression of more or less arbitrary functions f(t)
of period $2\pi$ in terms of the periodic functions sin nt and cos nt for the
natural numbers n.
A rotation of the circle of Figure 1 about the origin through the angle $\theta$
will carry the point (1,0) to our point P and the point (0,1) to a
corresponding point P'; thus this rotation has the effect
$$(1,0) \mapsto (\cos\theta, \sin\theta), \qquad (0,1) \mapsto (-\sin\theta, \cos\theta) \qquad (5)$$
Since (1,0) and (0,1) form a basis for the 2-dimensional vector space, any
point Q with coordinates (x,y) can be written as the linear combination
(x,y) = x(1,0) + y(0,1) of the two basis vectors. Now rotation is, as
already noted, a linear transformation, so preserves linear combinations
and thus by (5) has the effect
$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta \qquad (6)$$
for a rotation $(x,y) \mapsto (x',y')$ of the plane about the origin. Also, if one
takes Q = (x,y) to be that point Q on the unit circle of Figure 1 for
which OQ makes the angle $\varphi$ with the positive x-axis, then $x = \cos\varphi$,
$y = \sin\varphi$. Now the rotation by $\theta$ clearly carries Q to a point Q' where
OQ' makes the angle $\varphi + \theta$ with the positive x-axis. The coordinates
(x',y') of Q' are then $\cos(\varphi + \theta)$, $\sin(\varphi + \theta)$ and the equations (6) for
rotation become
$$\cos(\varphi + \theta) = \cos\varphi\cos\theta - \sin\varphi\sin\theta, \qquad \sin(\varphi + \theta) = \cos\varphi\sin\theta + \sin\varphi\cos\theta.$$
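The passage from the images of the two basis vectors to the rotation of an arbitrary point, and from there to formulas for the sine and cosine of a sum of angles, can be checked numerically. This is a sketch in floating point; the function name `rotate` is illustrative, and the linear extension used is the one forced by the images of (1,0) and (0,1).

```python
import math

def rotate(point, theta):
    """Rotate the point (x, y) about the origin through the angle theta.

    By linearity the image is x*(cos, sin) + y*(-sin, cos).
    """
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - y * s, x * s + y * c)
```

Applied to the point (cos phi, sin phi) of the unit circle, `rotate` returns the point at angle phi + theta, which is exactly the content of the addition formulas.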
Figure 3
at B, the definition of the sine function shows that the length AM can be
expressed either as $b\sin\theta$ or as $c\sin\varphi$; this gives the law of sines:
$b/c = \sin\varphi/\sin\theta$. On the other hand, one may compute the length c of
the side AB from the Pythagorean theorem for the right triangle AMB to
be
(8)
(9)
valid for all vectors u, v, $v_1$, and $v_2$ and all scalars $t_1$ and $t_2$. All the properties
of the inner product follow from these equations. Indeed, given
these equations, for a product $u \cdot v$ of vectors in a two dimensional vector
space over R, one can prove that there is a choice of basis and
corresponding coordinates for which the given inner product is expressed
by the formula (9). Such a basis is called a normal orthogonal basis (for
the given inner product).
We will extend these ideas to higher dimensions in §VII.5.
the reals R to include such limits. Finally, there are plausible polynomial
equations such as $x^2 + 1 = 0$ and $x^2 + x + 1 = 0$ with real
coefficients which do not have real solutions; hence construct the complex
numbers C to provide solutions for such equations.
Were there a solution i for $x^2 + 1 = 0$, there would also be combinations
a + bi for any real numbers a and b. They could be manipulated
by using the algebraic rules for the reals and the fact that $i^2 = -1$. This
would yield in particular the following rules for addition and multiplication:
it follows readily that these pairs do form a field, call it C. The given real
numbers are embedded monomorphically in C by the map $a \mapsto (a,0)$; in
particular 1 becomes (1,0). Moreover, by the multiplication rule (3) the
pair i = (0,1) has $i^2 = (-1,0)$. Then any pair (a,b) can be rewritten as a
linear combination
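The construction by pairs can be tried out directly. The sketch below assumes the usual rules obtained by manipulating a + bi with $i^2 = -1$, namely componentwise addition and the product (a,b)(c,d) = (ac - bd, ad + bc); the names `add`, `mul` and `I` are illustrative.

```python
def add(p, q):
    """Sum of the pairs p = (a, b) and q = (c, d)."""
    return (p[0] + q[0], p[1] + q[1])

def mul(p, q):
    """Product of pairs, from (a + bi)(c + di) with i*i = -1."""
    a, b = p
    c, d = q
    return (a * c - b * d, a * d + b * c)

I = (0, 1)   # the pair playing the role of i
```

With these rules `mul(I, I)` is (-1, 0), real numbers (a,0) multiply as usual, and every pair (a,b) equals `add((a, 0), mul((b, 0), I))`.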
exactly to the points (x,y) in the cartesian plane (Figure 1) with the real
numbers x + i0 placed along the x axis, as usual, while the purely imaginary
numbers iy lie along the y axis. This provides a geometric construction
of the complex numbers with each complex number z a vector
OZ from the origin, while the addition of complex numbers is just addition
of the corresponding vectors. The complex number z may also be
described with polar coordinates: The counterclockwise angle $\theta$ from the
positive x-axis to the vector OZ and the length $r \ge 0$ of this vector; one
calls $\theta$ the argument of z and r its absolute value, and writes
In other words
Figure 1
remarkable fact that this one adjunction suffices to provide solutions for
all such polynomial equations. Indeed, the fundamental theorem of algebra
first proved by C.F. Gauss asserts that any equation
$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 = 0$ with complex coefficients $a_i$
and with $a_n \ne 0$ has at least one complex number as root. The proof is
not easy. At least one form of the proof (presented in Birkhoff-Mac Lane,
Chapter V) uses heavily the geometric realization of the complex numbers
in the complex plane. Another proof will be indicated in §X.9.
This construction of the complex numbers does not depend on any special
virtues of the equation x^2 + 1 = 0; other polynomial equations can
serve as well. For example, the equation x^2 + x + 1 = 0 also has no
real root, because the minimum of x^2 + x + 1 for x real lies at
x = -1/2 and is 3/4 > 0. One might then introduce a new symbol ω for
a non-real root of this equation and manipulate the combinations a + bω
for a and b real, using the rule ω^2 = -ω - 1, derived from this equation,
in multiplying two such symbols. The resulting symbols again form a
field; however, this field is isomorphic to C under the map sending ω to
-1/2 + (√3/2)i. Indeed ω^3 = -ω^2 - ω = 1, so this symbol ω is in
fact a (complex) cube root of 1. By the rule for products of complex
numbers, the argument of any cube root of 1 must be 0 or ±2π/3; we
have taken ω = cos(2π/3) + i sin(2π/3).
There could be a different choice as
ω' = cos(4π/3) + i sin(4π/3) = cos(2π/3) - i sin(2π/3).
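These identities are easy to confirm numerically. The following is a minimal sketch in floating point arithmetic (so equalities hold only up to rounding); the names omega, omega_prime and p are chosen here for illustration.

```python
import cmath
import math

# The root chosen in the text, and the other possible choice (its conjugate).
omega = complex(-0.5, math.sqrt(3) / 2)
omega_prime = omega.conjugate()

def p(x):
    """The polynomial x^2 + x + 1."""
    return x * x + x + 1

# Both values are roots of x^2 + x + 1 and cube roots of 1, up to rounding.
print(abs(p(omega)), abs(p(omega_prime)))
print(abs(omega ** 3 - 1), abs(omega_prime ** 3 - 1))

# omega has argument 2*pi/3, and the other choice is omega squared.
print(cmath.phase(omega), 2 * math.pi / 3)
print(abs(omega_prime - omega ** 2))
```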
sphere onto the plane. This means that one prolongs the line NP from the
north pole N of the sphere to P till it meets the plane at a point P' (Figure 1).
This provides a transformation P ↦ P' which is a bijection from
the sphere (omitting the north pole) to the plane. It is called stereographic
projection. It carries the south pole of the sphere to the origin in the plane
and the equator to the circle of radius 2 with center at the origin. Each
line L in the plane is the projection of a circle passing through the north
pole of the sphere; it is the circle cut out from the sphere by the plane
passing through the line L and the pole N. Also, if two lines L and L' in
the plane meet at an angle θ, the corresponding circles on the sphere meet
at the same angle θ on the sphere. Thus stereographic projection is a conformal
transformation, in the sense that it preserves angles, though it
manifestly does not preserve distances.
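The projection can be written out in coordinates. The sketch below assumes a sphere of radius 1 resting on the plane z = 0 at its south pole, so that N = (0,0,2); this normalization is an assumption, chosen because it sends the south pole to the origin and the equator to the circle of radius 2, as stated above.

```python
import math

def stereographic(p):
    """Project a point p = (x, y, z) of the sphere x^2 + y^2 + (z-1)^2 = 1
    from the north pole N = (0, 0, 2) onto the plane z = 0.
    The sphere's position is an assumption made for this sketch."""
    x, y, z = p
    if z == 2:
        raise ValueError("the north pole has no image in the plane")
    # The point N + t(P - N) has third coordinate 2 + t(z - 2),
    # which vanishes exactly when t = 2 / (2 - z).
    t = 2 / (2 - z)
    return (t * x, t * y)

print(stereographic((0, 0, 0)))      # south pole goes to the origin
print(stereographic((1, 0, 1)))      # a point of the equator
# Every equator point lands at distance 2 from the origin:
angle = 0.7
print(math.hypot(*stereographic((math.cos(angle), math.sin(angle), 1))))
```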
Under this stereographic projection all the points P on the sphere reappear
as points P' on the plane, except for the north pole N on the sphere.
But one can "extend" the plane by adding a point ∞, called the point "at
infinity" and then specify that stereographic projection sends the north
pole N on the sphere to ∞ on the extended plane; it is then a bijection
from the whole sphere to the whole (extended) plane. As a point P' on
the plane moves along a straight line L away from the origin, the
corresponding point P on the sphere moves toward the north pole N.
Hence the point ∞ at infinity in the extended plane lies on all lines in the
plane; moreover, as a complex number z "approaches ∞" in absolute
value, the corresponding point in the extended plane approaches this
point ∞. This can be formalized by defining "neighborhoods" of ∞ so as
to make the extended plane a topological space.
The virtue of the extended plane (or the corresponding Riemann
Sphere) is that functions such as w = 1/z are now defined for all z.
Specifically, for z = re^{iθ} ≠ 0, the corresponding w = 1/z is (1/r)e^{-iθ}
(draw a picture), while for z = 0, w = ∞ and for z = ∞, w = 0. The
[Figure: stereographic projection from the north pole N, with points Q on the sphere and Q' in the plane]
where the point z = -d/c which makes the denominator zero is sent to
the point at ∞ in the w-plane. These transformations form a group, with
many fascinating properties. But the extended complex plane is useful in
many other ways, for the effective geometric understanding of more general
functions w = f(z) of a complex variable z (Chapter X).
found, it developed that there can be real cubics with all real roots for
which the formula inevitably involves the use of complex numbers. (For
details, see for example Birkhoff-Mac Lane, Chapter XV, Theorem 22.)
In other words, there are phenomena with real numbers which can be
properly explained only with complex numbers. In electricity and magnetism,
complex numbers provide a convenient formalism. Finally, many
ordinary functions (e^x, sin x) of a real variable are better understood
when they are extended to be functions (e^z, sin z) of a complex variable
z. As we will see in Chapter X, the use of complex numbers yields a much
deeper understanding of the nature of a "well behaved" function. In
short, the real "foundation" for complex numbers lies in their manifold
uses in better understanding of Mathematics. This is typical of the conceptual
expansion of Mathematics.
The complex numbers are just one of the many new aspects which
expanded the scope of Mathematics in the 19th century. Others include
the use of integers modulo m and congruence classes, non-Euclidean
geometry, n-dimensional geometry, decompositions of algebraic numbers
into "ideal" factors (§XII.3), non-commutative multiplication in groups
and quaternions, infinite cardinal and ordinal numbers, and the various
necessary uses of sets and of logic, especially in the foundation of calculus
(Chapter VI). These 19th century developments made obsolete the simple
view that Mathematics is the science of number, space, time, and motion.
N,Z,Q,R,C
have aimed to preserve the appropriate algebraic properties. All of these
systems, save the natural numbers, are commutative rings and the final
three are fields. Thus properties deduced from the axioms for a field hold
in all three cases-as well as in other fields such as the finite fields Z/(p)
for prime p. The familiar formula for finding roots for a quadratic
equation applies to any one of these fields, as do the less familiar solutions
of cubic and quartic equations. The same applies to the solution of
simultaneous linear equations and to determinant formulas for their
solution, although the determinant can also be given a geometric
interpretation as an area (for 2 × 2 determinants) or as a volume in the
3 × 3 case (Birkhoff-Mac Lane, §10.3). Matrices and linear algebra work
for every field. In this way the rules for the manipulation of ordinary
numbers lead to codified rules for the manipulation of all sorts of
numbers, and the various newer sorts of numbers were constructed so as
to obey these rules. But it turns out that addition and multiplication
apply not just to "numbers" but also to other mathematical objects: To
polynomials, to congruence classes of integers, to formal power series (§4)
and the like. Algebra replaces numbers by symbols; it is then possible that
these symbols do not stand for numbers or "quantities". Hence it
develops that the actual subject matter of algebra is not the manipulations
of numbers under addition and multiplication, but the manipulation of
any objects satisfying the rules (the axioms for a ring or for a field) for
such manipulation. Algebra is thus inevitably abstract, even though this
was not fully recognized until the 1920's when Emmy Noether and her
disciples saw clearly that many algebraic phenomena could be grasped
more effectively when stated and proved abstractly.
plication, with vector addition, makes R^4 a field; i.e. a system with commutative
multiplication. In a field, the equation x^2 + 1 = 0 can have at
most two solutions; in the division ring of quaternions there are many:
±i, ±j, ±k, for example. Moreover, the multiplication table (2) is not
just "pulled out of the air"; it can be used to describe rotations in three
space much as the complex numbers of absolute value 1 describe rotations
in the plane. Also the multiplication rule for two "pure" quaternions
(t = 0) reads
The first term on the right is the "inner product" of the vectors (x,y,z)
and (x',y',z'); the remaining terms form the "vector product". It is no
wonder that texts in Physics still write vectors with the three basis units
i, j, and k.
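The product of two pure quaternions can be computed directly. The sketch below assumes the standard Hamilton rules i^2 = j^2 = k^2 = -1, ij = k = -ji, jk = i = -kj, ki = j = -ik for the multiplication table, and writes a quaternion t + xi + yj + zk as a 4-tuple (t, x, y, z); with these conventions the scalar part of a pure product comes out as the inner product with a minus sign. The name qmul is illustrative.

```python
def qmul(p, q):
    """Hamilton product of quaternions given as tuples (t, x, y, z)."""
    t1, x1, y1, z1 = p
    t2, x2, y2, z2 = q
    return (
        t1 * t2 - x1 * x2 - y1 * y2 - z1 * z2,
        t1 * x2 + x1 * t2 + y1 * z2 - z1 * y2,
        t1 * y2 - x1 * z2 + y1 * t2 + z1 * x2,
        t1 * z2 + x1 * y2 - y1 * x2 + z1 * t2,
    )

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
print(qmul(i, j), qmul(j, i))      # k and -k: multiplication is not commutative
print(qmul(i, i), qmul(j, j), qmul(k, k))   # each is -1

# Two pure quaternions: the scalar part is minus the inner product of
# (1,2,3) and (4,5,6), the remaining three entries are their vector product.
print(qmul((0, 1, 2, 3), (0, 4, 5, 6)))     # (-32, -3, 6, -3)
```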
All this suggests that the quaternions are inevitable (though perhaps
shocking to the commutativity-minded). They are. Consider any division
ring D which contains the real numbers R as a subring; then D is a vector
space over R. A famous theorem then asserts: If D is finite-dimensional
(over R), then it must be isomorphic (as a ring, preserving R) to the
quaternions, or to the complex numbers, or to the reals themselves. There
are no such finite dimensional division rings beyond dimension 4, and in
dimension 4 the quaternions (possibly with a different choice of basis) are
the only possibility!
15. Summary
Measures of many different sorts of magnitudes are all reduced to and
codified by the real numbers. The simple formal axioms for the reals
involve algebraic aspects (the field axioms on addition and multiplication),
ordinal aspects (the linear order), and aspects of continuity (the
completeness requirement). Of these, the completeness requirement is the
most geometric and the deepest-and has the most varied expression. For
the algebraic axioms it is remarkable that multiplication appears to have
no natural geometric representation on the line, but only one in the plane.
One dimensional geometry is evidently quite weak.
The arithmetic constructions of the reals from the rationals are varied,
straight-forward, and not of great weight. More significant is the use of
the reals to reduce problems of geometry to questions in algebra, by way
of vectors or of coordinates. The reduction of angular measure to linear
measure is the source of trigonometry and the starting point for the alge-
braic treatment of vector spaces with inner products. It is also vital to the
Functions, Transformations,
and Groups
1. Types of Functions
Functions probably first appear with the practical experience that the size
of some one magnitude depends on the size of another-the weight of a
block of ice depends on the size of the block, or the distance travelled
depends on the speed, or the area of a rectangle depends on its dimen-
sions, or the angle subtended by a circular arc depends on the length of
that arc. Thus practical problems, physical problems, geometric facts, and
algebraic manipulations all indicate that one "quantity" may depend
upon others. This leads to an idea of functional dependence, often
described in suggestive but imprecise ways. A modern version of such an
informal description would say that a function acts like a machine which,
given any number as input, will produce as output some other number,
"depending" on the input.
Algebraic operations provide many examples of functions of numbers.
For instance, the operations "add 2 to the given number" or "multiply the
given number by 3" or "take the square of the given number" are functions
which we indicate by the "assignment" notation (with a barred
arrow ↦) to designate the destination of the given number x, y, or z:
x ↦ 2 + x,    y ↦ 3y,    z ↦ z^2.    (1)
One such function, followed by a second one, will give a composite function.
In elementary parlance, "substituting" 2 + x for y and then 3y for z
gives the composite of the three functions (1) as the quadratic function
x ↦ 36 + 36x + 9x^2.    (2)
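The substitution can be checked mechanically. In this small sketch the helper compose and the names add_two, triple and square are illustrative; the composite applies the three functions in the order listed.

```python
def compose(*fs):
    """Return the function that applies fs[0] first, then fs[1], and so on."""
    def composite(x):
        for f in fs:
            x = f(x)
        return x
    return composite

add_two = lambda x: 2 + x      # x -> 2 + x
triple = lambda y: 3 * y       # y -> 3y
square = lambda z: z ** 2      # z -> z^2

h = compose(add_two, triple, square)
quadratic = lambda x: 36 + 36 * x + 9 * x ** 2

# The composite and the quadratic (2) agree on every sample input.
print(all(h(x) == quadratic(x) for x in range(-10, 11)))   # True
```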
x ↦ (3x^2 - 1)/(x^2 + 3x + 2)
produces from x a real number if and only if the denominator is not zero;
that is, if and only if x ≠ -1 and x ≠ -2. It is then a function to the set
R of reals from that set with -1 and -2 deleted: R - {-1,-2} → R.
The assignment x ↦ +√x (take the positive square root of x) acts only
on non-negative reals (or on non-negative rationals). (The two-valued
expression ±√x does not count for us as a function.) A function may be
determined by one of several different operations, with the choice depending
on the location of the input number. Thus the absolute value of a real
number may be described by the assignments
x ↦ x if x ≥ 0,    x ↦ -x if x < 0.
-1 and +1 are possible for x = sin θ, and there are then many choices
for the angle. To get a function, one may choose the angle always
between -π/2 and π/2 inclusive; this gives the function "arc sin" from
the interval consisting of all real x with -1 ≤ x ≤ 1 to the interval of all
real θ with -π/2 ≤ θ ≤ π/2. Recall that such inverse functions are
needed both in solving triangles and in performing certain integrations,
such as ∫ (1 - x^2)^{-1/2} dx. Another basic function which is not algebraic is
the exponential e^x and its inverse, the logarithm log x, defined only for
positive real x.
"Variable quantities" in physics and other sciences are essentially mag-
nitudes measured (according to established scientific rules) by real
numbers. Thus weight, length, volume, temperature, velocity, and the like
are such quantities. Various elementary laws of physics assert that one
such quantity is a function of another. Thus, for materials of fixed density
ρ, weight w is w = ρV, where V is the volume. The distance traversed
by a falling body is a function of the time t of fall, according to the familiar
formula s = gt^2/2. For an ideal gas, pressure P, volume V and
temperature T are related by the law PV = kT, for a suitable constant k.
Thus each of these three quantities is a function of the other two. There
are many such relations between variable quantities, and the study of the
manner of their variation is at the origin of much of Mathematics.
2. Maps
Functions of points arise in geometry. The problem of representing the
globe or a part of the globe on a piece of paper is the problem of mapping
a portion of the sphere S^2 on the Euclidean plane R^2, by some function
S^2 → R^2. The path of a projectile in space is specified by giving its
coordinates x, y, and z as they depend upon time t by three functions
x(t), y(t), z(t); that is, by a single function from (part of) the time axis
into R^3. These and many similar examples provide "maps", that is,
functions, from one set X of objects, points, or numbers into some other
set Y. In other words, functions need not be numerical. The operation of
rotating a plane about a given point and through a given angle maps each
point of the plane into another point, hence is a function R^2 → R^2,
expressed in coordinates via trigonometry as in the linear equations
(IV.9.5). More generally, any linear transformation T of the plane is a
function R^2 → R^2 which takes linear combinations to linear combinations.
Hence, writing each vector of R^2 as a linear combination
(x,y) = x(1,0) + y(0,1) of the standard basis vectors, the whole
transformation T is determined by the images T(1,0) = (a,c) and
T(0,1) = (b,d) of the two basis vectors; indeed one may use these
numbers to write T as
    [ a  b ]
    [ c  d ].    (2)
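That the two images determine T can be made concrete. In the sketch below the function name linear_map is illustrative; it builds T from T(1,0) = (a,c) and T(0,1) = (b,d) using only the fact that T preserves linear combinations, so that T(x,y) = x(a,c) + y(b,d) = (ax + by, cx + dy).

```python
def linear_map(image_e1, image_e2):
    """Return the linear transformation T of the plane with
    T(1,0) = image_e1 = (a, c) and T(0,1) = image_e2 = (b, d)."""
    a, c = image_e1
    b, d = image_e2
    def T(v):
        x, y = v
        # T(x, y) = x*T(1,0) + y*T(0,1)
        return (a * x + b * y, c * x + d * y)
    return T

# Example: rotation through a right angle sends (1,0) to (0,1) and (0,1) to (-1,0).
quarter_turn = linear_map((0, 1), (-1, 0))
print(quarter_turn((1, 0)), quarter_turn((0, 1)))   # (0, 1) (-1, 0)
print(quarter_turn((2, 3)))                         # (-3, 2)
```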
3. What Is a Function?
The various intuitive ideas about functions and functional dependence are
helpful but vague. Here are some of them.
Graph. A function is a curve in the (x,y) plane, such that each vertical line
x = a meets the curve in at most one point with coordinates (a,b). When
it does so meet, the number b is the value of the function at the argument
a. For other arguments a, the function is undefined.
This description emphasizes the geometric aspect. It is persuasive for
functions of real numbers which are smooth or at least continuous, but it
doesn't fit well with a function which jumps from 0 to 1 as the variable
changes from rational to irrational. It involves also the (undefined) notion
of a curve, and so makes arithmetic depend upon geometry.
where the double headed double arrow ⇔ is short for "if and only
if". An axiom (the axiom of extensionality) then requires that equals can
be substituted for equals:
here the double arrow ⇒ is short for "implies". Another axiom asserts
that for any two sets a, b there is a set {a,b} such that
in other words, {a,b} is the set whose only elements are a and b. In particular,
{a,a} is the set whose only element is the set a; it is usually written
{a}. On this basis we can define the ordered pair of two sets a, b to be the
set
This is a set whose elements are sets of sets. One can then prove that
which states that this set <a,b> has the property one would expect of an
ordered pair of any two things. Other axioms prove that, given two sets
X and Y, there exists a set
X × Y = {<x,y> | x ∈ X, y ∈ Y}    (6)
whose elements are exactly all the ordered pairs of elements x ∈ X and
y ∈ Y. This cartesian product of sets has already been used, for example
in the description of the coordinate plane as the cartesian product R × R.
We also use the formal definition of a subset S ⊂ X: A set S such that
x ∈ S implies x ∈ X.
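These set-theoretic constructions can be imitated with finite sets. The sketch below assumes the usual (Kuratowski) definition <a,b> = {{a},{a,b}} for the ordered pair, which is an assumption about the display not reproduced here; Python's frozenset stands in for "set", and the names pair and cartesian are illustrative.

```python
def pair(a, b):
    """The ordered pair <a,b> built as the set {{a}, {a,b}} (assumed definition)."""
    return frozenset({frozenset({a}), frozenset({a, b})})

def cartesian(X, Y):
    """The set of all ordered pairs <x,y> with x in X and y in Y."""
    return {pair(x, y) for x in X for y in Y}

print(pair(1, 2) == pair(2, 1))        # False: the order matters
print(pair(1, 1) == frozenset({frozenset({1})}))   # True: {a,a} is just {a}

# <a,b> = <c,d> exactly when a = c and b = d, checked on a small range.
vals = range(3)
print(all((pair(a, b) == pair(c, d)) == ((a, b) == (c, d))
          for a in vals for b in vals for c in vals for d in vals))   # True

print(len(cartesian({1, 2}, {1, 2, 3})))   # 6
```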
[Figure 1: a subset S drawn against the x and y axes]
(as sets of ordered pairs and thus as functions) if and only if, for all
x E X, f(x) = g(x) in Y.
Our examples of functions have involved many instances of functions
constructed by composition, such as (2x + 2)^2 or cos 7θ. There is a
corresponding formal definition. Given any two functions f and g, when
the codomain of f is the domain of g, as in
X --f--> Y --g--> Z,
Here the usual convention that f(x) means "apply the function f to the
argument x" has the consequence that the composite function g·f means
"first apply f, then apply g". The definition of the composite function g·f
is then equivalent to the statement that (8) holds, and that
exp_10(log_10 x) = x,    x > 0,
states that log: {x | x > 0} → R is a right inverse of exp (it is also a left
inverse). The operation "take the positive square root" is a function
{x | x ≥ 0} → R. The equation x = (√x)^2 means that "square" is its
left inverse, but not its right inverse. In general again, g·f = 1_X and
f·g = 1_Y together mean that g is a (two-sided) inverse of f. When
f: X → Y has such an inverse g, that inverse is unique; moreover the sets
X and Y have the same size (formally, have the same cardinal number).
When any f: X → Y is regarded as a comparison of the set X to the set
Y one naturally may ask when f hits all elements of Y (is "surjective") or
when f keeps distinct elements of X distinct (is "injective"). Formally, a
function f: X → Y is called an injection when x_1 ≠ x_2 in X implies
f x_1 ≠ f x_2 in Y; intuitively, an injection maps the set X in one-to-one
fashion onto some subset of Y, called the image of f. Unless X is the
empty set ∅, a function f with domain X is an injection if and only if it has
a left inverse g; specifically, g takes each element of the image back to the
necessarily unique element of X from which it came, and the remaining
elements of Y (if any) back to any old element of the (non-empty!) set X.
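For finite sets this recipe for a left inverse can be carried out literally. In the sketch below a function is represented as a Python dict, the name left_inverse is illustrative, and the "any old element" is simply taken to be the first element of X.

```python
def left_inverse(f, X, Y):
    """Given an injection f: X -> Y (a dict) with X non-empty, return a dict g
    on Y with g[f[x]] == x for every x in X."""
    X = list(X)
    if not X:
        raise ValueError("X must be non-empty")
    back = {}
    for x in X:
        if f[x] in back:
            raise ValueError("f is not injective")
        back[f[x]] = x          # send each image element back where it came from
    default = X[0]              # any old element of X serves for the rest of Y
    return {y: back.get(y, default) for y in Y}

f = {1: "a", 2: "b"}
g = left_inverse(f, [1, 2], ["a", "b", "c"])
print(all(g[f[x]] == x for x in f))    # True: g.f is the identity of X
print(g["c"])                          # 1: an arbitrary choice outside the image
```

Here f·g is not the identity of Y, since f(g("c")) = "a"; the left inverse is two-sided only when f is also surjective.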
5. Transformation Groups
The analysis of the symmetry of figures, of formulas, of ornaments or of
crystals leads inevitably, as in Chapter I, to the study of transformation
groups. Such groups also arise in geometry, as groups of rigid motions,
and can even provide a foundation for Euclidean geometry (§III.10). In
this section we will note additional examples of transformation groups.
second. Thus in the Euclidean plane two triangles are equivalent in this
sense if and only if they are congruent, in the oriented Euclidean plane
two triangles are equivalent if and only if they are congruent and have the
same orientation, while in the affine plane any two non-degenerate trian-
gles are equivalent.
This description of the Erlanger program fits all manner of geometries.
Thus topology involves for each topological space X the group of all those
bijections t: X → X which are continuous with a continuous inverse. More
generally, a homeomorphism t: X → Y of topological spaces is a continuous
bijection with a continuous inverse. Thus the surface of a sphere is not
homeomorphic (why?) to the torus (the surface of a doughnut), but is
homeomorphic to the surface of an ellipsoid or of a cube. However the
latter homeomorphism cannot be "smooth" (at the corners of the cube);
this suggests still another kind of geometry, that of C^∞ structure, to be
discussed in Chapter VIII.
All told, the initial and quite intuitive observations of symmetry, of
motion, and of transformations in geometry and of their composition
lead to the formal set-theoretic notion of a transformation group. Its
study involves algebra, geometry, and continuity.
6. Groups
The definition of a group G by axioms, as given in §I.8, arises by abstrac-
tion from the notion of a transformation group-one ignores the fact that
the elements are transformations of something but retains the composition
operation, the identity, and the inverse and requires the associative law
(valid automatically in case the group elements are indeed transforma-
tions). No further axioms are needed to formalize the properties of com-
position, because these axioms suffice to prove the Cayley theorem that
every (abstract) group G is isomorphic to a transformation group G'.
Specifically, each element g ∈ G can be regarded as a transformation
g': x ↦ gx of the elements x of the set G. By the associative law,
h(gx) = (hg)x, so the composition hg of group elements matches exactly
the composition h'g' of the corresponding transformations; thus the bijection
g ↦ g' is an isomorphism of the abstract group G to the transformation
group G'. It is called the (left) regular representation of G, because
each element g of G is "represented" by the operation "multiply on the
left by g".
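The Cayley theorem can be checked by brute force on a small group. The sketch below takes G to be the six permutations of three letters (a choice made here purely for illustration), represents each g' as a dict on G, and verifies that (hg)' is the composite of h' and g'; the names mul, regular and compose_maps are illustrative.

```python
from itertools import permutations

# G: the permutations of {0,1,2}, each stored as a tuple p with p[i] the image of i.
G = list(permutations(range(3)))

def mul(h, g):
    """The product hg: first apply g, then h."""
    return tuple(h[g[i]] for i in range(3))

def regular(g):
    """The transformation g': x -> gx of the set G."""
    return {x: mul(g, x) for x in G}

def compose_maps(h_map, g_map):
    """Composite of two transformations of G: first g_map, then h_map."""
    return {x: h_map[g_map[x]] for x in g_map}

# (hg)' = h'g' for all h, g, and distinct g give distinct g'.
print(all(regular(mul(h, g)) == compose_maps(regular(h), regular(g))
          for h in G for g in G))                               # True
print(len({tuple(sorted(regular(g).items())) for g in G}))      # 6
```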
Other groups arise not as groups of transformations but as groups of
numbers under multiplication (or addition; cf. §I.8). Most of these groups
are abelian, but algebraic considerations also yield examples of non-abelian
groups, such as GL_n and the multiplicative group of non-zero
quaternions (§IV.14). From given groups one may construct new groups
as products or as semidirect products (§8). One may also construct groups
[Diagram (1): commutative diagram with an arrow G → H and maps labelled s, s', t]
7. Galois Theory
Symmetry and its formalization by transformation groups arise not only
in geometric situations, but also in purely algebraic cases. We have
already noted a first example, in the operation C → C of conjugation
x + iy ↦ x - iy for complex numbers. This operation interchanges the
two complex cube roots ω and ω^2 of 1. Conjugation leaves only the real
numbers fixed, and can be viewed geometrically as a reflection of the
complex plane in the real axis. As a symmetry group, it consists of just
two transformations: conjugation and the identity. It arises from the equation
x^2 + 1 = 0, since it interchanges the two roots of this equation.
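That conjugation is a symmetry in the algebraic sense, preserving sums and products and fixing the reals, can be spot-checked. The following sketch uses Python's built-in complex numbers with small integer parts, so the arithmetic is exact; the sample values are arbitrary.

```python
def conj(z):
    """Complex conjugation x + iy -> x - iy."""
    return z.conjugate()

samples = [complex(a, b) for a in range(-2, 3) for b in range(-2, 3)]

# Conjugation preserves sums and products on all sample pairs.
print(all(conj(z + w) == conj(z) + conj(w) and conj(z * w) == conj(z) * conj(w)
          for z in samples for w in samples))                    # True

# It fixes exactly the real samples, and applying it twice gives the identity.
print(all((conj(z) == z) == (z.imag == 0) for z in samples))     # True
print(all(conj(conj(z)) == z for z in samples))                  # True

# It interchanges the two roots i and -i of x^2 + 1 = 0.
roots = {1j, -1j}
print({conj(r) for r in roots} == roots, conj(1j) == -1j)        # True True
```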
Now consider a polynomial equation of degree n such as
f(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0 = 0,    a_n ≠ 0.    (1)
We will assume that the coefficients a_n, a_{n-1}, ..., a_0 lie in some subfield
F of the field C of complex numbers, and that the polynomial f is irreducible
over F; that is, cannot be factored in F as f(x) = g(x)h(x) into two
polynomials g and h of lower degree.
Galois theory is concerned with formulas for or properties of the roots
of f(x) = 0. By the fundamental theorem of algebra, there is at least one
complex root α_1 and hence a corresponding factorization of f(x) as
f(x) = (x - α_1)g(x), where the second factor g(x) is a polynomial (of
degree n - 1) with complex coefficients. This polynomial in its turn has a
complex root α_2 so continuation of this process ultimately gives n roots
and a factorization
f(x) = a_n (x - α_1)(x - α_2) ... (x - α_n).    (2)
From the fact that f is irreducible, one may prove that these n roots
α_1, ..., α_n are all different.
of the field of complex numbers. It is called a splitting field for the polynomial
f over the base field F, because it is a smallest field containing F
in which the polynomial f "splits" into linear factors. In fact (forgetting
the complex numbers) this property determines the field F(α_1, ..., α_n)
up to an isomorphism leaving the elements of F all fixed. In the case of
the polynomial x^2 + 1 over R, the splitting field is just the field C of
complex numbers.
Now consider the symmetries of the splitting field N relative to F. A
symmetry is by definition a transformation t: N → N which is an automorphism
of fields (i.e., a bijection which preserves sums and products) and
which leaves fixed all the elements of the base field F. All such symmetries
constitute a transformation group G, the Galois group of the splitting
field N, and of the polynomial f, over the base field F. Since each
symmetry t leaves all the coefficients of f fixed, it must carry any root α
of f(x) = 0 into another root of this polynomial. Hence such a symmetry
induces a permutation of the roots α_1, ..., α_n of f; since N is generated
by these roots, the symmetry is determined by the permutation, so that G
is isomorphic to a subgroup of the symmetric group S_n. This means in
particular that the Galois group G is finite. It can be described either as a
group of automorphisms of N or as a group of permutations of the roots.
Linear algebra provides a more specific measure of the size of the
Galois group. If we neglect multiplication in the splitting field, then addition,
plus multiplication by elements of F, still make N a vector space over
the field F, and the dimension of this vector space is equal to the order of
the Galois group. For algebraic reasons, this dimension is called the
degree [N:F] of N over F. (This is a strikingly simple case of the use of
geometric dimensions in algebra.)
For example, the easy equation x^3 - 5 = 0 over the rationals has
three roots: ∛5, ω∛5, and ω^2 ∛5, where the first root is the real one,
while ω is a complex cube root of unity. The whole splitting field over Q
then can be built up in two steps from Q, first the real root, then the
others, as
    ∪    ∩
S = K# = Gal(N:K)
    ∪    ∩
G = F# = Gal(N:F).
G = G_0 ⊃ G_1 ⊃ G_2 ⊃ ... ⊃ G_{m-1} ⊃ G_m = 1    (3)
S_5 ⊃ A_5 ⊃ 1,
8. Constructions of Groups
The remarkable richness of group theory rests in part on the extensive
variety of specific groups, especially finite groups. Initially, Mathematicians
wanted to know all groups, and so were tempted to try to list all the
possible finite groups of a given order, of course counting isomorphic
groups as the same. Thus, there are two different groups of order 4, the
cyclic group Z_4 and the group of 4 symmetries of a rectangle (the so-called
four group). In a group of order 5, any non-identity element must
have order 5, so any such group is (isomorphic to) the cyclic group Z_5.
There are two different (i.e., non-isomorphic) groups of order 6, the cyclic
group Z_6 and the symmetric group S_3, and only one group Z_7 of order 7
because 7 is prime. The continuation of such a simplistic catalog, however,
soon becomes cumbersome and is not very enlightening. Too much
depends on the arithmetic properties of the order of the group.
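One quick way to see that the two groups of order 4 really differ is to list the orders of their elements. The sketch below models the four group as pairs of residues modulo 2 under componentwise addition, a standard model assumed here in place of the symmetries of a rectangle; the helper name element_orders is illustrative.

```python
def element_orders(elements, op, identity):
    """Return the sorted list of the orders of the elements of a finite group."""
    orders = []
    for g in elements:
        n, x = 1, g
        while x != identity:      # multiply by g until the identity is reached
            x = op(x, g)
            n += 1
        orders.append(n)
    return sorted(orders)

# The cyclic group Z_4: residues mod 4 under addition.
z4 = element_orders(range(4), lambda a, b: (a + b) % 4, 0)

# The four group, modelled as pairs of residues mod 2.
pairs = [(a, b) for a in range(2) for b in range(2)]
v4 = element_orders(pairs, lambda p, q: ((p[0] + q[0]) % 2, (p[1] + q[1]) % 2), (0, 0))

print(z4)   # [1, 2, 4, 4]
print(v4)   # [1, 2, 2, 2]
```

An isomorphism preserves the order of each element, so the two lists being different shows that the groups are not isomorphic.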
Instead, long experience with examples has developed more useful con-
structions.
If a prime p divides the order n of a group G, as n = p^e n' with n' relatively
prime to p, then G has at least one subgroup P of order p^e, and any
two such subgroups are conjugate. These subgroups P are called the
Sylow subgroups of G; one shows that any subgroup S of G with order
some power of the prime p is necessarily contained in a Sylow subgroup P
(i.e., one of order p^e). The investigation of these subgroups (Algebra,
The functions (a,g) ↦ a and (a,g) ↦ g provide two homomorphisms
π_1: A × G → A and π_2: A × G → G, called projections. Given any
homomorphisms t_1: H → A and t_2: H → G from a group H to the factors
A and G, there is a unique group homomorphism s: H → A × G which produces
t_1 and t_2, by composition with the projections, as t_1 = π_1·s and
t_2 = π_2·s. This is expressed by the (commutative) diagram of group
homomorphisms
(2)
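The homomorphism s is forced: it must send h to the pair (t_1 h, t_2 h). A small sketch, with H = Z_6, A = Z_2 and G = Z_3 written additively (an example chosen here, not taken from the text) and with illustrative names pair_map, pi1, pi2:

```python
def pair_map(t1, t2):
    """The unique map s into the product with pi1.s = t1 and pi2.s = t2."""
    return lambda h: (t1(h), t2(h))

pi1 = lambda p: p[0]      # projection A x G -> A
pi2 = lambda p: p[1]      # projection A x G -> G

# Example: H = Z_6, A = Z_2, G = Z_3, all written additively.
t1 = lambda h: h % 2
t2 = lambda h: h % 3
s = pair_map(t1, t2)

H = range(6)
# s composed with the projections gives back t1 and t2.
print(all(pi1(s(h)) == t1(h) and pi2(s(h)) == t2(h) for h in H))   # True

# s is a homomorphism into Z_2 x Z_3 with componentwise addition.
def add(p, q):
    return ((p[0] + q[0]) % 2, (p[1] + q[1]) % 3)

print(all(s((h + k) % 6) == add(s(h), s(k)) for h in H for k in H))   # True
```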
A * G. Its elements are all possible words a_1 g_1 a_2 g_2 ... a_n g_n which are formal
products of elements a_i ∈ A and g_i ∈ G, with each a_i (except possibly
for a_1) and each g_i (except possibly for g_n) ≠ 1. Two such words are
multiplied by juxtaposition and subsequent cancellation (when possible);
thus ag multiplied by g^{-1}bg_1 becomes agg^{-1}bg_1 = (ab)g_1, where ab is the
given product in factor A. With some care, one may prove that this multiplication
is indeed associative, and that a ↦ a·1 and g ↦ 1·g are monomorphisms
k_1: A → A * G and k_2: G → A * G which enjoy the following "universal"
property: To any morphisms t_1: A → H and t_2: G → H into an arbitrary
group H there is exactly one morphism s: A * G → H which yields t_1
and t_2 as the composites t_1 = s·k_1 and t_2 = s·k_2; that is, which makes
the following diagram
(4)
of the plane (z a complex number) with integral coefficients and such that
the determinant ad - bc = 1. This group (which plays a considerable
role in number theoretic investigations) can also be described as the multiplicative
group of all 2 × 2 matrices of integers
1 → A → E → G → 1,    (6)
where 1 designates the group with just one element. This sequence is said
to be exact because at each node the image of the incoming homomorphism
is "exactly" the kernel of the outgoing homomorphism. Thus
exactness at A means that k is a monomorphism, and exactness at E
means that the (normal) subgroup kA is the kernel of the projection π.
The direct product A × G yields one such extension E, but there are
many others. For instance, if E is the symmetric group S_n and A the alternating
subgroup A_n, the group G in (6) is cyclic of order 2, but S_n is not
the direct product A_n × Z_2.
In any such sequence, the requirement that A (or k(A)) be normal in E
means that each element e ∈ E induces an inner automorphism
a ↦ eae^{-1} on A, call it φ_e. Hence φ is a homomorphism E → Aut A. If A is
abelian (and in some other cases), A will be in the kernel of this map, so
that φ "induces" a homomorphism θ: G → Aut A. Conversely, given A, G,
and such a homomorphism θ: G → Aut A, sending each g ∈ G to a ↦ ga,
one may construct a group E on the set A × G by the multiplication
Then a ↦ (a,1) and (a,g) ↦ g do yield an exact sequence (6). This group E
is called the semidirect product of A and G, with "operators" θ.
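A semidirect product is easy to build by machine. The sketch below assumes the standard multiplication (a,g)(b,h) = (a + θ_g(b), gh), with A written additively; this formula is an assumption, since the displayed multiplication is not reproduced here. It takes A = Z_n, G = {+1, -1} and θ_g(b) = g·b, which gives a group of order 2n; the name dihedral is illustrative.

```python
def dihedral(n):
    """Elements and product of the semidirect product of Z_n by {+1, -1},
    where -1 acts on Z_n by negation (assumed standard construction)."""
    elements = [(a, g) for a in range(n) for g in (1, -1)]
    def mul(p, q):
        (a, g), (b, h) = p, q
        # (a, g)(b, h) = (a + theta_g(b), gh), with theta_g(b) = g*b mod n
        return ((a + g * b) % n, g * h)
    return elements, mul

elements, mul = dihedral(5)
print(len(elements))                                    # 10 = 2n elements

# The multiplication is associative ...
print(all(mul(mul(p, q), r) == mul(p, mul(q, r))
          for p in elements for q in elements for r in elements))   # True

# ... but not commutative, so this is not the direct product Z_5 x Z_2.
print(mul((1, 1), (0, -1)), mul((0, -1), (1, 1)))       # (1, -1) (4, -1)

# The kernel of (a, g) -> g consists of the n elements (a, 1).
print(len([p for p in elements if p[1] == 1]))          # 5
```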
The dihedral group Δ_n is such a semidirect product. For each natural
number n, the group consists of all the 2n symmetries of a regular n-gon
Δ_n (thus, for n = 3, of an equilateral triangle). Call a transformation t of
Δ_n even (+1) if it leaves the n-gon right side up and odd (-1) if it turns
the n-gon over; these labels constitute an epimorphism π: Δ_n → Z_2 to the
cyclic group (±1) of order 2; its kernel is the normal subgroup consisting
of all n possible rotations of the n-gon. If R is such a rotation through 2π/n
Many other extensions are not split. In particular, when A is abelian (and
written additively) all extensions of A by G with operators θ can be
described by giving a multiplication on the set A × G by the formula
9. Simple Groups
Any group G has itself and 1 as normal subgroups; it is simple when it has
no other normal subgroups. The Jordan-Holder theorem (§7) for compo-
sition series indicates that any finite group may be constructed as an
iterated extension of simple groups, though the order in which the simple
factors may occur in the series can vary. For this and many other reasons,
it would be useful to know all the finite simple groups. They include the
cyclic groups of prime order, the alternating groups A_n for n ≥ 5, and
others. From an examination of other examples, W. Burnside had, about
1900, conjectured that all finite simple groups except the cyclic ones had
even order. This conjecture was proved true, with a complicated argument,
by Feit and Thompson in 1962. Their methods were novel and
powerful and were then further extended, till there is now a complete listing
of all possible finite simple groups. This list begins with certain familiar
infinite families: The cyclic groups of prime order, the alternating
groups A_n for n ≥ 5 and 16 families suggested by geometric constructions;
for example, one may modify the definition of the special orthogonal
group by replacing the field of real numbers by a finite field, so as to
get a finite group. There are then 26 simple groups which do not fit in any
such systematic infinite family. They are called the sporadic groups. For
example, the largest one, called the "monster", has order the following
product of prime powers:
$$2^{46} \cdot 3^{20} \cdot 5^{9} \cdot 7^{6} \cdot 11^{2} \cdot 13^{3} \cdot 17 \cdot 19 \cdot 23 \cdot 29 \cdot 31 \cdot 41 \cdot 47 \cdot 59 \cdot 71.$$
This number is approximately $8 \cdot 10^{53}$. This large group has been constructed
as a group of the transformations preserving certain geometric
structure in a space of 196,883 dimensions! This dimension, in its turn, is
closely connected with coefficients in the expansion of certain modular
functions.
These remarkable results serve to illustrate the richness of group theory.
10. Summary: Ideas of Image and Composition
five features also had deep consequences. This we can do only tentatively
in a few cases. Thus monoids do have properties (a) and (d), but do not
have a wealth of additional applications-and have "too many" finite
models with "too little" structure. In set theory, properties of intersection,
union, and complement can be abstracted in the axioms for a Boolean
algebra. Here (d) holds, in view of the Stone representation for Boolean
algebras, (and (a) and (c) hold as well). However, the finite models of
Boolean algebra are dull: For each natural number n, there is just one
model with $2^n$ elements, namely the Boolean algebra of all subsets of an
n-element set. There are in Mathematics many other axiomatic notions,
but few of them enjoy all five of the properties (a)-(e) above.
The material of this chapter may be summarized in the following net-
work, which indicates, only in part, how these ideas are all interrelated.
[Network diagram, not reproduced here. Its nodes include: Counting, Formula, Quantity, Graph, Next, Dependence, Transformation, Successor, Composition, Permutation, Addition, Translate, Multiplication, R, FUNCTION X → Y, Rotate, Winding function, Sine and Cosine, Table, ordered pairs, sets, Rigid motion, Subgroup, Primes, Semi-Direct Products, Abelian Groups, Infinite Groups, Finite Groups, Homology, Group Representations, The Monster.]
Concepts of Calculus
1. Origins
Many sorts of calculations press themselves upon us. Thus, given a piece
of surface, how does one calculate its area? Or, given a section of a curve,
how does one calculate its length or the direction of its tangent line at
some point? More generally, how does one calculate the rate at which this
or that variable quantity is changing with time? The striking discovery (by
Newton and Leibniz) that there were systematic methods to calculate all
these things, and many more like them, had a major influence on the
directions and structure of Mathematics. For a considerable period, more
practical calculations of such things tended to dominate conceptual under-
standing, in a way that emphasizes the observation that Mathematics
takes its origin in human activities.
The calculation of areas began in Euclidean geometry, with formulas
for the areas of triangles and squares. Next in line was the area of a cir-
cular disc. This could be determined by inscribing and circumscribing reg-
ular polygons in the disc; as the number of sides of these polygons
increased, the area inside the circle was "pinched" between the (calcul-
able) areas inside the larger and the smaller polygon-and this pinching
process produced the area inside the disc as a sort of limit. This method,
neatly adapted to the case of the circle, suggested to Archimedes and oth-
ers a search for extensions of the method to calculate other areas-that of
an elliptical disk, and that for more irregular figures. Similarly the meas-
urements of length began with the Pythagorean theorem used to deter-
mine the length of a slanted line and hence the perimeter of any polygon.
The circumference of a circle could then be found (§III.2) through succes-
sive approximation by inscribed polygons. These and other problems of
measurement (of volumes, weights, centers of gravity, and the like)
emphasized the ubiquitous role of approximation and may have indicated
the need for a systematic understanding of the way in which such succes-
2. Integration
It is remarkable that so many different processes of approximating total
measured quantities by adding together little bits of these quantities can
all be subsumed under one process, that of integration. Area, volume,
length, pressure, moment of inertia, weight, and the like all can be
managed by such sums. To be sure, the usual formal definition of the
Riemann Integral is usually presented as the calculation of an area-
specifically the area below a curve y = f(x), above the x axis, and
bounded left and right by the ordinates x = a and x = b. This area is
broken up into the usual thin vertical and rectangular strips of width dx
running from x = a to x = b, as in Figure 1. The area of such a strip is
altitude f(x) times base dx, hence is written f(x)dx, and the sum of them
all, and hence the total area desired, is the definite integral
$$\int_a^b f(x)\,dx. \qquad (1)$$
[Figure 2: axes x and y, with the points a and b marked on the x axis.]
$$L(f,\sigma) = \sum_{i=1}^{n} (\mathrm{Min}_i f)(x_i - x_{i-1}) \le U(f,\sigma) = \sum_{i=1}^{n} (\mathrm{Max}_i f)(x_i - x_{i-1}); \qquad (2)$$
the desired area under the whole curve must be squeezed between these
two sums. To actually express this area, we must then take a limit of these
sums for successively finer subdivisions $\sigma$, so let the size of $\sigma$ be measured
by $|\sigma|$, the maximum of all the interval lengths $|x_i - x_{i-1}|$. When f
is continuous (on the whole interval $a \le x \le b$) it is then the case (as we
will soon see) that for each measure $\varepsilon > 0$ of approximation there is a
$\delta > 0$ such that, whenever $|\sigma| < \delta$, then $|U(f,\sigma) - L(f,\sigma)| < \varepsilon$.
This implies that L and U have a common limit as $|\sigma|$ approaches 0;
this limit is the definite integral (3) of the function f over the interval
$a \le x \le b$. By its construction, it represents what the area under the
curve over this interval should be; if you wish, it is the definition of this
area; for those applications such as pressures and volumes it can also
serve as a definition. Such definite integrals, so defined as limits of sums,
are still written in the classical notation (3) suggesting the infinite sum of
infinitesimals; in fact that intuitive view of the matter allows us to easily
set up the integral representing all sorts of other quantities: The weight of
a thin slab of known but variable density, the water pressure on a slab-
like portion of a dam, or the volume enclosed by a surface of revolution.
Much of the instruction in elementary integral calculus consists of repeti-
tive exercises practising the formulations of such integrals. When they
cannot be done by slices or slabs, they can be managed (with more tech-
nique but in a wholly similar spirit) by multiple integrals.
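The squeezing of the area between lower and upper sums can be watched numerically. The following is a minimal sketch, not a definitive implementation: the function name `lower_upper_sums`, the choice of equal subdivisions, and the sample function $x^2$ on the interval from 0 to 1 are all our own assumptions, and the minimum and maximum on each piece are only estimated by sampling (which is exact for a monotone function such as this one).

```python
def lower_upper_sums(f, a, b, n, samples=50):
    """Estimate the lower sum L and upper sum U of f on [a, b]
    for the subdivision of [a, b] into n equal pieces.

    The min and max of f on each piece are estimated by sampling;
    for a monotone f they sit at the endpoints, so the estimate is exact.
    """
    width = (b - a) / n
    lower = 0.0
    upper = 0.0
    for i in range(n):
        left = a + i * width
        values = [f(left + width * k / samples) for k in range(samples + 1)]
        lower += min(values) * width
        upper += max(values) * width
    return lower, upper


def square(x):
    # Illustrative integrand (our choice); its integral from 0 to 1 is 1/3.
    return x * x


for n in (10, 100, 1000):
    low, up = lower_upper_sums(square, 0.0, 1.0, n)
    # The gap up - low shrinks like 1/n as the subdivision gets finer.
    print(n, low, up, up - low)
```

For this increasing function the difference U - L telescopes to (f(1) - f(0))/n, so the two sums pinch the value 1/3 ever more tightly as n grows.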
This formal definition of the Riemann integral of a continuous function
replaces the intuitive idea of successive approximations by the use of limits
and of the standard logical quantifiers (for all $\varepsilon > 0$ there exists a
$\delta > 0$ such that ...) needed for the exact description of these limits.
Various general properties of the integral also follow directly from this
3. Derivatives
The derivative of a variable quantity y with respect to another such quan-
tity x on which it depends is to be the instantaneous rate of change of y
relative to the change in x. This description has intuitive appeal, espe-
cially in the special case when x is taken to be time. Instantaneous rates
are modeled on average rates: For a value $y_1$ at time $x_1$, changed to a
value $y_2$ at a different time $x_2$, the average rate of change is the ratio
$(y_2 - y_1)/(x_2 - x_1)$ of change in y to change in x. The "instantaneous"
aspect might again be formulated by an infinitesimal change dx from x to
x + dx. Then if y depends on x by a function y = f(x), the instantaneous
rate of change is the ratio [f(x + dx) - f(x)]/dx or, in an evident
extension of the dx notation, dy/dx, the quotient of two infinitesimals.
Such infinitesimals serve for quick calculation. For example, if $f(x) = x^2$,
then the instantaneous rate is
$$\frac{dy}{dx}\cdot\frac{dx}{dt} = \frac{dy}{dt}, \qquad (2)$$
and one has (but hasn't quite proved) the important (because necessarily
most useful) "chain rule" for the derivative of a composite function. Thus
calculus with infinitesimals is intuitive and an efficient means of calcula-
tion.
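The passage from average rates to an instantaneous rate can be illustrated with finite changes in place of infinitesimals. The sketch below is only an illustration under our own assumptions (the helper name `difference_quotient` and the sample point x = 3 are hypothetical choices): for $f(x) = x^2$ the average rate over a change h is exactly 2x + h, which approaches 2x as h shrinks.

```python
def difference_quotient(f, x, h):
    """Average rate of change of f between x and x + h."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x * x


x = 3.0
for h in (1.0, 0.1, 0.001):
    # For f(x) = x**2 the quotient equals 2*x + h, up to rounding.
    print(h, difference_quotient(square, x, h), 2 * x + h)
```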
But what are these infinitesimals? The archimedean law for the real
numbers (§IV.4) says that a positive number, no matter how small, has
4. The Fundamental Theorem of the Integral Calculus
$$\int_a^b f(t)\,dt = F(b) - F(a). \qquad (1)$$
The expression on the right is often written as $[F(t)]_a^b$, while the function
F is called the indefinite integral $\int f(t)\,dt$.
Before we discuss a rigorous version of the "infinitesimal" motivation of
this theorem, let us consider its utility. It provides a formula for calculating
the (Riemann) integral $\int_a^b$, provided the function f(t) inside the
integral is known to be the derivative f(t) = F'(t) of some other function
F. If we have differentiated enough such functions F and have prepared a
"Table of Integrals" giving the results, we may then hope to find to the
given "inside" function f( t) a suitable "primitive" F( t). When this is the
case, the definite integral (representing an area, a volume, or some other
quantity to be determined) may be calculated as the difference
F(b) - F(a) of two values of this function F. Thus the fundamental
theorem conceptually ties the process of differentiation to that of integra-
tion, and provides means of (sometimes) calculating definite integrals.
But alas, our table of integrals may not contain any function F with the
desired derivative f. For instance, if we have so far differentiated only
polynomials, powers, square roots and the like, we will have no function
of t with derivative the square root $(1 - t^2)^{1/2}$. Progress on the
corresponding definite integral $\int_a^b (1 - t^2)^{1/2}\,dt$, with well chosen limits
a and b (i.e., with $0 \le a < b \le 1$) is then possible only when we use
trigonometric functions; set $t = \sin\theta$ for some angle $\theta$, determine that
the derivative of $\sin\theta$ is $\cos\theta$, and then use the standard rules for
"change of variables" ($dt/d\theta = \cos\theta$ or $dt = \cos\theta\,d\theta$) to get
$$\int_a^b (1 - t^2)^{1/2}\,dt = \int_\alpha^\beta \cos^2\theta\,d\theta,$$
where the new limits $\alpha$ and $\beta$ for the integral over $\theta$ are chosen so that
$\sin\alpha = a$, $\sin\beta = b$. Now, if we have noticed that the derivative of
$\sin\theta\cos\theta$ is $2\cos^2\theta - 1$, we have a primitive, so we may again use the
fundamental theorem to calculate
$$\int_\alpha^\beta \cos^2\theta\,d\theta = (1/2)\,[\theta + \sin\theta\cos\theta]_\alpha^\beta.$$
In particular,
$$\int_0^1 (1 - t^2)^{1/2}\,dt = \pi/4. \qquad (2)$$
This is a small indication of the way in which integration forces the con-
sideration of derivatives of new classes of functions-here the trig-
onometric functions; later elliptic functions. A still simpler example is the
integral $\int dx/x$, which leads to the function $\log x$. In the present case, the
entry of trigonometric functions is no accident; after all, the integral on
the left of (2) is just a representation of the area $\pi/4$ of the quadrant of a
unit circle, and the trigonometric functions serve to provide rectangular
coordinates for points on the circle.
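The two routes to the area of the quadrant, a limit of sums and the difference of two values of a primitive, can be compared numerically. This is a rough sketch under our own assumptions: the midpoint rule and the names `midpoint_integral`, `integrand` and `primitive` are illustrative choices, not the text's.

```python
import math


def midpoint_integral(f, a, b, n=100_000):
    """Approximate the definite integral of f over [a, b] by a sum of
    n thin rectangles whose heights are taken at the midpoints."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


def integrand(t):
    # The square root (1 - t^2)^(1/2) discussed in the text.
    return math.sqrt(1.0 - t * t)


def primitive(theta):
    # (1/2)(theta + sin(theta) cos(theta)), whose derivative is cos^2(theta).
    return 0.5 * (theta + math.sin(theta) * math.cos(theta))


numeric = midpoint_integral(integrand, 0.0, 1.0)
# Limits a = 0 and b = 1 correspond to angles asin(0) and asin(1).
by_primitive = primitive(math.asin(1.0)) - primitive(math.asin(0.0))
print(numeric, by_primitive, math.pi / 4)
```

The sum needs a hundred thousand terms to come close to the value that the primitive delivers from just two evaluations, which is one way to see the utility of the fundamental theorem.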
Now return to consider a possible proof of the fundamental theorem.
We can derive it from two more primitive facts, as follows.
Lemma A. If a continuous function F(t), defined for all t with $a \le t \le b$,
has derivative 0 for every such t, then F(t) is constant for all t with
$a \le t \le b$.
This is intuitively plausible: If the rate of change of F is zero every-
where then it doesn't change at all, hence is constant. Later we will return
to examine a rigorous proof of this lemma from the Law of the Mean
(§7).
Lemma B. If the function f(t) is bounded between two constants m and M,
so that $m \le f(t) \le M$ for all t with $a \le t \le b$, then the definite integral
of f(t) satisfies the inequalities
$$(b - a)m \le \int_a^b f(t)\,dt \le (b - a)M. \qquad (3)$$
If one considers the definite integral as a measure of the area under the
curve y = f(x) and between the ordinates x = a and x = b, these two
inequalities are evident, since (b - a)M is the area of a rectangle includ-
ing all that area under the curve, while (b - a)m is the area of a rectan-
gle wholly under the curve. An exact proof of (3) from the limit
definition of the integral in §2 is straightforward: In the sum (2.2), each
term is by hypothesis bounded as in .
The whole sum is then squeezed between m(b - a) and M(b - a), so the
same must be true for the limit of this sum; that is, for the definite
integral.
As for the fundamental theorem (1), consider the definite integral
$$G(t) = \int_a^t f(t)\,dt$$
$$G(b) = \int_a^b f(t)\,dt = F(b) - F(a),$$
$$\dot{s} = gt + v_0. \qquad (1)$$
(3)
$$\ddot{y} = -g, \qquad \ddot{x} = 0, \qquad (4)$$
Thus, substituting x for t in the first equation, the path of the body is the
locus of the equation
6. Differential Equations
This discussion of the motion of planets and of projectiles is a typical
instance of the general method of describing physical and other
phenomena by differential equations. A typical differential equation, now
of the first order and not "second order", will determine some unknown
function x of the variable t by giving the first derivative of x,
$$F(b) = F(a) + \int_a^b f(t)\,dt.$$
In words, the final value of the variable F (or x) is determined by the initial
value F(a) and all the intermediate values f(t) = F'(t) of the derivative
on the interval $a \le t \le b$. This underlies the use of "initial values"
in getting solutions of differential equations. In explicit terms, this
amounts to a search for some function x = g(t) defined for a suitable
range of t and such that this function with its derivatives satisfies the
given differential equation (1):
for all the intended t. To do this, we must again canvass functions g whose
known derivatives might satisfy (2); elementary treatments provide vari-
ous rules and tricks to do this, for example by changing the variables x or
t to other more manageable quantities. As with elementary integration,
the known functions may not suffice to get the solution. At this point one
might use numerical methods, or might try to invent a new function g to
do the job. Such invention might seem a chancy matter, but this need not
be so. Given the general definition of a function, it is possible to prove
"existence theorems" which specify conditions under which a solution g to
equation (2) must exist and (perhaps) conditions which make the solution
unique. One such existence theorem is the Picard theorem.
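As an illustration of the numerical methods mentioned above, here is a minimal sketch of Euler's method, which the text does not itself describe: starting from the initial value, one repeatedly steps forward using the derivative supplied by the equation. The name `euler_solve`, the test equation dx/dt = x with x = 1 at t = 0 (whose solution is the exponential function), and the step counts are all our own assumptions.

```python
import math


def euler_solve(h, t0, x0, t_end, steps):
    """Approximate the solution of dx/dt = h(x, t) with x(t0) = x0
    at time t_end, using `steps` equal Euler steps."""
    t = t0
    x = x0
    dt = (t_end - t0) / steps
    for _ in range(steps):
        x += dt * h(x, t)  # follow the tangent line for a short time dt
        t += dt
    return x


def rhs(x, t):
    # Illustrative right-hand side: dx/dt = x, so x(t) = e^t when x(0) = 1.
    return x


for steps in (10, 100, 1000, 10000):
    approx = euler_solve(rhs, 0.0, 1.0, 1.0, steps)
    print(steps, approx, abs(approx - math.e))
```

The error shrinks roughly in proportion to the step length, so such a method approximates a solution whose existence must be established by other means.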
To formulate this theorem, we consider the initial condition that $x = x_0$
when $t = t_0$ and assume for the equation (1) that the function h on the
right is continuous, say in the square D consisting of all (x,t) such that
$|x - x_0| < a$ and $|t - t_0| < a$. A solution of (1) with the given
initial conditions is then a function g(t), defined for t in some interval
$|t - t_0| < \delta$ about $t_0$, and with $|g(t) - x_0| < a$ there, such that
$g(t_0) = x_0$ and that (2) holds for all $|t - t_0| < \delta$. One version of the
Picard theorem then asserts that there is a $\delta > 0$ for which such a solution
exists and is unique, provided that the given function h satisfies the
Lipschitz condition: There is a constant M > 0 such that, for all $(x, t_1)$
and $(x, t_2)$ in the square D
(3)
$$\dot{v} = f(x,t), \qquad \dot{x} = v$$
7. Foundations of Calculus
The differential and integral calculus, as we have seen, starts with prob-
lems of calculating various quantities and the invention of uniform
methods of making these calculations. These methods rest initially on
vague but persuasive ideas of infinitesimals, infinite sums, and of rates of
change; their very success and rapid development forcibly raise the question
of formulating a rigorous foundation for these methods. This foundation,
developed in the 19th century, must start from a clear notion of a
function (the thing which might have a derivative). The foundation then
valued function f, continuous on the unit interval [0,1], which consists of
all x with $0 \le x \le 1$:
(i) The function f is bounded above and below; that is, there are real
numbers m and M such that, for all x in [0,1],
($\forall\varepsilon$ here is short for "for all $\varepsilon > 0$"), while ordinary continuity
throughout the interval is the same statement (2) preceded by the
sequence of quantifiers
which does satisfy the hypothesis g(O) = g(l) of Rolle's theorem. The law
of the mean also proves Lemma A of §4.
All of these theorems apply at once to real-valued functions f continu-
ous on any finite closed interval, say the interval [a,b] of all x with
$a \le x \le b$. In that case, for example, the law of the mean states that
$$f(b) - f(a) = (b - a)\,f'(\xi)$$
for some point $\xi$ with $a < \xi < b$: The total change in the function is the
length of the interval times the rate of change at some intermediate point
$\xi$. This law is another way of expressing one of the basic ideas of calculus:
The first derivative gives a linear approximation for the function.
Uniform continuity, as formulated in (v), is powerful. For example, it
can be used in connection with our description in §2 of the definite
integral $\int_a^b f(x)\,dx$. Given $\varepsilon > 0$, uniform continuity yields a $\delta > 0$ so
that $|x_1 - x_2| < \delta$ implies $|f(x_1) - f(x_2)| < \varepsilon/(b - a)$. Thus if
we subdivide the interval of integration into pieces $x_i - x_{i-1}$ of length
less than $\delta$, the maximum and the minimum of f in any subinterval differ
by less than $\varepsilon/(b - a)$, and this in turn implies that the lower sum $L(f,\sigma)$
and the upper sum $U(f,\sigma)$, defined as in §2, differ by less than
$(\varepsilon/(b - a)) \sum (x_i - x_{i-1}) = \varepsilon$. Hence they do approach the same limit, as
we asserted in (2.2) in defining this limit to be the definite integral.
The assertion in (v) of uniform continuity depends essentially upon the
hypothesis that f is continuous in the closed interval [0,1], the interval
including its endpoints. For example, the function g(x) = 1/x is continuous
on the open interval (0,1), consisting of all real numbers x with
0 < x < 1, but g is not uniformly continuous on this open interval: given
$\varepsilon > 0$, as x gets closer to 0 even smaller $\delta$ is required in (2). In fact the
uniform continuity property (v) is a consequence of the following property
of the topological space X = [0,1].
$\{U_i \mid i \in I\}$ of open intervals. Consider the set S consisting of all those
real numbers $x \in [0,1]$ such that the closed interval [0,x] from 0 to x can
be covered by a finite number of the $U_i$. This set S is bounded by 0 and 1,
hence has a least upper bound, call it $\xi$. If $\xi$ is less than 1, this $\xi$ belongs to
one of the $U_i$, say to $U_j$, and $U_j$ contains numbers $\xi' > \xi$. Since $\xi$ is a
least upper bound of S, $U_j$ also must contain some $x \in S$. Then the
interval [0,x] is covered by a finite number of the $U_i$, and these,
together with $U_j$, cover the interval $[0,\xi']$ stretching beyond $\xi$, a contradiction
to the choice of $\xi$ as the least upper bound of S. Therefore $\xi = 1$, so
the interval [0,1] is covered by a finite number of U's.
One should note that this is not a "constructive" argument. It does not
produce the finite cover, but argues by contradiction.
Heine-Borel suffices to prove uniform continuity (property (v) above).
For let $\varepsilon > 0$ be given. Since f is continuous at each point $x_0$ there is for
each $x_0$ an open interval $U_{x_0}$ (with center $x_0$) such that
$|f(x_1) - f(x_2)| < \varepsilon$ whenever $x_1$ and $x_2$ are in $U_{x_0}$. Now (a small
trick) consider the intervals $V_{x_0} \subset U_{x_0}$ with the same center $x_0$ and half
the radius. Since each $x_0 \in V_{x_0}$, these smaller intervals cover [0,1]. By
Heine-Borel, a finite number of them will cover. Take $\delta$ to be the
minimum radius of this finite number of intervals. Clearly $\delta > 0$. Now
consider any two points $x_1$ and $x_2$ in [0,1] at distance apart less than $\delta$.
Since the intervals cover, $x_1$ is in some interval $V_{x_0}$. By the trick, $x_1$ and
$x_2$ lie together in the larger interval $U_{x_0}$, so $|f(x_1) - f(x_2)| < \varepsilon$, as
desired.
With this result, we have completed the announced proof of the
existence of the definite integral. However, for double integrals we will
need Heine-Borel for a square. The proof of uniform continuity, how-
ever, involves nothing really new. To understand it, think of the interval
(or the square) as a metric space and hence a topological space and call
such a space compact when it satisfies Heine-Borel. In other words X is
compact if every covering of X by open sets $U_i$ contains a finite subcovering.
Then the proof we have just given for (v) really proves
There are many compact spaces. For example, any product of compact
spaces is compact; in particular the unit square or cube is compact. Any
closed subset C of a compact space X is compact; here a subset $C \subset X$ is
said to be closed when its complement X - C is open in X, while the open
subsets of C (which define its topology) are just the intersections $U \cap C$
with open subsets U of X. A subset of the line or the plane is compact if
and only if it is both closed and bounded. On the other hand, the open
interval of all real x with 0 < x < 1 is not compact; when it is covered
by expanding proper open subintervals, no finite number of them will
suffice to cover. Similarly the whole real line R is not compact, and it carries
functions such as $x^2$ which are continuous but not uniformly continuous.
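The failure of uniform continuity for $x^2$ on the whole line can be made concrete: no single $\delta$ serves everywhere. In the sketch below (the function name `gap` and the particular values of $\delta$ and x are our own illustrative choices) two points are kept at the fixed distance $\delta/2$, yet the difference of their squares, which equals $x\delta + \delta^2/4$, grows without bound as x moves out along the line.

```python
def gap(x, delta):
    """|f(x + delta/2) - f(x)| for f(x) = x**2, where the two points
    are closer together than delta."""
    return abs((x + delta / 2) ** 2 - x ** 2)


delta = 0.01
for x in (1.0, 10.0, 100.0, 1000.0):
    # The two points stay 0.005 apart, yet the gap in values keeps growing.
    print(x, gap(x, delta))
```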
Compactness is related to convergence. One readily proves that in a
compact space X every infinite sequence of points $x_n$ has an infinite
subsequence converging to some point of X; here the convergence of a
sequence of points is defined just as was the convergence of a sequence of
numbers (§IV.4). This indicates again why R is not compact, because the
sequence of natural numbers has no convergent subsequence. A metric
space X is compact if and only if every infinite sequence of points in X
has an infinite subsequence converging to a point in X -but this result
fails for general topological spaces with convergence defined using neigh-
borhoods. The recognition of the importance of compactness and of its
description by coverings is a major step in the understanding of topologi-
cal spaces. It developed only slowly-and was not really codified until
Bourbaki, in his influential 1940 volume on topology, insisted.
This concept of compactness is only one of the many issues arising from
the development and foundation of the calculus. Geometry and mechan-
ics had led to the intuitive ideas of rates of change, area, and summation.
The somewhat vague formulation of these ideas by means of
infinitesimals proved to be a powerful tool in the 18th century, but various
difficulties developed. For example (Titchmarsh [1932], 1.75), the
sequence of functions $f_n(x) = n^2 x(1-x)^n$ converges to zero for all x in
the interval $0 \le x \le 1$, but the definite integral of the $f_n(x)$ from 0 to 1
is $n^2/((n + 1)(n + 2))$, so the sequence of integrals does not converge to 0!
(The original convergence is not "uniform".) This is typical of problems
arising with the interchange of two infinite processes (here convergence
and integration). Such difficulties eventually required a more sophisticated
formulation of the calculus, based on careful development of the concept
of a limit. This concept, in its turn, involves a careful use of logical
quantifiers and suggests the consideration of limits in R2, in R 3, and in
much more general topological spaces. The study of the definite integral
inevitably involves new notions of uniform continuity and then compactness,
and the proofs must rest upon a careful axiomatization of the real
numbers, an arithmetization of one-dimensional geometry. The axioms
alone do not suffice; there is a whole list of subtle consequences. Also,
compactness in its turn must be disentangled from the more pictorial
notions "closed and bounded". There are many more developments,
especially of more general integrals, such as the Lebesgue integral. Thus,
all told, the initial intuitive ideas and problems, and their extensions and
applications, lead to notions which in their turn demand more subtle concepts
internal to Mathematics.
8. Approximations and Taylor's Series
where the linear function on the right agrees with f and with f' at
x = a; in other words, it is the linear function whose graph is the tangent
line to y = f(x) at x = a. Moreover, one can estimate the error in such a
linear approximation. If the first derivative f'(x) is continuous on the
closed interval from a to x, while the second derivative f''(x) exists there,
a simple iteration of the Law of the Mean (§7) shows that there is a real
number $\xi$ between a and x such that
$$f(x) = f(a) + (x - a)f'(a) + f''(\xi)\,\frac{(x - a)^2}{2}. \qquad (2)$$
In other words, if the second derivative f'' is small in the interval, the
linear approximation (1) is good.
This formula (2) is a special case of Taylor's theorem. If a function
f(x) has a continuous (n - 1)st derivative $f^{(n-1)}(x)$ on the closed interval
from a to x and if the nth derivative exists on the open interval from a
to x, then there is a real number $\xi$ between a and x such that
(4)
called the Taylor series for f. The explicit meaning of such infinite sums
is then given by the convergence of the partial sums, as discussed in
§IV.4. Often the remainder formula given by Taylor's theorem (3) can be
used to show that the Taylor series converges for some (or even all) values
of x, or that it converges "uniformly" for some closed interval of values of
x. This leads to the familiar power series for $e^x$, sin x, and cos x. They,
and their variants, are then at hand for the computation of tables of trigonometric
functions. Indeed one can use the power series to define analytically
the function sin x, thereby avoiding the winding function of
§IV.2, and obscuring the reasons for the periodicity of sin x. The
numerous other applications of such power series indicate that the initial
elementary idea of approximation, in particular linear approximations,
does indeed have extensive consequences. This does not justify the
overuse of merely linear approximations in the multiple regression
methods (least squares) so popular in econometrics.
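The use of the remainder to control a power series can be sketched for sin x. We assume here the standard power series for the sine about a = 0 and the Lagrange form of the remainder, together with the fact that every derivative of the sine is bounded by 1; the function names `sin_partial_sum` and `remainder_bound` are hypothetical.

```python
import math


def sin_partial_sum(x, terms):
    """Sum of the first `terms` nonzero terms of the power series
    x - x^3/3! + x^5/5! - ... for sin x."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total


def remainder_bound(x, terms):
    """Bound on the error of sin_partial_sum(x, terms): the partial sum is
    the Taylor polynomial of degree 2*terms, and every derivative of sin
    has absolute value at most 1, so the remainder is at most
    |x|^n / n! with n = 2*terms + 1."""
    n = 2 * terms + 1
    return abs(x) ** n / math.factorial(n)


x = 2.0
for terms in (1, 2, 3, 5, 8):
    approx = sin_partial_sum(x, terms)
    print(terms, approx, abs(approx - math.sin(x)), remainder_bound(x, terms))
```

Since the bound tends to zero for each fixed x as the number of terms grows, the series converges for all x, which is the kind of argument indicated above.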
9. Partial Derivatives
For elementary problems in algebra, it is always vital to know whether to
add or to multiply. For example, the total change due to two separate
causes clearly ought usually to be the sum of the separate changes. On
the other hand, one must multiply to get a composite rate of change: For
example, given the exchange rate from pounds to dollars and the rate
from dollars to francs, the product will be the exchange rate from pounds
to francs.
These two simple observations of the practical meaning of addition and
multiplication appear formally in the chain rule for differentiation in the
calculus. This we have already used: If z = g(y) and y = h(x) are two
functions with continuous derivatives, then in the relevant range
z = g(h(x)) is a function of x and has derivative
$$z'(x) = g'(y)h'(x), \quad\text{or}\quad \frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}. \qquad (1)$$
The proof requires a little care with the limits entering in the definition of
the derivatives involved; the limits involve the formulas for the
corresponding finite increments $\Delta x = x - x_1$, for then
$$\frac{\Delta z}{\Delta x} = \frac{\Delta z}{\Delta y}\,\frac{\Delta y}{\Delta x}.$$
Thus (1) expresses the underlying reason why one should multiply rates.
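That rates multiply under composition can be checked numerically. In the sketch below the derivatives are only estimated by symmetric difference quotients, and the sample functions $y = h(x) = x^2$ and $z = g(y) = \sin y$, like the helper name `derivative`, are our own assumptions.

```python
import math


def derivative(f, x, step=1e-6):
    """Estimate f'(x) by a symmetric difference quotient."""
    return (f(x + step) - f(x - step)) / (2 * step)


def h(x):
    return x * x


def g(y):
    return math.sin(y)


def composite(x):
    return g(h(x))


x = 1.3
product_of_rates = derivative(g, h(x)) * derivative(h, x)  # g'(y) h'(x)
direct = derivative(composite, x)                          # z'(x)
print(product_of_rates, direct)
```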
The corresponding chain rule is more striking for functions of several
variables, such as a quantity z given as a function z = f(x,y) for all
points (x,y) in some open set U of the cartesian (x,y)-plane. There is no
problem in finding what derivatives might mean here; if one holds y fixed,
the quantity z remains just a function of x; its derivative, when it exists, is
called the partial derivative with respect to x. Thus at a point (x,y) in U
this derivative, for $h \ne 0$, is
$$\frac{\partial z}{\partial x} = f'_x(x,y) = \lim_{h \to 0} \frac{f(x + h,y) - f(x,y)}{h}. \qquad (2)$$
The intuitive principle that the total change is the sum of the separate
changes in x and in y then suggests that the derivative of z with respect to
the parameter t in the direction $\theta$ should be the linear combination
$$\frac{dz}{dt} = \left[\frac{\partial z}{\partial x}\right]\cos\theta + \left[\frac{\partial z}{\partial y}\right]\sin\theta, \qquad (3)$$
usually called the directional derivative. When both the partial derivatives
of z are continuous, this formula holds; it is a special case of the following
This clearly includes the motivating case (3). A similar formula applies
when z is given as a function of more than two variables or when these
variables x and y depend not on one but on several parameters.
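Formula (3) for the directional derivative can be tested against a direct estimate along a straight path. This is a minimal sketch under stated assumptions: the sample function $x^2 y + \sin y$, the point (1, 2), the angle 0.7, and the helper names are all hypothetical, and every derivative is estimated by a symmetric difference quotient.

```python
import math


def f(x, y):
    # Illustrative smooth function of two variables (our choice).
    return x * x * y + math.sin(y)


def partial_x(x, y, step=1e-6):
    """Estimate the partial derivative of f with respect to x (y held fixed)."""
    return (f(x + step, y) - f(x - step, y)) / (2 * step)


def partial_y(x, y, step=1e-6):
    """Estimate the partial derivative of f with respect to y (x held fixed)."""
    return (f(x, y + step) - f(x, y - step)) / (2 * step)


def directional(x, y, theta, step=1e-6):
    """Estimate dz/dt along the path (x + t cos(theta), y + t sin(theta)) at t = 0."""
    c, s = math.cos(theta), math.sin(theta)
    return (f(x + step * c, y + step * s) - f(x - step * c, y - step * s)) / (2 * step)


x0, y0, theta = 1.0, 2.0, 0.7
combined = partial_x(x0, y0) * math.cos(theta) + partial_y(x0, y0) * math.sin(theta)
print(directional(x0, y0, theta), combined)
```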
This chain rule (4) has several different aspects.
First, think of dx = (dx/dt)dt as an infinitesimal change in x, caused
by the (equally) infinitesimal change dt in t. Then, multiplying (4) by dt
and cancelling gives
$$dz = \frac{\partial z}{\partial x}\,dx + \frac{\partial z}{\partial y}\,dy. \qquad (5)$$
This expression is called the total differential of z; for given values of x
and y it gives the total change in z due to infinitesimal changes dx and dy;
we will soon give a less "infinitesimal" interpretation.
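Starting from a point $(x_0, y_0)$ with finite changes $x - x_0$ and $y - y_0$,
the formula (5) suggests a linear approximation $z - z_0$ to the change in
z:
$$(z - z_0) = \left[\frac{\partial z}{\partial x}\right]_0 (x - x_0) + \left[\frac{\partial z}{\partial y}\right]_0 (y - y_0). \qquad (6)$$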
This suggests (and correctly) that there is a Taylor's formula and also a
Taylor series, each valid for functions z of two variables x and y which are
sufficiently "smooth". Here and later we will use the term smooth for
functions with enough continuous derivatives, in cases where we do not
wish to specify in detail how many such derivatives are in fact needed.
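How good the linear approximation (6) is can be examined on an example. The sketch below uses a sample function of our own choosing, $x^2 y + \sin y$ at the point (1, 2), with its partial derivatives written out by hand; the names are hypothetical. For a smooth function one expects the error to behave like the square of the displacement, so shrinking the displacement tenfold should shrink the error roughly a hundredfold.

```python
import math


def f(x, y):
    return x * x * y + math.sin(y)


def linear_approximation_error(x0, y0, dx, dy):
    """Difference between the true change in z = f(x, y) and the
    linear approximation built from the partial derivatives at (x0, y0)."""
    fx = 2 * x0 * y0               # partial derivative with respect to x
    fy = x0 * x0 + math.cos(y0)    # partial derivative with respect to y
    true_change = f(x0 + dx, y0 + dy) - f(x0, y0)
    linear_change = fx * dx + fy * dy
    return abs(true_change - linear_change)


coarse = linear_approximation_error(1.0, 2.0, 0.1, 0.1)
fine = linear_approximation_error(1.0, 2.0, 0.01, 0.01)
print(coarse, fine, coarse / fine)
```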
In the chain rule (4), dz / dt can be regarded as an "inner product" of
two "vectors", as follows
$$\left[\frac{dx}{dt}, \frac{dy}{dt}\right]_{t=t_0} = (g'(t_0), h'(t_0)). \qquad (9)$$
It is called the tangent vector to the path at the point. All the tangent vectors
to trajectories through the point $(x_0, y_0)$ form a two dimensional vector
space, called the tangent space $T_0$ to the plane at this point.
Now the chain rule, in the form (7), gives dz /dt as the product of the
gradient vector of z by the tangent vector to the path. If one considers
both vectors to lie in the same two-dimensional space, this product is just
the inner product as described in §IV.9. However, it is preferable to con-
sider the tangent and cotangent spaces as conceptually distinct. The
"product" in (7) is then a real-valued function of two vectors, one from
each space. This function is linear in each vector when the other is held
constant, so is said to be bilinear; in Chapter VII we will indicate why it
makes the cotangent space "dual" to the tangent space.
The cotangent space may be constructed formally as follows: Take all
smooth functions f(x,y), each defined in some neighborhood of $(x_0, y_0)$;
they form an (infinite dimensional) vector space under addition of values
and multiplication by real constants. Call two such smooth functions
z = f(x,y) and w = k(x,y) cotangent (or equivalent) at the point $(x_0, y_0)$
when they have the same first partial derivatives there:
The equivalence classes of these functions under this relation then form
the desired two-dimensional space, called the cotangent space at the point
$(x_0, y_0)$. This construction works not just for the plane, but for other
curved surfaces such as the sphere, when the coordinates (x,y) in the
plane are replaced by suitable coordinates, such as latitude and longitude
for the sphere.
$$df = \left[\frac{\partial f}{\partial x}\right]_0 dx + \left[\frac{\partial f}{\partial y}\right]_0 dy$$
of the total differential. Thus the differential, born as an infinitesimal, may
be defined to be the gradient $\nabla f$, a vector in the cotangent space. This
is why the cotangent space differs from the tangent space.
The tangent vector at $t_0$ to the path g(t), h(t) also determines the usual
tangent line to that path, with the parametric equations
$$z - z_0 = \left[\frac{dz}{dt}\right]_0 (t - t_0) \qquad (11')$$
But these three equations (11) and (11') together satisfy the linear equation
(6) for the approximate change $z - z_0$ in z. This linear equation (6)
represents a plane in 3-space; since it is satisfied by (11), it must be the
tangent plane to the surface S according to our definition.
In this way, the chain rule combines ideas from geometry (tangent
planes), from mechanics (velocity vectors), from calculus (linear approxi-
mation), and from algebra (dual spaces), with the results appropriately
added or multiplied. It gives meaning to the "total differential".
Some of these ideas are more vivid in pictures. Thus the gradients of a
function f(x,y) defined in the whole (x,y)-plane give a vector at each
10. Differential Forms 173
point in the plane, hence a vector field in the plane (Figure 1). Alternatively
the loci where $f(x,y)$ = constant give a family of curves in the
plane, the contour lines for f. Their use is suggested by topographic
maps, picturing in the plane the varying heights above sea-level (Figure
2). When f is smooth the gradient vectors, if non-zero, are orthogonal to
the contour lines; for topography, they represent the direction of fastest
ascent. For Mathematics, their exploitation has proved decisive in topology
and Morse theory.
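The orthogonality of gradients and contour lines can be checked numerically. The following is only a sketch: the function `f` and the sample point are hypothetical choices made for illustration, and the gradient is estimated by central differences. A small step along the contour direction should change `f` only to second order, while the same step along the gradient changes it to first order.

```python
import math

def f(x, y):
    # Hypothetical smooth "height" function, chosen only for illustration.
    return x * x + 2.0 * y * y

def gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (df/dx, df/dy) at the point (x, y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2.0 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2.0 * h)
    return fx, fy

x0, y0 = 1.0, 0.5
gx, gy = gradient(f, x0, y0)
norm = math.hypot(gx, gy)

# Unit vector along the gradient, and the same vector turned through a right
# angle; the latter is tangent to the contour line through (x0, y0).
ux, uy = gx / norm, gy / norm
tx, ty = -uy, ux

eps = 1e-4
drift = f(x0 + eps * tx, y0 + eps * ty) - f(x0, y0)   # step along the contour
climb = f(x0 + eps * ux, y0 + eps * uy) - f(x0, y0)   # step along the gradient
```

Here `drift` is of the order of `eps**2`, while `climb` is of the order of `eps` times the length of the gradient, which is the numerical face of "direction of fastest ascent".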
\[ \int_a^b f(x)\,dx \]
\[ \int_{(a,b)}^{(c,d)} \bigl(P(x,y)\,dx + Q(x,y)\,dy\bigr) \]
is taken along some path in the plane and adds up infinitesimal incre-
ments both in x and in y, each weighted by a function P(x,y) or Q(x,y).
These ideas will be clarified if we begin by thinking first just about the
gadget which sits under the integral sign.
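\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 f}{\partial x\,\partial y}. \tag{2} \]
There is a standard proof, using the mean value theorem, of this intuitively
plausible result. It means that a differential form $\omega$ in (1) which is a
total differential must satisfy the condition
\[ \frac{\partial P}{\partial y} = \frac{\partial Q}{\partial x}; \tag{3} \]
such a differential form is said to be closed.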
A variety of examples, some to appear below, suggest the following formal
definition of a line integral $\int_L \omega$. Here the "line" L means a smooth
curve in the region U of the (x,y)-plane where $\omega$ is defined. The integral
is again a limit of a sum: Subdivide the curve L at points $p_0, p_1, \ldots, p_n$,
with $p_i = (x_i, y_i)$, into n "short" pieces and form the sum
\[ \sum_{i=1}^{n} \bigl[ P(x_i,y_i)(x_i - x_{i-1}) + Q(x_i,y_i)(y_i - y_{i-1}) \bigr]; \tag{4} \]
each summand is then the value of the differential form $\omega$ at one of the
points $p_i$ (or, alternatively, at some other point on the part of the curve
from $p_{i-1}$ to $p_i$), with the differentials dx and dy replaced by the actual
increments $x_i - x_{i-1}$ or $y_i - y_{i-1}$ in the coordinates. Under suitable
hypotheses, this sum approaches a limit as $n \to \infty$ and the length of each
piece of the curve approaches zero; this limit is the line integral $\int_L \omega$. The
idea here is just that of the ordinary integral; the identity in ideas can be
expressed formally. Represent the line L by smooth parametric equations
$x = g(t)$, $y = h(t)$; the line integral is then equal to an "ordinary"
definite integral between suitable limits in the parameter t,
\[ \int_L \omega = \int_{t_0}^{t_1} \bigl[ P(g(t),h(t))\,g'(t) + Q(g(t),h(t))\,h'(t) \bigr]\,dt; \]
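The two descriptions of the line integral, the limit of the sum (4) and the ordinary integral in the parameter t, can be compared numerically. This is a minimal sketch under stated assumptions: the form $\omega = -y\,dx + x\,dy$ and the quarter circle $x = \cos t$, $y = \sin t$, $0 \le t \le \pi/2$, are hypothetical choices made for illustration, and the parameter integral is approximated by the midpoint rule.

```python
import math

# Hypothetical differential form w = P dx + Q dy (an assumption for illustration).
def P(x, y):
    return -y

def Q(x, y):
    return x

# Hypothetical smooth path x = g(t), y = h(t), with its derivatives.
def g(t):
    return math.cos(t)

def h(t):
    return math.sin(t)

def dg(t):
    return -math.sin(t)

def dh(t):
    return math.cos(t)

def riemann_sum(n, t0, t1):
    """The sum (4): the form evaluated at p_i, times the coordinate increments."""
    total = 0.0
    x_prev, y_prev = g(t0), h(t0)
    for i in range(1, n + 1):
        t = t0 + (t1 - t0) * i / n
        x, y = g(t), h(t)
        total += P(x, y) * (x - x_prev) + Q(x, y) * (y - y_prev)
        x_prev, y_prev = x, y
    return total

def parameter_integral(n, t0, t1):
    """Midpoint-rule value of the ordinary integral in the parameter t."""
    step = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * step
        total += (P(g(t), h(t)) * dg(t) + Q(g(t), h(t)) * dh(t)) * step
    return total

sum_value = riemann_sum(1000, 0.0, math.pi / 2)
integral_value = parameter_integral(1000, 0.0, math.pi / 2)
```

For this particular form and path the integrand in t is identically 1, so both values should be close to $\pi/2$.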
meets C in just two points: one, $(x_1,y_1)$, at the left and another, $(x_2,y_2)$, at
the right, as in Figure 1. Then the double integral of $\partial Q/\partial x$ can plausibly
be replaced by an iterated integral, first on x, then on y, and the integral
on x along a horizontal line (i.e., the sum along a horizontal strip) is by
the fundamental theorem just $[Q(x_2,y_2) - Q(x_1,y_1)]\,dy$; the second
integration on y then produces the line integral of $Q\,dy$ in two pieces: left
side and right side of A. As the arrows in the figure may indicate, it is
here important to integrate along the curve C in the appropriate direction,
so that the area A always lies to the left of C as traversed. (This requires
the geometric ideas of orientation, as discussed in Chapter III.) Our concern
here is not with the necessarily careful details of rigorous proof of (5)
(say, for more convoluted boundaries C) but with the more sweeping
observation that what appears in the Gauss lemma (6) is another (necessarily
formal and careful) realization of the idea that the total change in a
quantity (the total along the boundary curve) is exactly the sum of the
infinitesimal changes, just the idea which was already expressed (§4) in
the fundamental theorem.
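The step described above, replacing the double integral of $\partial Q/\partial x$ by the line integral of $Q\,dy$ around the boundary, can be tried on a simple region. The sketch below makes several assumptions for illustration only: the region A is the unit square, the function `Q` is a hypothetical polynomial, and both integrals are approximated by the midpoint rule. Only the vertical sides of the square contribute to $\int Q\,dy$, since dy = 0 on the horizontal sides.

```python
def Q(x, y):
    # Hypothetical smooth function, chosen only for illustration.
    return x * x * y + x

def dQ_dx(x, y):
    # Its partial derivative with respect to x, computed by hand.
    return 2.0 * x * y + 1.0

def double_integral(func, n=200):
    """Midpoint-rule double integral of func over the unit square."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += func((i + 0.5) * h, (j + 0.5) * h)
    return total * h * h

def boundary_integral_Q_dy(n=200):
    """Integral of Q dy around the square, with the area kept on the left."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        y = (j + 0.5) * h
        total += Q(1.0, y) * h       # right side, traversed upward (dy > 0)
        total += Q(0.0, y) * (-h)    # left side, traversed downward (dy < 0)
    return total

area_side = double_integral(dQ_dx)
boundary_side = boundary_integral_Q_dy()
```

Both `area_side` and `boundary_side` approximate the same number, which a hand computation gives as 3/2 for this choice of `Q`.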
The same idea reappears in higher dimensions. An integral over a two-
dimensional surface S in Euclidean 3-space has the general form of a
double integral
\[ \iint_S \bigl[ L(x,y,z)\,dy\,dz + M(x,y,z)\,dx\,dz + N(x,y,z)\,dx\,dy \bigr], \]
where L, M, and N are smooth functions of the coordinates x, y, and z
and the integrand is a second order differential form. Such an integral
may be defined as an appropriate limit of a sum or by reduction to a double
integral taken in the plane of two parameters s and t when the surface
S is given by parametric equations $x = g(s,t)$, $y = h(s,t)$, $z = k(s,t)$. If
the surface S is the total boundary $S = \partial V$ of some volume V, Green's
theorem (also called Gauss' lemma) asserts that
\[ \iint_{\partial V} \bigl[ L\,dy\,dz + M\,dx\,dz + N\,dx\,dy \bigr] = \iiint_V \left[ \frac{\partial L}{\partial x} - \frac{\partial M}{\partial y} + \frac{\partial N}{\partial z} \right] dx\,dy\,dz. \tag{6} \]
Figure 1
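Similarly Stokes' theorem deals with the integral of a first order form
taken over a curve as bounding a piece S of surface:
\[ \oint_{\partial S} [P\,dx + Q\,dy + R\,dz] = \iint_S \left\{ \left[ \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right] dx\,dy + \left[ \frac{\partial R}{\partial x} - \frac{\partial P}{\partial z} \right] dx\,dz + \left[ \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} \right] dy\,dz \right\}. \tag{7} \]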
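In each of these versions (5), (6), and (7) of the fundamental theorem,
the differential form $\omega$ which on the left is integrated over the "boundary"
determines another differential form which appears as the integrand on
the right; the latter is called the exterior derivative of $\omega$, and is written as
$d\omega$:
\[ d(P\,dx + Q\,dy + R\,dz) = \left[ \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right] dx\,dy + \left[ \frac{\partial R}{\partial x} - \frac{\partial P}{\partial z} \right] dx\,dz + \left[ \frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z} \right] dy\,dz. \]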
With this description of the exterior derivative of a form, all of these versions
of the "fundamental theorem" of the calculus can be written in a
uniform way. Consider a bounded smooth portion V (volume, surface,
area) of some space with a smooth boundary $\partial V$. Then
\[ \int_{\partial V} \omega = \int_V d\omega, \]
where the smooth differential form $\omega$ (and thus its exterior derivative) must
be defined throughout the region V and on its boundary. Note that the
formulas above for these exterior derivatives can all be obtained from a
simple mnemonic: For a function P or L, dP (or dL) is the usual total
differential; for a product, $d(PQ) = (dP)Q + P\,dQ$; for a variable x,
$d(dx) = 0$; and the differentials dx, dy of the variables are multiplied with
the understanding that every square is zero. Then $(dx)(dx) = 0$, while
$(dx + dy)^2 = 0$ and hence $dx\,dy = -dy\,dx$. From the rule (2) for interchanging
the order of two successive smooth partial derivatives, it then
follows that a second exterior differential $dd\omega$ is always zero.
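As a worked instance of this mnemonic, the following derivation spells out the two steps for a smooth function f of two variables. It is a sketch using only the rules just stated (squares are zero, the product rule, and $d(dx) = 0$), together with the equality (2) of mixed partial derivatives.

```latex
% Step 1: anticommutativity from "every square is zero".
\begin{align*}
0 = (dx + dy)^2 &= dx\,dx + dx\,dy + dy\,dx + dy\,dy = dx\,dy + dy\,dx,
  \quad\text{hence } dx\,dy = -\,dy\,dx. \\[4pt]
% Step 2: the second exterior differential of a function vanishes.
d(df) &= d\!\left( \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy \right)
       = d\!\left(\frac{\partial f}{\partial x}\right) dx + d\!\left(\frac{\partial f}{\partial y}\right) dy \\
      &= \frac{\partial^2 f}{\partial y\,\partial x}\,dy\,dx
       + \frac{\partial^2 f}{\partial x\,\partial y}\,dx\,dy
       = \left( \frac{\partial^2 f}{\partial x\,\partial y}
              - \frac{\partial^2 f}{\partial y\,\partial x} \right) dx\,dy = 0.
\end{align*}
```

In the second step the terms in $dx\,dx$ and $dy\,dy$ have been dropped, and the final equality is exactly the rule (2).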
For differential equations, a central idea is that a smooth function $f(t)$
can be determined completely by its initial value at $t = 0$ and knowledge
of all values of its first derivative. The same idea appears for smooth functions
f(x,y) of two or more variables: They can be determined by "boundary
conditions" and knowledge of the partial derivatives. Such P.D.E. (partial
differential equations) arise not just from the conceptual analogy of these
ideas but from many different cases in theoretical physics where exactly
this kind of data is at hand for quantities depending on several indepen-
dent variables.
For example, let a one-dimensional wave at time $t = 0$ be represented
by a height y (above each point x) given by a smooth function $y = f(x)$
as at the left of Figure 1. If this wave keeps the same form as it moves to
the right at a constant velocity c (in units distance per time), then the
height at position x at time t is given by $u = f(x - ct)$ (try it out!). This
function u satisfies the first order P.D.E. $\partial u/\partial t = -c\,\partial u/\partial x$ and (from
Rolle's theorem) any smooth function u satisfying this P.D.E. must have
the form $u(x,t) = k(x - ct)$ for some smooth function k. Thus the solution
$u = u(x,t)$ of the P.D.E. is determined by its initial values $u(x, 0)$.
Similarly a wave moving steadily at velocity c to the left is given by
$v = g(x + ct)$. Both u and v satisfy the second order P.D.E.,
\[ \frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}, \tag{1} \]
called the wave equation; the most general smooth solutions of this equation
have the form $f(x - ct) + g(x + ct)$ for arbitrary smooth f and g.
[Figure 1: the wave profile at t = 0 and at t = 2]
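That a superposition of a right-moving and a left-moving wave satisfies the wave equation can be tested with finite differences. This is a sketch only: the profiles `f` and `g`, the velocity `C`, and the sample point are hypothetical choices, and the second derivatives are approximated by central second differences, so the residual is small but not exactly zero.

```python
import math

C = 2.0  # hypothetical wave velocity, chosen only for illustration

def f(s):
    # Hypothetical profile of the wave moving to the right.
    return math.exp(-s * s)

def g(s):
    # Hypothetical profile of the wave moving to the left.
    return math.sin(s)

def u(x, t):
    """Superposition f(x - ct) + g(x + ct)."""
    return f(x - C * t) + g(x + C * t)

def wave_residual(x, t, h=1e-3):
    """Finite-difference estimate of u_tt - c^2 u_xx at the point (x, t)."""
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / (h * h)
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / (h * h)
    return u_tt - C * C * u_xx

residual = wave_residual(0.3, 0.7)
```

The value of `residual` should be close to zero, limited by the truncation and rounding errors of the difference quotients.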
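In three space dimensions x, y, and z the wave equation reads
\[ \frac{\partial^2 u}{\partial t^2} = c^2 \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + \frac{\partial^2 u}{\partial z^2} \right). \tag{2} \]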
(4)
(5)
\[ f(x) \sim \tfrac{1}{2} a_0 + \sum_{k=1}^{\infty} \bigl( a_k \cos kx + b_k \sin kx \bigr). \tag{7} \]
The study of such series expansions had a major historical role in the con-
sideration of the question of Chapter V: "What is a function"?
Now suppose that the series (7) does converge to f(x) for all x and try
to determine the coefficients ak and bk by multiplying f and the series by
sin mx or cos mx and then integrating from 0 to $2\pi$. One would like to
integrate the resulting series term-by-term. This is a typical example of the
problem of double limits (interchange convergence and integration). This
is a general question, with many ramifications in analysis. When it can be
justified in the present case, it leads to simple formulas for the coefficients
in (7):
\[ a_m = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos mx \,dx, \qquad b_m = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin mx \,dx. \tag{8} \]
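The coefficient formulas (8) can be tried on a function whose series is known in advance. In this sketch the trigonometric polynomial `f` is a hypothetical example, and the integrals over $[0, 2\pi]$ are approximated by an equally spaced sum, which is very accurate for periodic integrands of this kind. The expected values are read off from (7): $\tfrac{1}{2}a_0 = 3$, $a_2 = 2$, $b_3 = -5$, and all other coefficients zero.

```python
import math

def f(x):
    # Hypothetical function with a known finite Fourier series.
    return 3.0 + 2.0 * math.cos(2.0 * x) - 5.0 * math.sin(3.0 * x)

def fourier_coefficients(func, m, n=1000):
    """Approximate a_m and b_m of (8) by an equally spaced sum on [0, 2*pi]."""
    step = 2.0 * math.pi / n
    a = 0.0
    b = 0.0
    for i in range(n):
        x = i * step
        a += func(x) * math.cos(m * x)
        b += func(x) * math.sin(m * x)
    return a * step / math.pi, b * step / math.pi

a0, _ = fourier_coefficients(f, 0)
a2, b2 = fourier_coefficients(f, 2)
a3, b3 = fourier_coefficients(f, 3)
```

Note that the constant term of the series is $\tfrac{1}{2}a_0$, so the computed `a0` is twice the mean value of `f`.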
The study of the actual convergence of such series makes effective use of
the Lebesgue integral, and leads to many deep theorems. There is also a
uniqueness question: If two such series (7) converge to the same function
f(x), are their coefficients all the same? This question reduces at once to
the question: If such a series (7) converges to 0 for all x, $0 \le x \le 2\pi$, are
the coefficients all zero? The answer is yes, and also yes if one assumes
convergence to 0 for all x except for a finite number of points. In fact, the
\[ \int_a^b F(y, y', x)\,dx, \qquad y = f(x). \]
[Figure: network of the concepts of calculus. Legible labels: Changing Quantity, Approximation, Total Quantity, Tangents, Velocity, Area, Arc Length, Infinite Series, Existence Theorems, Metric Space, Taylor's Series, Partial Derivatives, Gauss-Green-Stokes, Fourier Series, Fluid Motion, Periodicity, Potential, Laplace Equation, Exterior Derivatives, Trigonometry, Dirichlet Problem, Harmonic, Distributions, Boundary Conditions, Quantum Mechanics, Topology, Differential Geometry.]
Linear Algebra
1. Sources of Linearity
To say that an effect is "linear" means that the effect respects proportions
and that the effect of a sum is the sum of the separate effects. The formal
description of such linearity can be stated in the context of a "linear" vec-
tor space defined much as in §IV.7. Such a space, defined over a given
field F of "scalars", is a set V of "vectors" equipped with two operations:
addition $(v, w) \mapsto v + w$ of two vectors v and w, and multiplication
$(a, v) \mapsto av$ of a vector v by a scalar $a \in F$. The axioms require that the
vectors form an abelian group under addition, and that multiplication by
scalars satisfies the identities
\[ a(bv) = (ab)v, \quad (a + b)v = av + bv, \quad a(v + w) = av + aw, \quad 1v = v, \tag{1} \]
for all vectors v and w in the space V and for all scalars a, b in F. These
two operations of addition and scalar multiple yield the more general
linear combinations, such as the combination
\[ T(v + w) = Tv + Tw, \qquad T(av) = a(Tv), \tag{3} \]
stating that T preserves sums (is additive) and that T preserves scalar multiples
(is homogeneous). If both $S: W \to U$ and $T: V \to W$ are linear, so is
their composite $S \cdot T: V \to U$. A linear transformation $T: V \to V$ of V into
itself is a linear endomorphism. It is said to be non-singular when it has a
two-sided inverse $T^{-1}: V \to V$. This inverse (unique as always) is necessarily
linear.
This "transformation" language serves to codify the many cases of
Mathematical operations which are additive and homogeneous.
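One such operation is differentiation, which can be illustrated on polynomials. The sketch below is an assumption-laden illustration, not part of the text: a polynomial is represented by its list of coefficients (constant term first), and the helper names `derivative`, `add`, and `scale` are hypothetical. The check is that differentiating a linear combination gives the same linear combination of the derivatives.

```python
def derivative(p):
    """Coefficients of the derivative of the polynomial p (constant term first)."""
    return [k * c for k, c in enumerate(p)][1:]

def add(p, q):
    """Sum of two polynomials, padding the shorter coefficient list with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(a, p):
    """Scalar multiple a * p."""
    return [a * c for c in p]

p = [1, 2, 3]        # 1 + 2x + 3x^2
q = [0, 0, 0, 4]     # 4x^3

# Additive and homogeneous: D(3p + q) should equal 3 D(p) + D(q).
left = derivative(add(scale(3, p), q))
right = add(scale(3, derivative(p)), derivative(q))
```

Both `left` and `right` are the coefficient list of $6 + 18x + 12x^2$.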
In calculus, the operations of differentiation and integration are linear.
Thus on an interval of the real axis, say the unit interval
$I = \{x \mid 0 \le x \le 1\}$, consider the set $C^\infty = C^\infty_I$ of all those functions
$f: I \to \mathbf{R}$ which have continuous derivatives of all orders. This set $C^\infty$ is a
vector space over R with addition and scalar multiple defined in "termwise"
fashion by the equations
\[ (f + g)(x) = f(x) + g(x), \qquad (af)(x) = a(f(x)). \]
vectors. The same vector spaces can be described with scalar multiples on
the right. It will be convenient to use such right vector spaces in the next
sections.
A (2)
This is the familiar way in which the square matrix A also represents the
transformation T in terms of coordinates (or "variables") $x_i$; in particular,
the j th basis vector $u_j$ is sent into the j th column of the matrix A.
The coordinate representation (3) and the basis representation (1) of T
are equivalent; however, the matrix A for T depends on T and on the
2. Transformations versus Matrices 189
3. Eigenvalues
A matrix description of any mathematical object depends on a choice of a
basis, and so requires attention to the effect of a change in that basis: a
change which replaces old coordinates $x_i$ by new coordinates $y_i$, where
$y_i = \sum_j p_{ij} x_j$ and the square matrix P with entries $p_{ij}$ is non-singular
(because one can change back). Two matrices A and B which represent
the same endomorphism of V relative to (possibly) different bases are said
to be similar. Thus in old (X) and new (Y) coordinates the same transformation
reads
\[ X' = AX, \qquad Y' = BY. \]
The change of coordinates is expressed by the matrix equations $Y = PX$
and $Y' = PX'$, so an easy calculation with matrix products gives
$B = PAP^{-1}$. In words, square matrices A and B are similar if and only if
there is a non-singular matrix P with $B = PAP^{-1}$. This is the formal setting;
we will return later (in §12 below) to the explicit question of determining
when two given matrices are similar.
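The "easy calculation" can be carried out on a small example. In this sketch the matrices `A` and `P` and the column `X` are hypothetical 2 x 2 choices, and exact rational arithmetic from the standard library is used so that the comparison is exact. The check is that $B = PAP^{-1}$ acts on the new coordinates $Y = PX$ exactly as A acts on the old ones.

```python
from fractions import Fraction

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(P):
    """Inverse of a non-singular 2 x 2 matrix, in exact rational arithmetic."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is singular, so it is not a change of coordinates")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1], [1, 2]]      # hypothetical matrix of T in the old coordinates
P = [[1, 1], [1, -1]]     # hypothetical non-singular change of coordinates
B = matmul(matmul(P, A), inverse_2x2(P))

X = [[1], [2]]            # a column of old coordinates
X_new = matmul(A, X)      # X' = AX
Y = matmul(P, X)          # Y  = PX
Y_new = matmul(P, X_new)  # Y' = PX'
```

For this choice B turns out to be diagonal, with entries 3 and 1, and `matmul(B, Y)` agrees with `Y_new`.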
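What is the simplest matrix similar to a given matrix A? In particular,
can A be similar to a diagonal matrix D, one with entries (say)
$\lambda_1, \ldots, \lambda_n$ along the main diagonal,
\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}? \tag{1} \]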
In terms of the identity matrix I (diagonal with 1's down the diagonal)
the equation $AX = X\lambda$ can be written as $(A - \lambda I)X = 0$. Hence a scalar
$\lambda$ is an eigenvalue if and only if this system of n homogeneous linear
equations in X has a solution $X \ne 0$; that is, if and only if $A - \lambda I$ is a
singular matrix. But a square matrix is singular if and only if its determinant
is 0, and the determinant $|A - \lambda I|$ of $A - \lambda I$ for A an $n \times n$ matrix is a
polynomial of degree n in $\lambda$,
\[ \frac{dx_i}{dt} = \sum_{j=1}^{n} a_{ij} x_j, \qquad i = 1, \ldots, n \tag{3} \]
in n (real) variables $x_i$. The real matrix A with entries $a_{ij}$ has n eigenvalues
$\lambda_1, \ldots, \lambda_n$, possibly not all different and real or possibly complex.
If A is similar to a diagonal matrix, the corresponding change of coordinates
to $y_1, \ldots, y_n$ then transforms the equations (3) to the form
\[ \frac{dy_i}{dt} = \lambda_i y_i, \qquad i = 1, \ldots, n. \]
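For a 2 x 2 matrix the eigenvalues can be found directly as the roots of the characteristic polynomial, and each decoupled equation $dy_i/dt = \lambda_i y_i$ then has the familiar exponential solution. The sketch below is an illustration under assumptions: the matrices are hypothetical examples, the helper names are not from the text, and complex arithmetic is used because the roots need not be real.

```python
import cmath

def char_poly(A, lam):
    """Value of the determinant |A - lam I| for a 2 x 2 matrix A."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

def eigenvalues_2x2(A):
    """Roots of lam^2 - (trace) lam + (determinant) = 0, possibly complex."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def decoupled_solution(lam, y0, t):
    """Solution y(t) = y0 * exp(lam * t) of dy/dt = lam * y with y(0) = y0."""
    return y0 * cmath.exp(lam * t)

symmetric = [[2, 1], [1, 2]]       # hypothetical matrix with real eigenvalues
quarter_turn = [[0, -1], [1, 0]]   # hypothetical matrix with no real eigenvalue

lam1, lam2 = eigenvalues_2x2(symmetric)
mu1, mu2 = eigenvalues_2x2(quarter_turn)

# Central-difference check that the exponential solves dy/dt = lam * y.
h = 1e-5
slope = (decoupled_solution(lam1, 2.0, 0.5 + h)
         - decoupled_solution(lam1, 2.0, 0.5 - h)) / (2 * h)
defect = abs(slope - lam1 * decoupled_solution(lam1, 2.0, 0.5))
```

The first matrix has the real eigenvalues 3 and 1; the second, a rotation through a right angle, has the purely imaginary pair i and -i.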
4. Dual Spaces
Vector spaces come naturally in pairs, acting on each other. Thus the
matrix product of a row by a column is a scalar; by this product, the "row
vectors" may be regarded as linear functions of the "column vectors", and
vice-versa. In §VI.10 the directional derivative of a function $z = f(x,y)$
along a path turned out to be a "product" of the gradient of f by the
tangent vector to the path, so the gradients (cotangent vectors) are thereby
linear functions of the tangent vectors, and vice-versa. To develop this
idea, notice first that scalar-valued functions on any set X constitute a vector
space. Specifically, if F is a field, the collection $F^X$ of all functions
$f, g: X \to F$ is a vector space when the vector operations are defined by the
equations
\[ (f + g)(x) = f(x) + g(x), \qquad (af)(x) = a(f(x)) \tag{1} \]
for all x in X and all scalars $a \in F$. One says of (1) that the operations
are defined pointwise. For an infinite set X, this vector space $F^X$ is infinite
dimensional, but when X is finite, say with n elements, $F^X$ is isomorphic to
the familiar vector space $F^n$ of n-tuples of elements of F.
If X = V is itself a vector space over F, it is natural to consider just
those functions $f: V \to F$ which are linear. Under the pointwise operations
(1) they constitute a vector space (a subspace of $F^V$)
\[ V^* = \{ f \mid f: V \to F \text{ linear} \}, \tag{2} \]
called the dual space of V. This space is conceptually different from V, but
when V is finite dimensional, it has the same "size" as V. Specifically, if V
has a basis of n elements $u^1, \ldots, u^n$, then the value of the function f on
any vector with coordinates $x_i$ may be calculated as
\[ f\Bigl( \sum_i u^i x_i \Bigr) = \sum_i f(u^i)\,x_i. \tag{3} \]
\[ u_i(u^j) = \begin{cases} 1, & i = j, \\ 0, & i \ne j, \end{cases} \tag{4} \]
where the values $u_i(u^j) = \delta_{ij}$, called the Kronecker $\delta$, are the entries of
the $n \times n$ identity matrix. Then $f = \sum_i f^i u_i$, so the $u_i$ do form an n-element
basis of the dual space $V^*$. This basis $u_1, \ldots, u_n$ of $V^*$ is called
the dual basis to $u^1, \ldots, u^n$. (The latter, we note, is written with indices
upstairs, all so arranged that sums $\sum_i f^i u_i$, $\sum_i u^i x_i$ over an index i have one
index up, the other down.) In any event, we conclude for the dimensions
\[ \dim V^* = \dim V. \]
v. This in turn means that, for each fixed $w_0$, $b(w_0, -): V \to F$ is an element
of the dual space $V^*$, so that there is a map
V there is also an isomorphism $V \cong V^*$, say that sending each basis vector
$u^i$ of V to the corresponding dual basis vector $u_i$ of $V^*$; but this isomorphism
depends on a choice of basis, so is "unnatural".
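The dual basis of a given basis can be computed explicitly in coordinates. This sketch assumes the plane $F^2$ with vectors as columns and linear functions as rows; then the dual basis consists of the rows of the inverse of the matrix whose columns are the basis vectors. The basis `u1`, `u2` and the function names are hypothetical, chosen for illustration.

```python
from fractions import Fraction

def dual_basis_2d(u1, u2):
    """Rows of the inverse of the matrix with columns u1, u2 (the dual basis)."""
    det = Fraction(u1[0] * u2[1] - u2[0] * u1[1])
    if det == 0:
        raise ValueError("u1 and u2 are linearly dependent, so not a basis")
    return ((u2[1] / det, -u2[0] / det), (-u1[1] / det, u1[0] / det))

def pair(row, column):
    """Value of the linear function 'row' on the vector 'column'."""
    return sum(r * c for r, c in zip(row, column))

u1, u2 = (2, 1), (1, 1)              # hypothetical basis of the plane
d1, d2 = dual_basis_2d(u1, u2)       # its dual basis, as row vectors

# The Kronecker delta table of values of dual basis vectors on basis vectors.
table = [[pair(d, u) for u in (u1, u2)] for d in (d1, d2)]

# Any linear function f is the combination of the dual basis with
# coefficients f(u1), f(u2).
f = (5, 7)
rebuilt = tuple(pair(f, u1) * a + pair(f, u2) * b for a, b in zip(d1, d2))
```

A different basis `u1`, `u2` gives a different dual basis, which is one way to see that the resulting isomorphism between the plane and its dual depends on the choice made.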
Matrices work backwards or forwards. Thus a rectangular $m \times n$
matrix A gives on column vectors X a linear transformation $X \mapsto AX$ of
n-space to m-space. Equally well, on row vectors Y it yields a different
linear transformation $Y \mapsto YA$ of m-space to n-space. Alternatively, A for
basis vectors comes on the right in (2.1), for coordinates, on the left in
(2.2). This may be explained by duality. If $T: V \to W$ is a linear transformation
(say, the transformation on column vectors given by the matrix A),
then each vector f in the dual space $W^*$ is really a linear map $f: W \to F$,
hence the composite $fT: V \to F$ is an element of the dual space $V^*$, so that
$f \mapsto fT$ is itself a linear mapping
\[ T^*: W^* \to V^*. \]
(T + T')v = Tv + T'v.
In case W is just the base field F, regarded as a vector space over itself,
this space hom(V, F) is just the dual space $V^*$. If V and W have dimensions
m and n, a choice of basis in each replaces each T by a matrix and
so proves the vector space hom(V, W) isomorphic to the space of all
$n \times m$ matrices with entries in F. Its dimension is then nm, with a basis
(say) those matrices which have one entry 1 and all other entries zero.
plane, we already saw in §IV.9 the utility of the inner product of two
plane vectors in describing angles and distances. It turns out here (as in so
many other cases) that the two-dimensional case is typical: One can get
the appropriate "Euclidean" geometry in an n-dimensional space simply
by adjoining to the vector space structure the additional structure of an
inner product.
An inner product space E is thus defined to be a vector space E over the
field R of reals with an inner product, a real valued function $u \cdot v$ of two
vectors u and v, which (as in equations (10), (11), and (12) of §IV.9) is bilinear,
symmetric, and positive definite. Such spaces are at hand. Thus for
each dimension n the space $\mathbf{R}^n$ of n-tuples $(x_1, \ldots, x_n)$ of real numbers
has the "standard" inner product
\[ X \cdot Y = x_1 y_1 + \cdots + x_n y_n. \tag{1} \]
\[ f \cdot g = \int_0^1 f(x)\,g(x)\,dx. \tag{3} \]
With any inner product, the length or the norm $|u|$ of each vector can
be defined by the equation
\[ |u|^2 = u \cdot u, \qquad |u| \ge 0, \tag{4} \]
\[ u \cdot v = |u|\,|v| \cos\theta; \tag{5} \]
This can be proved from the axioms of an inner product space. Moreover
one can define a distance between two vectors u and v (that is, the distance
between their end points) as $|u - v|$. From the Schwarz inequality
it follows that this distance satisfies the standard triangle inequality.
Hence, with this distance, the inner product space is a metric space, as it
should be. In fact, in $\mathbf{R}^n$ with the standard inner product (1), the distance
between two n-tuples X and Y is just
\[ |X - Y| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}. \]
the right hand side is the coordinate expression of the inner product as a
"bilinear form" in the x's and the y's. However, for the standard inner
product the matrix $G = (g_{ij})$ of this form is just the $n \times n$ identity
matrix. This can be arranged in any finite dimensional inner product
space, by choosing a normal orthogonal basis: one in which the inner
product $u_i \cdot u_j$ is the Kronecker $\delta_{ij}$. This requirement means geometrically
that each basis vector has length 1 and any two are perpendicular (orthogonal).
Any given basis $v_1, \ldots, v_n$ can be made orthogonal as follows:
First tip $v_2$ to be perpendicular to $v_1$ (by adding to $v_2$ an appropriate multiple
of $v_1$). Then tip $v_3$ to be orthogonal to $v_1$ and $v_2$, and so on. Finally
shrink each of the resulting vectors to a length 1. This is called the
Gram-Schmidt process. Similar uses of normal and orthogonal bases crop
up for infinite dimensional spaces in Fourier series, Hilbert spaces, and
elsewhere. The idea is transportable from geometry to analysis.
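The Gram-Schmidt process just described translates almost word for word into a short program. This is a minimal sketch, assuming the standard inner product on $\mathbf{R}^n$, a linearly independent input list, and ordinary floating point arithmetic; the sample basis is a hypothetical choice.

```python
import math

def dot(u, v):
    """Standard inner product of two n-tuples."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(basis):
    """Tip each vector to be orthogonal to the earlier ones, then shrink to length 1.

    Assumes the input vectors are linearly independent; otherwise some tipped
    vector is zero and the division below fails.
    """
    orthogonal = []
    for v in basis:
        w = list(v)
        for q in orthogonal:
            multiple = dot(w, q) / dot(q, q)
            w = [wi - multiple * qi for wi, qi in zip(w, q)]
        orthogonal.append(w)
    return [[x / math.sqrt(dot(w, w)) for x in w] for w in orthogonal]

basis = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]  # hypothetical basis
normal = gram_schmidt(basis)

# Table of inner products; it should be the identity matrix up to rounding.
gram = [[dot(p, q) for q in normal] for p in normal]
```

The table `gram` is the matrix G of the bilinear form in the new basis, so the computation also illustrates how G is brought to the identity matrix.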
The inner product is automatically a dual pairing of the space with
itself. Hence each finite dimensional space with an inner product is iso-
morphic to its dual space-because each vector v is also a linear function
"inner product with v". Hence for such spaces one can drop the distinc-
tion between a space and its dual.
6. Orthogonal Matrices
In a space with an inner product one has all the concepts of Euclidean
geometry: One can define spheres and rigid motions. For such a space E,
the appropriate endomorphisms are the orthogonal transformations: those
functions $T: E \to E$ which are linear and which preserve the inner product,
in the sense that $Tu \cdot Tv = u \cdot v$ for all pairs of vectors u, v. This means also
(1)
its transpose is its inverse. Since the determinant of a product is the product
of the determinants, this equation implies that an orthogonal matrix
has determinant $\pm 1$. And if the orthogonal matrix A has a real eigenvector
$X \ne 0$, the equation $AX = X\lambda$ and preservation of length implies that
$\lambda = \pm 1$, the only possible real eigenvalues of an orthogonal matrix.
In two dimensions, an orthogonal transformation is a rigid motion leaving
the origin fixed, so must be either a rotation or a rotation followed by
a reflection. Hence the only $2 \times 2$ orthogonal matrices are $\pm A_\theta$ where
\[ A_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
is the matrix of a rotation through the angle $\theta$ about the origin, just as in
equation (IV.9.6). The characteristic polynomial of this matrix is
$\lambda^2 - 2\lambda\cos\theta + 1$; hence the eigenvalues are $\lambda = \cos\theta \pm i\sin\theta$,
non-real complex numbers unless $\theta = \pi k$ for some integer k. Thus,
except for the identity rotation and the half-turn, there are no (real)
eigenvectors: no vectors rotated into a multiple of themselves. The eigenvalues
are complex numbers of absolute value 1.
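These statements about a rotation matrix can be checked numerically. The sketch assumes the usual rotation matrix with entries $\cos\theta$, $-\sin\theta$, $\sin\theta$, $\cos\theta$, a hypothetical sample angle, and the quadratic formula applied to the characteristic polynomial $\lambda^2 - 2\lambda\cos\theta + 1$.

```python
import cmath
import math

def rotation(theta):
    """Matrix of the rotation of the plane through the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def transpose_times(A):
    """The product (transpose of A) * A, which is the identity when A is orthogonal."""
    cols = list(zip(*A))
    return [[sum(a * b for a, b in zip(p, q)) for q in cols] for p in cols]

def rotation_eigenvalues(theta):
    """Roots of lam^2 - 2 lam cos(theta) + 1 = 0."""
    c = math.cos(theta)
    root = cmath.sqrt(c * c - 1.0)
    return c + root, c - root

theta = 0.7  # hypothetical angle, not a multiple of pi
check = transpose_times(rotation(theta))
lam_plus, lam_minus = rotation_eigenvalues(theta)
```

For an angle that is not a multiple of $\pi$ the two roots are non-real conjugates on the unit circle, with imaginary parts $\pm\sin\theta$.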
This fact is general. The eigenvalues of an orthogonal matrix A (and
hence of an orthogonal transformation) are always complex numbers of
absolute value 1. To see this, we treat the real-number entries in A as
complex numbers, and let A act on the complex vector space $\mathbf{C}^n$ of
columns X of n complex numbers. In this space $\mathbf{C}^n$,
the inner product
should be given by the formula
\[ (v \cdot u) = (u \cdot v)^* \]
for any two vectors u and v. A vector space over C with such an inner
product (bilinear, positive definite, and antisymmetric) is called a unitary
space. The theory of these "unitary" spaces over the complex numbers is
wholly analogous to the theory of inner product spaces over the reals. In
particular, a linear transformation is unitary if and only if it preserves the
inner product, and an $n \times n$ matrix A of complex numbers is unitary if
and only if its conjugate transpose is its inverse. It follows that the eigenvalues
of a unitary matrix (or transformation) all have absolute value 1. In
particular a real orthogonal matrix is necessarily unitary, and hence the
result cited above about its eigenvalues. This is again a case where problems
about real numbers require complex numbers for a solution.
From the definition it follows that the $n \times n$ orthogonal matrices form
under multiplication a group, called the orthogonal group $O_n$. Equivalently,
it can be described as the group of all orthogonal endomorphisms of an
n-dimensional inner product space. This group includes reflections which
invert the orientation of space, where orientation can be described as in
§III.8. The matrices of such transformations have determinant $-1$. The
proper orthogonal matrices (those with determinant $+1$) form a subgroup
$SO_n$ of $O_n$, called the special orthogonal group.
An orthogonal transformation may also be described as a rigid motion
which leaves the origin (the vector 0) fixed. Just as in the plane, the most
general rigid motion of an n-dimensional inner product space is an
orthogonal transformation followed by a translation.
To summarize, we see that the geometric ideas of space, transformation,
and rigid motion extend naturally beyond two and three dimensions (and,
for analysis, into infinite dimensions) and have an effective algebraic for-
malization by vector spaces, linear orthogonal transformations, and their
representations by matrices.
7. Adjoints
As we have seen, any linear transformation $T: E \to E$ has a dual
$T^*: E^* \to E^*$. But if E is an inner product space, E can be identified with
$E^*$ by regarding each vector v of E as the linear function $v \cdot -$, "inner
product with v". Then $T^*v$, by definition of the dual $T^*$, is just the composite
function $v \cdot T$. In other words,
\[ (T^*v) \cdot u = v \cdot Tu \tag{1} \]
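In coordinates this identity can be checked directly. The sketch below assumes the standard inner product on $\mathbf{R}^3$ (that is, a normal orthogonal basis), in which case the adjoint of the transformation with matrix A is represented by the transposed matrix; the matrix and the two vectors are hypothetical examples.

```python
def dot(u, v):
    """Standard inner product of two n-tuples."""
    return sum(a * b for a, b in zip(u, v))

def apply(A, v):
    """The column vector A v, for a matrix A given as a list of rows."""
    return [dot(row, v) for row in A]

def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]   # hypothetical matrix of T
u = [1, -2, 3]
v = [2, 0, -1]

left = dot(apply(transpose(A), v), u)   # (T* v) . u, with T* given by the transpose
right = dot(v, apply(A, u))             # v . (T u)
```

With integer entries the two sides agree exactly; here both equal -13.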
j, k = 1, ..., n.
\[ \int_a^b L(y)\,z = \int_a^b y\,M(z). \]
The proof must use complex numbers (to prove a theorem about real
vector spaces!); it will be convenient to formulate it in terms of a (real)
symmetric matrix A representing T. By the fundamental theorem of algebra,
the characteristic polynomial $|A - \lambda I|$ of A has (possibly) complex
roots; they are the eigenvalues of A. Take one such eigenvalue $\lambda$. There
must then be a non-zero and (possibly) complex column vector X which is
an eigenvector, so that $AX = X\lambda$. Let * denote the operation which
transposes each matrix and takes the complex conjugate of each entry;
then $X^*A^* = \lambda^* X^*$, where $X^*$ is a row and $A^* = A$ because A is real
and symmetric. By the associative law for matrix multiplication (row by
matrix by column)
\[ (X^*A)X = X^*(AX), \]
\[ \frac{x^2}{a^2} \pm \frac{y^2}{b^2} = 1 \tag{1} \]
(+ for ellipses, - for hyperbolas). What about the locus of a more general
quadratic equation
\[ ax^2 + 2bxy + cy^2 = 1? \tag{2} \]
8. The Principal Axis Theorem 203
(3)
For the equation (2) this theorem means that with new (rectangular)
coordinates x' and y' the equation reads
A B
o
\[ B: U \times V \to W \tag{1} \]
is said to be bilinear when the values $B(u, v)$ are linear in $u \in U$ for each
fixed vector v and also linear in $v \in V$ for each fixed u. For example, the
9. Bilinearity and Tensor Products 205
inner product (5.7) is bilinear. It turns out that all of the bilinear functions
B described under (1) can be represented by linear functions and a single
bilinear function $\otimes$ from $U \times V$ to a new space, called the tensor product
of U and V. The properties of this space are very useful for a wide range
of geometric and algebraic problems. Its existence is asserted in the following
theorem.
Theorem 1. Given two vector spaces U and V over a field F there is a third
vector space $U \otimes V$ and a bilinear function
\[ \otimes: U \times V \to U \otimes V \tag{2} \]
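\[ \begin{array}{ccc} U \times V & \xrightarrow{\ \otimes\ } & U \otimes V \\ & {\scriptstyle B}\searrow & \downarrow {\scriptstyle \exists!\,T} \\ & & W \end{array} \tag{4} \]
to express the idea that the new bilinear function $\otimes$ has the property that to
any old bilinear B there is exactly one linear T (dotted arrow) which makes
$T\,\otimes = B$; that is, which makes this diagram "commute".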
Bilin( U, V; W)
[Diagram: the two bilinear functions $\otimes: U \times V \to U \otimes V$ and $\Box: U \times V \to U \mathbin{\Box} V$, joined by linear maps $T: U \otimes V \to U \mathbin{\Box} V$ and $S: U \mathbin{\Box} V \to U \otimes V$]
with both $\otimes$ and $\Box$ bilinear and universal. Now since $\otimes$ is universal, it is
universal for $\Box$, and so there is a linear T as shown, with $T\,\otimes = \Box$. Vice
versa, since $\Box$ is universal, there is a linear S as shown, with $S\,\Box = \otimes$.
This means that the composite $S \cdot T: U \otimes V \to U \otimes V$ has $(S \cdot T)\,\otimes = \otimes$.
But the identity map I of $U \otimes V$ also has this property $I\,\otimes = \otimes$, while
the definition of the universality of $\otimes$ said that there is only one such
linear map. Therefore $S \cdot T$ is the identity of $U \otimes V$. By exactly the same
argument $T \cdot S$ is the identity (of the second space $U \mathbin{\Box} V$). In other
words, S is a two-sided inverse for T, so T is an isomorphism. The two
spaces $U \otimes V$ and $U \mathbin{\Box} V$ are isomorphic, and by an isomorphism which
carries the universal bilinear $\otimes$ in the first space into the $\Box$ for the second
space.
Note first that this is really not an argument just about the universal
bilinear function, but about a universal what-not. Another example has
already appeared. In §V.8, the free product $A * G$ of two groups was
defined to have as elements certain words $a_1 g_1 \cdots a_n g_n$ in elements of A
and of G; then there were two group homomorphisms $k_1: A \to A * G$
and $k_2: G \to A * G$ which were "universal" among all pairs of homomorphisms from
A and G to a third group H, as in (V.8.4)
\[ A \to H \leftarrow G. \]
The argument we have just given for the tensor product shows that the
free product $A * G$ with the homomorphisms $k_1$ and $k_2$ is determined
uniquely, up to an isomorphism respecting $k_1$ and $k_2$, by this universal
property. In other words, what matters about the free product $A * G$ is not
its specific construction by means of words, but its universal property.
Recall also that the (direct) product $A \times G$ of two groups with its projections
$A \leftarrow A \times G \to G$ was also shown to have a universal property; see
(V.8.2). In other words, the product of groups (or of spaces) need not be
constructed from ordered pairs of elements. Any other construction will
do, provided only that it yields the universal property!
Now we can return to formulate our construction for the tensor product
$\otimes$ of spaces, as in (2). Suppose for instance that U and V are finite dimensional,
with bases say $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$. First take mn symbols
$u_i \otimes v_j$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$ and manufacture a new
vector space $U \otimes V \cong F^{mn}$ over F with these symbols as basis. Its vectors
are then formal linear combinations $\sum a_{ij}(u_i \otimes v_j)$. Now consider any bilinear
B to any space W. By its bilinearity, the values of B are all determined
once one knows its values $B(u_i, v_j)$ on the basis vectors. Now if we
define T by
\[ T(u_i \otimes v_j) = B(u_i, v_j), \]
indices. Once given a basis $u^1, \ldots, u^n$ of V one then has the dual basis
$u_1, \ldots, u_n$ of $V^*$ and therefore, in the above case, a basis for
$V^* \otimes V \otimes V$ consisting of $n^3$ vectors $u_i \otimes u^j \otimes u^k$. One may then exhibit
a tensor by its coordinates relative to such a basis. One may readily
derive the (many indexed) formulas for changing coordinates from one
basis to another. They were once used to define what a tensor is, but this
we have avoided by the invariant definition of the tensor product itself.
For an inner product space V = E, there are also formulas for shifting
upstairs indices downstairs, on the basis of the canonical isomorphism
$E \cong E^*$.
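The construction of the tensor product from a basis, and the way a bilinear function factors through it, can be made concrete in coordinates. The following is a sketch under stated assumptions: U and V are taken to be $F^2$ and $F^3$ with their standard bases, a bilinear function into the scalars is given by a hypothetical table `G` of its values $B(u_i, v_j)$, and a vector of the tensor product is stored as its m x n array of coordinates $a_{ij}$.

```python
def tensor(x, y):
    """Coordinates a_ij = x_i * y_j of the vector u (x) v in the basis u_i (x) v_j."""
    return [[xi * yj for yj in y] for xi in x]

def bilinear(G, x, y):
    """A bilinear function B(u, v), determined by its values G[i][j] = B(u_i, v_j)."""
    return sum(x[i] * G[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def induced_linear(G, a):
    """The linear T on the tensor product defined by T(u_i (x) v_j) = B(u_i, v_j)."""
    return sum(a[i][j] * G[i][j] for i in range(len(a)) for j in range(len(a[0])))

G = [[1, 2, 0], [-1, 0, 3]]   # hypothetical values of B on the basis vectors
x = [2, 5]                    # coordinates of a vector u of U
y = [1, -1, 4]                # coordinates of a vector v of V

direct = bilinear(G, x, y)                 # B(u, v)
factored = induced_linear(G, tensor(x, y)) # T(u (x) v)

# T is defined on every vector of the tensor product, not only on products
# u (x) v; for example on u_1 (x) v_1 + u_2 (x) v_3:
mixed = [[1, 0, 0], [0, 0, 1]]
mixed_value = induced_linear(G, mixed)
```

The equality of `direct` and `factored` is the commutativity of the diagram for this particular B, with T linear in the mn coordinates.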
There are many other useful properties of tensor products. For example,
if $T: U \to U'$ and $S: V \to V'$ are two linear transformations, then
$\langle u, v \rangle \mapsto Tu \otimes Sv$ is a bilinear function $U \times V \to U' \otimes V'$; hence there
is by universality a corresponding linear function
vt,
mutative diagram:
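\[ \begin{array}{ccc} V & \xrightarrow{\ P\ } & V/S \\ & {\scriptstyle T}\searrow & \downarrow {\scriptstyle \exists!} \\ & & W \end{array} \qquad P(S) = 0, \quad T(S) = 0. \tag{1} \]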
This may be proved by taking P to map each vector v of V into the
"coset" (or hyperplane) $S + v$ consisting of all vectors $s + v$ for s in S;
these cosets themselves form a vector space under operations such as the
addition given by $(S + v) + (S + w) = S + (v + w)$. Essentially,
these cosets are just like the congruence classes $C_m(a)$ modulo m used to
construct the integers modulo m, because each coset $S + v$ consists of all
vectors $v'$ in V with $v' - v$ in S (in effect, with $v' \equiv v \pmod{S}$). However,
once the universal property (1) is established, it no longer matters
how the elements of this quotient space $V/S$ are described, because it is
this universal property which formalizes everything about the quotient
(and which proves that the quotient is unique up to an isomorphism of
vector spaces).
This result allows one to collapse any given collection of vectors
$s_1, \ldots, s_k$ of a space V: simply take the subspace S to be the set of all
linear combinations $a_1 s_1 + \cdots + a_k s_k$ of these vectors; that is, the
subspace spanned by the $s_i$; clearly any transformation T which maps all $s_i$
to zero must map to zero all of the subspace.
As a typical example of the use of this process, let us describe another
construction of the tensor product $U \otimes V$ of two vector spaces U and V.
First take all pairs $\langle u, v \rangle$ of vectors $u \in U$ and $v \in V$, and write each pair
as if it were a product $u \cdot v$. Form the (very large) vector space L which has
all these pairs $u \cdot v$ as its basis; thus L consists of all finite linear combinations
of symbols $u_i \cdot v_i$ with coefficients in the field F. Now $\langle u, v \rangle \mapsto u \cdot v$ is
a function $F: U \times V \to L$, but it is by no means bilinear. One can force it
to be bilinear simply by collapsing in L all the things which would be zero
if F were bilinear, thus collapsing all elements
$$
(a_1 u_1 + a_2 u_2) \cdot v - a_1 (u_1 \cdot v) - a_2 (u_2 \cdot v)
\tag{2}
$$
and similarly for linearity in the second factor v. Now F becomes a
bilinear function $U \times V \to L/S$ into the collapsed space. One may verify
that it is universal, so that this construction L/S is indeed a tensor
product.
This example illustrates a use of infinite dimensional spaces to construct
finite dimensional ones, and also shows that there are many different
formal ways to construct what is effectively the same tensor product.
Mathematics uses many other types of "collapse" for other structures.
In each case one must note first what can be collapsed; that is, what can
be mapped to zero by a homomorphism: for vector spaces, any subspace;
for groups, any normal subgroup. For a ring R, define an ideal of R to be
a subset $K \subset R$ such that $k_1$ and $k_2$ in K imply $k_1 + k_2$ in K, while k in
K and r in R imply that rk and kr are in K. In any homomorphism
$f: R \to S$ of rings, the subset of R consisting of all $k \in R$ with fk = 0 is
necessarily an ideal. This ideal is called the kernel of the homomorphism.
Conversely, given any ideal K in a ring R, there is a ring R/K and a
homomorphism $R \to R/K$ with kernel K which is universal among
homomorphisms from R with kernel containing K. This quotient ring may
be constructed by taking its elements to be cosets K + r of K. This
construction includes the case of integers modulo m, because the set of all
multiples of m is an ideal in the ring $\mathbf{Z}$ of integers. It also includes one of
the constructions of the complex numbers $\mathbf{C}$ from the field $\mathbf{R}$ of real
numbers. For this, first form the ring $R = \mathbf{R}[x]$ of all polynomials in a
symbol x with real coefficients. In this ring, the set of all multiples of the
polynomial $x^2 + 1$ is an ideal K, and the quotient ring R/K is (isomorphic to)
the complex numbers, because the collapse of $x^2 + 1$ to 0 forces x (when
collapsed) to satisfy the equation $x^2 = -1$ defining the basic complex
number i.
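The collapse of $x^2 + 1$ can be watched at work in a few lines of Python. This is a minimal sketch, not a general quotient-ring implementation: polynomials are coefficient lists `[c0, c1, ...]`, and the function names `poly_mul`, `reduce_mod_x2_plus_1` and `quotient_mul` are chosen here for illustration.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def reduce_mod_x2_plus_1(p):
    """Reduce modulo x^2 + 1, using x^k = -x^(k-2) in the quotient ring."""
    p = list(p)
    while len(p) > 2:
        top = p.pop()     # coefficient of the highest power x^k, k >= 2
        p[-2] -= top      # fold it into the coefficient of x^(k-2)
    while len(p) < 2:
        p.append(0)
    return p


def quotient_mul(p, q):
    """Product of two cosets a + bx, returned as [a, b]."""
    return reduce_mod_x2_plus_1(poly_mul(p, q))


# The coset of x squares to -1, and products match complex multiplication.
print(quotient_mul([0, 1], [0, 1]))   # [-1, 0]
print(quotient_mul([1, 2], [3, 4]))   # [-5, 10], matching (1+2i)(3+4i) = -5+10i
```

Each coset has a unique representative a + bx of degree at most 1, which is why a pair `[a, b]` suffices to store an element of the quotient.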
This process also makes Galois theory possible without any recourse to
complex numbers. Over any (abstract) field F the polynomials in x form a
ring $R = F[x]$ in which the multiples of any one polynomial f(x) form an
ideal (f). If f is irreducible, the quotient ring R/(f) is a field containing
F and an element $\xi$ (the coset of x) with $f(\xi) = 0$. In short, R/(f) is a
field $F(\xi)$ generated by F and one root of f; the remaining roots can be
adjoined in the same way to give an (abstract) splitting field and its Galois
group.
This notion of ideal, useful for collapses, will turn out to have other
uses in number theory and arithmetic (§XII.3).
These are rules intended to apply to the vectors dx, dy, dz in the
cotangent space at a point, say, in Euclidean 3-space. They need not be
mysterious, because they are algebraic rules which can be achieved by
suitable collapse of formal products taken over any vector space V. This
collapse leads to exterior algebra.
Starting with V, the successive tensor products yield a string of vector
spaces
Their elements are called (covariant) tensors; those in the n-fold tensor
product $T_n$ being tensors of rank n while (for completeness) the scalars in
F count as tensors of rank 0. Taken together, all these tensors form an
algebraic system with the following rules of operation: Any tensor can be
multiplied by a scalar; any two tensors of the same rank can be added;
any two tensors can be multiplied by using the product $\otimes$; for example,
the product of the tensors u and $v \otimes w$ is
All these tensors, with these algebraic operations, form the so-called tensor
algebra T( V). This algebraic system isn't quite a vector space, because we
do not (and do not need to) add two tensors of different ranks; for the
same reason it is not quite a ring under sum and product. Technically, it is
called a graded algebra; i.e., both a graded vector space and a graded ring
(with grading by ranks; see Algebra XVI.4). The reader will find it
straightforward to write down the axioms for such a graded system, noting
that the convention (8.9) about triple tensor products ensures that the
multiplication is associative. However, the multiplication is not commuta-
tive.
Every tensor is a sum of products of vectors, so that T( V) is generated
by the vector space V; it is the "most general" graded algebra so gen-
erated, and so is called the free graded algebra over V. It thus has a
"universal" property, as follows: For any graded algebra A, each linear
transformation $V \to A$ into the vector space of elements of grade 1 in A can
be extended uniquely to a morphism $T(V) \to A$ of graded algebras. In
consequence, any such A generated by V can be obtained by suitably col-
lapsing the tensor algebra T( V).
For the graded algebra of differentials (1) we need a product with
$u \otimes v = -v \otimes u$ and $u \otimes u = 0$. These identities do not hold in the
tensor algebra T(V). Hence we simply take the appropriate quotient of
T(V). This can be done by collapsing in T(V) all the elements $t \otimes t$ (for
any tensor t of positive rank) and all the multiples (left and right) of any
such $t \otimes t$ by another tensor. The resulting quotient algebra E(V) is
(4)
(5)
(6)
of basis vectors with indices in ascending order. For k > n there can be
no such product, so $E_k(V)$ is zero for k > n. For smaller k, it turns out
that the $\binom{n}{k}$ possible products of (6) do form a basis. This one may
see more easily for n = 3, say. Here we have the list of basis elements
suggested:
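$$
\begin{array}{ll}
E_0: & 1, \\
E_1: & u_1,\ u_2,\ u_3, \\
E_2: & u_1 \wedge u_2,\ u_1 \wedge u_3,\ u_2 \wedge u_3, \\
E_3: & u_1 \wedge u_2 \wedge u_3.
\end{array}
\tag{7}
$$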
All the products of these elements can be computed by the rules (4) and
(5); with these product rules one verifies that any
$t = a_1 u_1 + a_2 u_2 + a_3 u_3$ has $t \wedge t = 0$. Hence the required collapse
has already happened when all the elements in the list (7) are taken to be
linearly independent. Therefore they are indeed linearly independent in
E(V).
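The bookkeeping for n = 3 is easy to mechanize. The sketch below is an illustration under stated assumptions: a basis product is stored as an ascending tuple of indices, a general element as a dictionary from such tuples to coefficients, and the product rules are taken to be anticommutativity of the basis vectors together with the vanishing of any product with a repeated index. The names `wedge_basis`, `wedge` and `basis` are chosen here for the example.

```python
from itertools import combinations


def basis(n, k):
    """Basis products of degree k: ascending index tuples drawn from 1..n."""
    return list(combinations(range(1, n + 1), k))


def wedge_basis(a, b):
    """Wedge of two basis products; returns (sign, ascending tuple), sign 0 if an index repeats."""
    idx = a + b
    if len(set(idx)) < len(idx):
        return 0, ()
    # Each inversion corresponds to one swap of adjacent factors, hence one sign change.
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return (-1) ** inversions, tuple(sorted(idx))


def wedge(s, t):
    """Wedge product of two elements given as {index tuple: coefficient}."""
    out = {}
    for a, ca in s.items():
        for b, cb in t.items():
            sign, key = wedge_basis(a, b)
            if sign:
                out[key] = out.get(key, 0) + sign * ca * cb
    return {k: c for k, c in out.items() if c != 0}


u1, u2 = {(1,): 1}, {(2,): 1}
t = {(1,): 2, (2,): -3, (3,): 5}          # t = 2*u1 - 3*u2 + 5*u3
print(wedge(u2, u1))                      # {(1, 2): -1}
print(wedge(t, t))                        # {} : t wedge t vanishes
print([len(basis(3, k)) for k in range(5)])
```

The dimensions 1, 3, 3, 1 (and 0 beyond degree 3) match the list of basis elements for n = 3.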
For $u_1 = dx$, $u_2 = dy$, and $u_3 = dz$, and so $V = \mathbf{R}^3$, this exterior
algebra then does provide a formal setting for the calculations with
12. Similarity and Sums
(8)
(1)
has zero as its only eigenvalue, but it is not similar to the matrix of all
zeros because it does not represent the zero endomorphism. Hence finding
a "simplest" matrix for a general T will require something much more
elaborate than just diagonal matrices.
One approach is to consider V not just as a vector space but as a
"richer" algebraic system (a "module") given by the additional presence
of T and its iterates. What this means is that any vector v of V can be
multiplied not just by scalars in the ground field F, but also by polynomi-
als in T with coefficients in F, as in
a(v + w) = av + aw,    1v = v.    (3)
to the factors, given by $v \mapsfrom \langle v, w\rangle \mapsto w$. They are "universal", so make
$M \oplus N$ a product of M and N in the sense used in §V.9 for groups. It also
has linear maps
$M \to M \oplus N \leftarrow N$
from the factors, given by $v \mapsto (v, 0)$ and $w \mapsto (0, w)$. They are also
universal, so make it a "coproduct": the construction which for groups
was called a free product in §V.9.
$-a_{m-1}x^{m-1}$

$$
\begin{pmatrix}
0 & 0 & 0 & -a_0 \\
1 & 0 & 0 & -a_1 \\
0 & 1 & 0 & -a_2 \\
0 & 0 & 1 & -a_3
\end{pmatrix}
\tag{6}
$$

It is called the companion matrix (of the polynomial f(x)). These matrices
will be the building blocks for the analysis of similarity.
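A short computation shows why the companion matrix deserves its name. The sketch below assumes that f is the monic polynomial $x^m + a_{m-1}x^{m-1} + \cdots + a_1 x + a_0$ and that the matrix has ones on the subdiagonal and the negated coefficients in the last column, as in the pattern (6); under that assumption f evaluated at its own companion matrix is the zero matrix. The function names are chosen here for illustration.

```python
def companion(a):
    """Companion matrix of x^m + a[m-1] x^(m-1) + ... + a[1] x + a[0]."""
    m = len(a)
    C = [[0] * m for _ in range(m)]
    for i in range(1, m):
        C[i][i - 1] = 1          # ones just below the diagonal
    for i in range(m):
        C[i][m - 1] = -a[i]      # last column: -a_0, ..., -a_{m-1}
    return C


def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def poly_at_matrix(a, C):
    """Evaluate the monic polynomial with coefficient list a at the matrix C (Horner's rule)."""
    n = len(C)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = identity                      # leading coefficient 1
    for coeff in reversed(a):
        result = mat_mul(result, C)
        result = [[result[i][j] + coeff * identity[i][j] for j in range(n)]
                  for i in range(n)]
    return result


a = [6, -5, 2, 1]                          # f(x) = x^4 + x^3 + 2x^2 - 5x + 6
C = companion(a)
print(C)
print(poly_at_matrix(a, C))                # the 4 x 4 zero matrix
```

Integer arithmetic is used throughout, so the vanishing of f(C) is exact rather than approximate.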
The analysis uses modules over special kinds of rings. An integral
domain is a commutative ring in which bc = 0 implies b = 0 or c = O. A
principal ideal domain is a domain D in which every ideal K is an ideal
K = (d) consisting of all the multiples (in D) of some one element d.
Both $F[x]$ and $\mathbf{Z}$ are principal ideal domains. For modules over such a D
one can prove (with considerable trouble; Algebra, Chap. X) the
This result does include vector spaces (as modules over a field F), since
F is a principal ideal domain whose only ideals are (0) and F itself.
However, the result (7) exhibits the complexity of modules relative to vector
spaces; for a finite dimensional vector space V over F one has only
$V \cong F \oplus \cdots \oplus F$, just the first r summands of (7), with r the
dimension. For general D the additional modules $D/(d_i)$ here are called cyclic
modules because, like cyclic groups (§V.6), they are generated by one
element: the 1 of D.
What does this theorem mean when D is the polynomial ring F[x]? A
module M is then just a vector space V together with a linear
endomorphism, "multiply by x". In case V is finite dimensional, there can be no
factor D = F[x] in the decomposition (7), because each such factor D
would correspond to an infinite dimensional vector space F[x]. Hence
there is only the string of cyclic modules $D/(d_i)$. Each of these
corresponds to a transformation with matrix a companion matrix (6).
Hence the
(8)
The number k and the polynomials $d_i$, chosen monic, are uniquely
determined by the transformation T.
These monic polynomials are called the invariant factors of T. They
form a complete system of invariants under similarity: Two square
matrices are similar if and only if they have the same invariant factors.
Correspondingly, the representation (8) is called a canonical form for a
(square) matrix, under similarity-every square matrix is similar to exactly
one such canonical form. Much more can be said (but not here) about
this "rational canonical form" and other canonical forms for matrices,
such as the Jordan canonical form.
What does the theorem mean when the principal ideal domain D is the
ring Z of integers? Consider any (additive) abelian group A. In A a
repeated sum (say three times, as a + a + a) amounts to the multiplication
of $a \in A$ by an integer, here by 3, as 3a = a + a + a. This
"scalar" multiple enjoys the properties (3) and (4) used to define a
module. In other words, an abelian group A is just the same thing as a Z-
module.
(10)
Combined with (9) this gives the decomposition of finite abelian groups
already announced in (V.8.4)-the direct product there is the same as the
direct sum here.
Our main observation now is that this result (9) for abelian groups and
that for companion matrices are essentially the same. Separate proofs
(often presented by mysterious manipulations of matrices) would come
down to the use of essentially the same devices, and these devices are
codified-and hence better understood-by using the concept of a module,
generalized from that of vector space.
13. Summary
Linear algebra starts with geometrical pictures of vectors and with ele-
mentary ideas about "linear" operations-those which preserve sum and
proportion. The exact formulation of these ideas is possible only with the
notion of a linear transformation between vector spaces. The manipula-
tion of such transformations involves both conceptual formulations and
calculations with matrices and presents a central problem, that of similar-
ity: When do two matrices represent the same linear endomorphism?
Geometric and analytic considerations introduce various constructions on
spaces-dual spaces and inner product spaces. Linear functions lead to
bilinear functions and more general products, using the tensor product of
spaces. In the presence of an inner product, the similarity problem for
such symmetric matrices can be solved by the use of eigenvectors and
eigenvalues. In the general case, one of the canonical forms for matrices
under similarity requires that vector spaces over a field be generalized to
consider modules over a suitable ring. Further developments of linear
algebra, not summarized here, involve the same pattern-an interaction
between geometry and analysis, leading to successive formal generaliza-
tions helpful in formulating and understanding aspects of linearity and its
manifold uses, for linear approximations and analytic operations.
Forms of Space
1. Curvature
In Euclidean and Non-Euclidean geometry, the phenomena of space were
analysed in terms of lines, triangles, angles, and congruence; in brief, such
a geometry is primarily linear. Many other geometrical phenomena, how-
ever, involve curved lines in the plane and twisted curves and curved sur-
faces in three-dimensional space. By approximating these curves by
straight lines, the methods of calculus come into play, leading to the
subject of differential geometry, to which we now turn. The resulting
elementary methods of analysing curvature lead inevitably to a study of the
intrinsic geometry of surfaces and to problems in the classification of sur-
faces and higher-dimensional manifolds.
A smooth path (that is, a parametrized curve) in the x,y plane is given,
as in §VI.9, by a pair of smooth functions
in the tangent space $T_p$ of the plane at that point. If the tangent vector is
never zero, the path is said to be regular. The parameter t can be changed,
say to u = k(t), provided the smooth function k has $k'(t) \neq 0$ throughout
the interval in t. Such a change alters the length of the tangent vector (2)
but not its direction, and gives the same set of points (x,y). This collection
of points is the curve traced out by the path (1). Thus a change of param-
eter gives the same curve, traversed at a (possibly) different speed and
with points labelled with (possibly) different parameter values.
One first wants the length of such a curve. To measure the length of this
curve from $t_1$ to $t_2$ one inscribes a polygon in the curve, calculates the
length of the polygon by the Pythagorean theorem for each piece, and
takes suitable successive such polygons with shorter and shorter sides. In
the limit, this gives the length expressed as the (Riemann) integral

$$
s = \int_{t_1}^{t_2} \sqrt{\left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2}\; dt.
\tag{3}
$$
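The inscribed-polygon procedure can be carried out numerically. The following sketch uses equally spaced parameter values (one convenient choice among many) and a function name, `polygon_length`, chosen here for illustration; it applies the Pythagorean theorem to each side and sums.

```python
import math


def polygon_length(x, y, t1, t2, n):
    """Length of the polygon inscribed in the path (x(t), y(t)) at n + 1 equally spaced parameters."""
    total = 0.0
    px, py = x(t1), y(t1)
    for k in range(1, n + 1):
        t = t1 + (t2 - t1) * k / n
        qx, qy = x(t), y(t)
        total += math.hypot(qx - px, qy - py)   # Pythagorean theorem for one side
        px, py = qx, qy
    return total


# The unit circle traversed once: polygons with more sides approach 2*pi from below.
for n in (10, 100, 1000):
    print(n, polygon_length(math.cos, math.sin, 0.0, 2 * math.pi, n))
print(2 * math.pi)
```

For the circle each inscribed polygon is shorter than the curve, so the printed values increase toward the circumference as the sides shrink.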
(4)
(5)
$$
\kappa = \frac{y''(x)}{\left[1 + y'(x)^2\right]^{3/2}}.
\tag{7}
$$
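Formula (7) is easily tried on curves whose curvature is known in advance. In the sketch below the first and second derivatives are supplied by hand as functions; the name `curvature` and the two test curves (a parabola and the lower half of a circle of radius 2) are choices made here for illustration.

```python
def curvature(dy, d2y, x):
    """Curvature of the graph y = y(x) at x, from its first and second derivatives."""
    return d2y(x) / (1.0 + dy(x) ** 2) ** 1.5


# Parabola y = x^2: y' = 2x, y'' = 2.
print(curvature(lambda x: 2 * x, lambda x: 2.0, 0.0))   # 2.0 at the vertex

# Lower semicircle y = -sqrt(R^2 - x^2): the curvature should be 1/R everywhere.
R = 2.0
dy = lambda x: x / (R * R - x * x) ** 0.5
d2y = lambda x: R * R / (R * R - x * x) ** 1.5
print(curvature(dy, d2y, 0.5), curvature(dy, d2y, 1.2))
```

The circle of radius R has the same curvature 1/R at every point, while the parabola bends most sharply at its vertex.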
is defined to be the vector $\mathbf{n}$ in the direction of $\mathbf{t}'$ but with length 1. This
means that the derivative $\mathbf{t}'$ is proportional to $\mathbf{n}$; indeed, $\mathbf{t}' = \kappa\mathbf{n}$ because
the curvature $\kappa$ (of a plane curve or a space curve) represents the rate at
which the tangent is turning. For a space curve one may take a third unit
vector perpendicular to both the normal $\mathbf{n}$ and the tangent $\mathbf{t}$ and suitably
directed; it is called the binormal $\mathbf{b}$. Thus at each point of the space curve
one has an orthonormal basis $\mathbf{t}$, $\mathbf{n}$, $\mathbf{b}$ (for the tangent space at that point).
The curvature and torsion together specify how these basis vectors (the
"frame") change as one moves along the curve, as in the equation
$\mathbf{t}' = \kappa\mathbf{n}$ and other such equations constituting the "Frenet formulas".
This method of "moving frames" has an immediate pictorial content
(think of the three first fingers of a hand stretched out to be orthogonal
and moving along the curve). This method applies not just to curves, but
to surfaces and beyond, and is a powerful formal tool in the use of cal-
culus to understand geometry.
In brief, the lines, circles and planes of elementary Euclidean geometry
can be used to approximate the curves (and later the surfaces) in space,
thereby making the ideas of the calculus apply in a geometric context.
$z = r \sin\theta.$
in the $\theta$-$\phi$ plane onto all of the sphere except the north pole, the south
pole and the "international date line" (the half great circle where $\phi = \pi$).
This map $I^2 \to S^2$ is called a chart on the sphere. Since the sphere is
manifestly not the same figure as the rectangle, it is impossible to cover the
whole sphere with one such chart. Two charts will do-this one and a simi-
lar one where the "poles" are taken on the present equator. One may then
say that the sphere is obtained by pasting together two such charts (where
they overlap). Other surfaces can be similarly described by pasting
together several charts, where each chart is a map defined on an open
square like (2) (or more generally on an open set in the plane). For exam-
ple, the surface of a torus can be covered by three such charts, but no two
will cover the whole torus.
Often, a piece of a surface is given by a smooth equation z = f(x,y) in
rectangular coordinates x, y, and z; and this is just a chart with the x - y
coordinates as parameters.
The curvature of such a piece of surface should somehow measure how
rapidly the tangent planes to the surface change direction as one moves
from point to point. This change can be observed equally well by the "tip-
ping" of the normal vector (the vector orthogonal to the tangent plane).
Specifically, using the calculus and an equation z = f(x,y) as in §VII.9
one can determine at each point p of the surface S a plane tangent to S
and hence a vector $\mathbf{n}$ of length 1 orthogonal to the tangent plane; this
vector is called a unit normal vector to S. If we choose these vectors smoothly
at nearby points of the chart we get a field of such vectors, one attached to
each point of this piece of the surface. (A reversal of sign, from $\mathbf{n}$ to $-\mathbf{n}$,
gives a second choice of such a local field.) Now the curvature of the
surface ought to be measured by the way in which these unit normal vectors
vary (tip about) from point to point on the surface. The idea is much the
same as that for a curve, where the curvature measures the rate at which
the tangent vector (or equally well the normal vector) turns. This intuitive
idea for measuring curvature for a surface can be formalized in several
ways.
First, one may try to reduce the question to one dealing directly with
the curvature of plane curves. To do this, consider all the planes
containing the point p and the unit normal vector $\mathbf{n}$ at this point. They will all be
orthogonal to the tangent plane to the surface at p, and each will cut out
from the surface a plane curve which has some definite curvature $\kappa$ at p.
If the surface is a sphere, these "sectional" plane curves will all be circles
of the same curvature, but in general the sectional curvature $\kappa$ will vary as
the plane through $\mathbf{n}$ rotates about the axis $\mathbf{n}$. This is the case, for example,
with the different curves cut out in this way at one vertex of an ellipsoid.
This example suggests that it is reasonable to concentrate attention on the
maximum curvature $\kappa_1$ and the minimum curvature $\kappa_2$ of these
"sectional" plane curves. These two are called the principal curvatures of the
surface at the point p and their directions (in the tangent plane at that
point) are called the principal directions at p. (In fact the principal
curvatures can be calculated as eigenvalues of a suitable matrix, and the two
principal directions turn out to be eigenvectors orthogonal to each other.)
At any point on a convex surface (such as an ellipsoid) both the principal
curvatures will have the same sign (say, positive). However, at a saddle
point p of a surface (for example at the origin for the surface
$z = x^2 - y^2$) the principal curvatures have opposite signs, because one
of the sectional curves bends up, while the other bends down at p. If one
wants a single measure of the curvature of the surface at the point, one
may take a suitable combination of the two principal curvatures, for
example, the mean curvature $(\kappa_1 + \kappa_2)/2$. But notice that this mean
curvature might come out to be zero at a saddle point, although the surface
is surely "curved" there.
A different and deeper concept was developed by Gauss. On a little
piece A of the surface about the point p, take all the unit normal vectors $\mathbf{n}$
at points of A and translate them so that they all start from the origin O.
The ends of these unit vectors then trace out a region B on the unit sphere
about the origin. Moreover, the more sharply the surface curves at p, the
larger the region B. Hence, assuming that areas can be measured, one
defines the Gaussian curvature of the surface at p to be a limit
is a function mapping the tangent plane $T_p$ into itself; one then sees that
it is a linear transformation
$L: T_p \to T_p$.
With careful formulation (see for example Barrett O'Neill [1966]) it turns
out that this transformation is self-adjoint (i.e., has a symmetric matrix
relative to any orthonormal basis of $T_p$). This transformation is sometimes
called the "shape mapping" for the surface in question.
This shape mapping clearly contains information as to how rapidly the
unit normal is turning, for motions in any direction from the point p.
Hence it is not surprising that it can be shown (see e.g. O'Neill loc. cit.)
that L contains all the information about the curvature of the surface at
the point p. Specifically, the determinant of the shape mapping
(calculated as the determinant of any one of the 2 × 2 matrices representing
that mapping) is the Gaussian curvature K of the surface, complete with
the desired sign. Like every self-adjoint transformation, this particular
transformation L can be brought to principal axes. It turns out that these
axes are exactly the principal directions of curvature, as defined above,
while the eigenvalues (the diagonal entries) are the two principal
curvatures $\kappa_1$ and $\kappa_2$. Since the principal axes of a symmetric matrix are
orthogonal, this shows that the principal directions of curvature are
orthogonal, as one may readily see in particular surfaces such as the ellipsoid.
Moreover the Gaussian curvature (the determinant of the matrix) is the
product $K = \kappa_1 \kappa_2$ of the two principal curvatures (this defines the sign of
K). This indicates that the Gaussian curvature is positive when the surface
is convex or concave at the point in question (i.e., when the surface near
the point lies all on one side of the tangent plane there). On the other
hand, it also indicates that the Gaussian curvature is negative at a saddle
point, where the surface lies on both sides of the tangent plane.
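These statements can be tried out in the simplest situation. The sketch below rests on an assumption not made in the text: the surface is a graph z = f(x, y) and the point is one where both first partial derivatives of f vanish, so that the tangent plane is horizontal. With the upward unit normal, the shape mapping at such a point is then represented by the symmetric matrix of second partial derivatives of f, written here as [[a, b], [b, c]]. The function name `principal_curvatures` is chosen for the example.

```python
import math


def principal_curvatures(a, b, c):
    """Eigenvalues (largest first) of the symmetric 2 x 2 matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    spread = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + spread, mean - spread


def report(a, b, c):
    k1, k2 = principal_curvatures(a, b, c)
    gaussian = a * c - b * b          # determinant, equal to k1 * k2
    mean = (k1 + k2) / 2.0
    return k1, k2, gaussian, mean


# Saddle z = x^2 - y^2 at the origin: second derivatives 2, 0, -2.
print(report(2.0, 0.0, -2.0))   # (2.0, -2.0, -4.0, 0.0)

# Bowl z = x^2 + x*y + y^2 at the origin: second derivatives 2, 1, 2.
print(report(2.0, 1.0, 2.0))    # (3.0, 1.0, 3.0, 2.0)
```

At the saddle the mean curvature vanishes while the Gaussian curvature is negative; at the bowl both principal curvatures are positive and so is their product.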
The detailed demonstration of these results requires time and care and
can be done by various techniques, not necessarily formulated in terms of
the "shape transformation" L. However, the use of this transformation
does illustrate in striking form the close relation which obtains between
the algebraic properties of symmetric matrices and the geometric study of
curvature.
$dz = r \cos\theta \, d\theta.$

$$
ds^2 = r^2\, d\theta^2 + r^2 \cos^2\theta \, d\phi^2.
\tag{1}
$$
One may also motivate this formula by drawing on the sphere a small
"spherical triangle" made up of arcs of circles with hypotenuse ds,
horizontal side $r \cos\theta \, d\phi$ and vertical side $r \, d\theta$; then (1) is the Pythagorean
theorem for this infinitesimal "right triangle". What is more (and more
than we will prove here), this formal calculation gives the correct
integrand ds for the arc length of a spherical path written in latitude and
longitude coordinates for a parameter t as $\theta = g(t)$, $\phi = h(t)$. On the
practical side, such length calculations are needed in ocean navigation,
where the great circle joining two points provides the shortest possible
(smooth!) voyage between those two points. On the theoretical side, such
calculations of length form the starting point of much of Riemannian
geometry (§10 below).
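The infinitesimal right triangle translates directly into a numerical recipe. The sketch below assumes latitude $\theta$ and longitude $\phi$ in radians on a sphere of radius r, with vertical side $r\,d\theta$ and horizontal side $r\cos\theta\,d\phi$ as described; the function name `sphere_path_length` and the step count are choices made here for illustration.

```python
import math


def sphere_path_length(g, h, t1, t2, r=1.0, n=10000):
    """Approximate length of the path theta = g(t), phi = h(t) on a sphere of radius r."""
    total = 0.0
    for k in range(n):
        ta = t1 + (t2 - t1) * k / n
        tb = t1 + (t2 - t1) * (k + 1) / n
        dtheta = g(tb) - g(ta)                   # small change in latitude
        dphi = h(tb) - h(ta)                     # small change in longitude
        theta_mid = 0.5 * (g(ta) + g(tb))
        # Pythagorean theorem for the small "right triangle" on the sphere.
        total += r * math.sqrt(dtheta ** 2 + (math.cos(theta_mid) * dphi) ** 2)
    return total


# Once around the parallel at latitude 60 degrees: circumference 2*pi*r*cos(60 deg) = pi*r.
print(sphere_path_length(lambda t: math.pi / 3, lambda t: t, 0.0, 2 * math.pi))

# Up a meridian from the equator to the pole: a quarter of a great circle, pi*r/2.
print(sphere_path_length(lambda t: t, lambda t: 0.0, 0.0, math.pi / 2))
```

The parallel at latitude 60 degrees is half as long as the equator, which is the effect of the factor $\cos\theta$ on the horizontal side.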
There are similar formulas for the length of arc applicable to paths on
other surfaces in 3-space. One assumes that a piece of the surface is given
in terms of two parameters u and v by smooth functions
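$$
\begin{pmatrix}
\dfrac{\partial f}{\partial u} & \dfrac{\partial g}{\partial u} & \dfrac{\partial h}{\partial u} \\[2ex]
\dfrac{\partial f}{\partial v} & \dfrac{\partial g}{\partial v} & \dfrac{\partial h}{\partial v}
\end{pmatrix}
\tag{3}
$$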
has rank 2 everywhere (i.e., has its rows always linearly independent).
This means that at each point in the u, v plane the functions (2) map the
tangent plane of the (u, v)-plane linearly to the tangent plane at the
corresponding point on the surface, and that (3) is the matrix of this
linear mapping. Then the same sort of formal calculations as those for the
sphere yield a formula for arc length (that is, for a $ds^2$) of the form
$ds^2 = E\,du^2 + 2F\,du\,dv + G\,dv^2$,    (4)
$$
\lim_{A,B,C \to p} \frac{E(ABC)}{\operatorname{area}(\triangle ABC)}
$$
with limit taken as the three vertices approach the point p. Thus, just as
on the sphere, positive curvature means angle sum in a triangle larger
than $\pi$, and larger positive curvature corresponds to larger angle sum.
This result makes the intrinsic character of curvature visible. It also recalls
the results on angle-sums in non-Euclidean geometry (Chapter III).
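The quotient can be computed by hand in one case. Take a sphere of radius r and the triangle ABC cut out by one octant, with A at the north pole and B, C on the equator a quarter turn apart, so that all three angles are right angles. Here E(ABC) is read as the angular excess (the angle sum minus $\pi$); that reading is an assumption about the notation, since its definition does not appear in this passage.

```latex
\begin{align*}
\text{angle sum} &= \tfrac{\pi}{2} + \tfrac{\pi}{2} + \tfrac{\pi}{2} = \tfrac{3\pi}{2},
& E(ABC) &= \tfrac{3\pi}{2} - \pi = \tfrac{\pi}{2}, \\
\operatorname{area}(\triangle ABC) &= \tfrac{1}{8}\,(4\pi r^2) = \tfrac{\pi r^2}{2},
& \frac{E(ABC)}{\operatorname{area}(\triangle ABC)} &= \frac{1}{r^2}.
\end{align*}
```

The value $1/r^2$ is the product of the two principal curvatures $1/r$ of the sphere. On a sphere the same ratio is obtained for every triangle, large or small, so no limit is actually needed in this case.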
The discovery of this result raised in explicit form the problem of study-
ing the geometry of the surface itself, intrinsically and without reference
to the ambient space. This leads to the problem of describing surfaces and
higher-dimensional varieties without using any ambient space. This is the
origin of the idea of an (intrinsically) "curved space", as used in relativity
theory. Thus it is that the Euclidean viewpoint for geometry is tran-
scended.
Figure 1
4. Many-Valued Functions and Riemann Surfaces
$0 \le \theta < 2\pi$
when $\sqrt{r}$ is taken real and positive. The values $w_1$ of the first square root
cover all the upper half of the w-plane, except for the negative real axis OB
(Figure 3), while the values $w_2$ cover the lower half-plane, omitting the
positive axis OA'. Each of these half-planes is the image under $\sqrt{z}$ of the
whole of the z-plane, so we need two copies of the z-plane, as in Figure 4.
Figure 3
To make this image continuous, each copy of the z-plane should be cut
apart by a slit along the positive z-axis. Now the whole surface S' is
obtained by pasting together the two halves of the w-plane from Figure 3:
ray OB in the top half pasted to ray OB' in the bottom half and then
ray OA' in the bottom half pasted (point for point) to the ray OA in the
top half. Now exactly the same surface S can be described by pasting
together the two slit copies of the z-plane (Figure 4): The lower slit OB in
the top plane is pasted to the upper slit OB' in the bottom plane while the
upper slit OA in the top plane is pasted (point for point) to the lower slit
OA' in the bottom plane. The result, seen end-on from the point at $\infty$ on
the positive real axis, is illustrated in Figure 5, which is slightly inexact,
because the ray $O\infty$ represents two separate rays, meeting only at the
origin O. All told, the surface S on which the function $\sqrt{z}$ becomes
single-valued is thus represented by pasting together two copies of the slit
z-plane, so that each copy carries one of the two possible values (one of the
"branches") of $\sqrt{z}$. Note that the location of the slit could be changed by
replacing the positive real axis by any other smooth curve running from 0
to $\infty$. Different such choices of a slit give the same topological surface,
which is described intrinsically as the manifold of all pairs (w,z) with
$w^2 = z$.
This first example illustrates a general method for turning a many-
valued "function" w of a complex variable z into a single-valued function
not of z but of a point on a surface S, a Riemann Surface. First decom-
pose the z-plane by slits so that w becomes single-valued when it is fol-
lowed continuously along paths constrained not to cross the slits. This
yields several single-valued functions $w_i$ which may be called the branches
of w. For each such branch $w_i$, take one copy of the slit z-plane to carry
these values $w_i$. Then paste these copies together along the slits
appropriately matched so as to get the desired surface S. This is a process
which visualizes the surface S, which is described intrinsically as the man-
ifold of all the pairs (w,z) involved.
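The passage from one branch to the other can be followed numerically for the square root. The sketch below is an illustration under a simple assumption: "following w continuously" is imitated by stepping around a circle in small steps and choosing, at each step, whichever of the two square roots lies nearer the previous value. The names `circle` and `continue_sqrt` are chosen here for the example.

```python
import cmath
import math


def circle(center, radius, turns, steps_per_turn=360):
    """Points on a circle, starting on the ray to the right of the center, for a whole number of turns."""
    n = turns * steps_per_turn
    return [center + radius * cmath.exp(2j * math.pi * k / steps_per_turn)
            for k in range(n + 1)]


def continue_sqrt(path, w0):
    """Follow a square root along the path, starting from the value w0 at path[0]."""
    w = w0
    for z in path[1:]:
        r = cmath.sqrt(z)                           # one of the two roots of z
        w = r if abs(r - w) <= abs(-r - w) else -r  # keep the root nearer the previous value
    return w


# Start at z = 1 with w = +1 and go around the origin.
print(continue_sqrt(circle(0, 1, 1), 1))   # close to -1: one circuit lands on the other branch
print(continue_sqrt(circle(0, 1, 2), 1))   # close to +1: two circuits return to the start
```

One circuit about the origin carries the value on one copy of the slit plane to the value on the other copy; only after a second circuit does the path on the surface close up.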
Figure 4
Figure 5
Figure 6
We do not need explicit formulas for the two roots $w_1$ and $w_2$, but we
note that for values of z very near z = 1 the values $w_1$ and $w_2$ will
interconnect much as did the values of the square root $\sqrt{z - 1}$ (just like $\sqrt{z}$).
In other words, as the point z moves once around z = +1 in a small
circle, the first root $w_1$ will change into the second root $w_2$, and $w_2$ into $w_1$.
There will be a similar interchange near the three points
z = -1, z = +2, and z = -2, and our two copies of the z-plane
should be connected correspondingly. This can be done if we take two
copies of the z-plane and slit each copy along the real x-axis from +1 to
+2 and from -1 to -2. These slits will ensure that a circle or other
closed path not crossing the slits cannot go around just one of the points
±1, ±2, but must (Figure 7) go around two (or four) of them, and so
following such a closed path will not change one value $w_i$ into the other.
Now to follow what happens to $w_i$ across the slits we paste the two
z-planes together, so that the upper edge of a slit on the top z-plane
attaches to the lower edge on the bottom, and vice versa, much as in the
previous Figure 5. Also the two copies of each point ±1 and ±2 are
identified. Now the two copies of the z-plane can carry the two values
$w_1$ and $w_2$ so that they fit together across the slits. The result is a
Riemann surface such that both z and w are single-valued and continuous
functions of a point on the surface.
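The role of the slits can be tested numerically for the equation $w^2 = (z^2 - 1)(z^2 - 4)$, the one that appears in the discussion of this curve. In the sketch below a root w is followed around a circular loop by always choosing the square root nearer the previous value; the function names `f` and `follow` and the particular loops are choices made here for illustration.

```python
import cmath
import math


def f(z):
    """Right-hand side of w^2 = (z^2 - 1)(z^2 - 4)."""
    return (z * z - 1) * (z * z - 4)


def follow(center, radius, steps=720):
    """Follow a root w once around a circle; return its starting and final values."""
    path = [center + radius * cmath.exp(2j * math.pi * k / steps)
            for k in range(steps + 1)]
    w0 = cmath.sqrt(f(path[0]))
    w = w0
    for z in path[1:]:
        r = cmath.sqrt(f(z))
        w = r if abs(r - w) <= abs(r + w) else -r   # keep the root nearer the previous value
    return w0, w


# A small circle about z = 1 encloses one of the four points: the roots interchange.
start, end = follow(1.0, 0.5)
print(start, end)          # end is close to -start

# A circle about 1.5 of radius 1 encloses both 1 and 2: the root returns to itself.
start, end = follow(1.5, 1.0)
print(start, end)          # end is close to start
```

The second loop is of the kind permitted by the slit from +1 to +2, since it goes around both ends of that slit without crossing it.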
This description of the surface is not really intrinsic, since we could
have used differently placed slits, say slits along semicircles from - 2 to
+2 and - I to + 1. However, any such description provides a picture of
the intrinsic manifold ("algebraic curve") of all solutions of the equation
$w^2 = (z^2 - 1)(z^2 - 4)$. The previous Figure 2 displays just the "real"
points on this curve.
As before, one may make a more "geometric" picture of this surface by
starting with two slit Riemann spheres instead of two slit z-planes, then
distorting each of the slits into an open circular hole. The result is then
two spheres, each with two circular holes which are to be joined rim to
rim. This join can be done by using two cylindrical tubes, as in the Figure
8. A distortion of the resulting surface shows that it is "like" the surface of
a doughnut-a torus. For these and more elaborate Riemann surfaces one
wishes to formulate an intrinsic description.
Figure 7. Slits.
5. Examples of Manifolds
The sphere can be described as the "manifold" of all solutions (x,y,z) of
the equation $x^2 + y^2 + z^2 = 1$. In many other cases the set of all
solutions of some problem or the collection of all "things" with some
able property can be regarded as a geometric entity-curve, surface, or
solid-in which "nearby" solutions are pictured as "nearby" points.
One simple example is the manifold of all possible quadratic polynomi-
als ax 2 + bx + c with real coefficients. Since such a polynomial is fully
determined by the three real numbers a, b, and c, it can be represented
by the points in Euclidean 3-space with these coordinates a, b, c. Hence
this manifold of quadratic polynomials is just the space $\mathbf{R}^3$.
In mechanics one has occasion to consider a double pendulum in the
plane, with a second bob B hanging by a string from the end of a first bob
A, the latter hung from a fixed point P (Figure 1). To study the possible
motions, one needs first of all the "manifold" of all possible positions of
this double pendulum. Now a position is described completely by giving
two angles $\theta$ and $\phi$; to wit, the angle made by each bobstring with the
vertical. Here the possible values of the angles $\theta$ and $\phi$ range from 0 to
$2\pi$ radians, with $2\pi$ counting the same as zero (Figure 2). Hence the possi-
Figure 2
Any set X whatever can be regarded as a topological space when all the
subsets of X are taken to be open; this gives the so-called discrete topol-
ogy on X. For an infinite set X there is a topology in which the empty
subset and all subsets V with a finite complement are taken to be open.
This clearly satisfies the axioms for open sets, but is far from the usual
geometric pictures of a space.
In view of this generality, it has been useful to consider various restricted
classes of topological spaces. As in §I.10 a space X is said to be
Hausdorff when it satisfies the following "separation" axiom:
For many purposes one may wish to put a space together out of smaller
pieces, just as a sphere may be described by two overlapping charts or a
circle by two or more overlapping intervals. This pictorial idea (scissors
and paste!) can be formalized in the notion of a "covering". An open covering
of a space X indexed by a set I is a family $U_i$, for indices $i \in I$, of
open sets $U_i$ of X, such that X is the union of the $U_i$ (i.e., such that every
point of X lies in $U_i$ for some index $i \in I$). The space X is thus completely
determined by knowing all the subspaces $U_i$ and how they overlap.
For example, a subset V of X is open if and only if each intersection
$V \cap U_i$ is open in the (induced) topology of the space $U_i$; for the same
reason a function f on X to some other space is continuous if and only if
the restriction of f to each of the open sets $U_i$ is continuous. This result is
the reason why it is useful to consider coverings by open sets, and why
one describes surfaces by overlapping open charts.
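The "local" test for openness can be tried out on a small finite space. The following Python sketch is only an illustration: the four-point set, its topology, and the two-piece open covering are hypothetical choices of ours, and the helper names are not from the text. It checks, for every subset V, that V is open exactly when each intersection of V with a piece of the covering is open in the induced topology of that piece.

```python
from itertools import combinations

def induced_topology(opens, U):
    """Induced topology on U: all intersections of open sets of X with U."""
    return {O & U for O in opens}

def is_open_locally(V, cover, opens):
    """Test V piece by piece: V & U must be open in each piece U of the cover."""
    return all((V & U) in induced_topology(opens, U) for U in cover)

def all_subsets(X):
    """Every subset of the finite set X, as frozensets."""
    items = sorted(X)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

# Hypothetical example: a topology on X = {1, 2, 3, 4}.
X = frozenset({1, 2, 3, 4})
opens = {frozenset(s) for s in
         [(), (1,), (1, 2), (3, 4), (1, 3, 4), (1, 2, 3, 4)]}

# An open covering of X by two overlapping open sets.
cover = [frozenset({1, 2}), frozenset({1, 3, 4})]

# The global and the local notions of "open" agree on every subset.
agree = all((V in opens) == is_open_locally(V, cover, opens)
            for V in all_subsets(X))
print(agree)
```

If one of the covering sets is replaced by a set that is not open, the two notions need no longer agree, which is why the covering is required to be open.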
Coverings are also used in the definition of compactness (§VI.7).
For many purposes, one studies not general topological spaces, but the
more special compact Hausdorff spaces, in particular surfaces which are
both compact and Hausdorff.
It is also appropriate to ask when a space is connected. The intuitive
idea is that a space is connected when it does not "fall apart" into two or
more pieces. This can be turned into a formal definition in at least two
ways. One definition reads: A space is connected when it is not the union
of two disjoint, non-empty open sets U and V. For example, the topological
space consisting of the whole real line minus the origin is not connected,
because it is the union of two (disjoint) half lines, each open
because the origin has been omitted.
7. Manifolds
The second aspect of the "intrinsic" description of surfaces is the sense in
which any surface is "two-dimensional". This should mean that each
point of the surface has a neighborhood which looks like (i.e., is
homeomorphic to) a neighborhood of some point in the Euclidean plane.
We can express this condition in terms of the "charts" which have already
been used (in navigation and) in our description of the intrinsic geometry
of the sphere. Specifically, if S is any topological space, a (2-dimensional)
chart on S is a homeomorphism
$U \subset S$, (1)
W' , (2)
which is a homeomorphism
this is informally called the overlap (or patching) map for the two charts,
because it codes the way in which the two charts are to be pasted together
on the manifold. In other words, a surface is a topological space obtained
by pasting together open sets (more explicitly, open discs) cut from the
Euclidean plane.
This description of a surface S is not yet an invariant one, because one
and the same surface may have many different atlases. It is for this rea-
son that the definition reads "A surface is a topological space for which
there exists an atlas" (e.g., there can be many different atlases for the
sphere!). Alternatively one might take all the possible charts for S; they
form an atlas called the maximal atlas because it contains every other
atlas. Then a surface would be defined as a topological space which is
covered by the domains of all of its two-dimensional charts.
As an example, consider the projective plane, defined to be the mani-
fold of all lines through the origin of R3. As we have seen in §5, it can
also be described as a circular disc in which diametrically opposite pairs
of points on the circumference have been identified. It can be covered by
three charts, for example, by three sectors of the disc, each extended a bit
over its boundary so as to be an open set (see Figure 2). In this atlas, the
charts U and U' overlap right side up along their extensions on AO and
also along BC, but upside down there because CB is pasted upside down
Figure 3
8. Smooth Manifolds
The surface of a rectangular box (a parallelopiped) is a two-dimensional
manifold according to our definitions: it is a compact and connected
topological space which is, moreover, homeomorphic (say by radial pro-
jection) to the surface of a sphere. However, a curvature can hardly be
defined at the corner points of such a box, and at these points there is also
no tangent plane. In order to describe tangent planes, intrinsic curvature,
and the like we must restrict attention to smooth surfaces.
The idea, as in §VI.9, is that a function is smooth when it has enough
derivatives. To expand this, consider first a function f of two real variables
$x_1$ and $x_2$ which is defined in some open set W of the plane $R^2$;
thus f is a function
(1)
Since W is open, each point $(x_1,x_2)$ in W has an open neighborhood also
contained in W; this allows the usual definition of the partial derivatives
of f at that point. Then f is said to be smooth of class $C^\infty$ when f has
partial derivatives of all orders at each point of its domain W, while f is
said to be of class $C^k$ when it has continuous partial derivatives of all orders
up through order k.
Consider next a function $\psi$
(2)
between two open sets W' and W in the plane $R^2$. Here each point p in
W' is determined by its coordinates, say $x_1$ and $x_2$, while each point in
W is determined by its coordinates, say $u_1$ and $u_2$; they can be regarded
as functions $u_1, u_2: W \to R$. In this way the function $\psi$ of (2) is described
by two composite functions $u_1 \cdot \psi$ and $u_2 \cdot \psi: W' \to R$
(3)
This is the expression of $\psi$ in terms of (local) coordinates. The function $\psi$
is defined to be smooth if both these coordinate functions $f_1$ and $f_2$ are
smooth (have continuous partial derivatives of all orders).
This definition has three basic properties:
open subset of W', then the restriction of $\psi$ to V' is a smooth function
$\psi\,|\,V': V' \to W$.
given as a composite of (suitable restrictions of) $\phi'$ and $\phi^{-1}$. This overlap
map $W' \to W$ goes between subsets of $R^2$, so, as in (2), we can determine
when it is smooth. Hence one can say that the charts $\phi$ and $\phi'$ have a
smooth overlap if the composite $\phi'\phi^{-1}$ is smooth. If we write $u_1$ and $u_2$
for the local coordinates in U' and $x_1$ and $x_2$ for the local coordinates in
U, this amounts exactly to saying that the local coordinates $u_1,u_2$ in the
overlap are smooth functions (on $\phi(U \cap U')$) of the local coordinates
$x_1,x_2$.
Now define a smooth surface S to be a (topological) surface S together
with an atlas A of charts $\phi_i: U_i \to V_i$ for $i \in I$ such that any two charts $\phi_i$
and $\phi_j$ in the atlas A have a smooth overlap. As before, this atlas may
not be maximal, but we can construct the maximal atlas by the following
details. So consider any other chart $\phi: U \to V$ for S which is "smooth for
A" in the sense that $\phi$ has a smooth overlap with every chart $\phi_i$ in the
atlas A. We want to consider the collection A* of all such charts $\phi$. We
claim that if $\phi': U' \to V'$ is another such chart, smooth for A, then
$\phi$ and $\phi'$ have a smooth overlap. Indeed, for each chart $\phi_i$ of the given
atlas A, there is a (possibly empty) intersection $V_i = U \cap U' \cap U_i$ of
the domains and a diagram (Figure 1). Then, for suitable restrictions of
the charts $\phi$ we have smooth overlaps
the first because $\phi$ has smooth overlaps with all $\phi_i$, and the second similarly
for $\phi'$. By composition (principle 1° above) the map
$\phi'\phi^{-1}: \phi(V_i) \to \phi'(V_i)$ is smooth. On the other hand the domains
$U_i$ of
the given charts cover the whole surface S, so the intersections
$\phi(U_i \cap U \cap U')$ cover the whole of $W = \phi(U \cap U')$. Therefore, by
the patching principle, $\phi'\phi^{-1}: W \to W'$ is smooth on the whole of W. In
other words, the atlas A* of all charts smooth for A has the property that
any two of its charts $\phi$ and $\phi'$ have a smooth overlap. Among all smooth
atlases for the smooth surface S, it is the maximal one, and every smooth
surface can be described in an invariant way by such a maximal atlas. The
description evidently depends on the three basic properties (composition,
restriction, and patching) for smooth maps.
The same ideas will describe smooth manifolds M of any dimension.
One also has the notion of a smooth map
$f: M \to R$ (6)
one needs only one chart for the reals R, considered as a one-dimensional
manifold. Thus the definition above states that such a function f is
smooth if on the domain U of each chart of M with local coordinates
9. Paths and Quantities 247
under (6). By the patching property, it is enough to require this just for
the charts of some atlas covering M', but the notion of smoothness is
intrinsic, since it is independent of any particular choice of charts or of
coordinates.
The plane, the sphere, the torus and the projective plane, with the respec-
tive atlases which we have described, are all smooth two-dimensional mani-
folds, with the expected smooth maps. There are similar smooth manifolds
in higher dimensions. For that matter, the circle, the open interval and the
whole real line are smooth one-dimensional manifolds.
we may suppose that h(O) = (0,0) is the origin (of local coordinates) on S.
The map h is a path through h(0), parametrized by t, while the map f
is a (real-valued) quantity (e.g., a physical quantity) defined everywhere
on the surface S. Such a quantity on S may be pictured by its level lines
(the loci $f(x_1,x_2)$ = constant) on S, as in Figure 1. At the origin t = 0
the path has a tangent vector with coordinates
$\left[\frac{dh_1}{dt}, \frac{dh_2}{dt}\right]_{t=0}$ (2)
while the quantity f has a gradient (or differential) given (as in §VI.9) by
coordinates
(3)
$\left.\frac{d(fh)}{dt}\right|_{t=0} = \left[\frac{\partial f}{\partial x_1}\frac{dh_1}{dt} + \frac{\partial f}{\partial x_2}\frac{dh_2}{dt}\right]_{t=0}.$ (4)
This formula exhibits the cotangent vector $(\mathrm{grad}\, f)_0$ of (3) as a linear
function of the tangent vector $(h'_1,h'_2)$ of (2), and it indicates that two
quantities f and g have the same cotangent vector at this origin when
they have the same directional derivatives along all paths, and dually that
two paths through the origin have the same tangent vector there if and
only if they give equal directional derivatives there for all quantities.
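The pairing between a gradient and a tangent vector can be checked numerically. The sketch below is an illustration under our own assumptions: the quantity f and the path h are arbitrary sample choices (with h(0) at the origin), and derivatives are approximated by central differences. It compares the derivative of the composite fh at t = 0 with the sum of products of gradient and tangent coordinates.

```python
import math

def f(x1, x2):
    """A sample smooth quantity near the origin (our choice)."""
    return math.sin(x1) + x1 * x2 + 3.0 * x2

def h(t):
    """A sample smooth path with h(0) = (0, 0) (our choice)."""
    return (2.0 * t + t * t, math.sin(t))

def derivative(g, t, eps=1e-6):
    """Central-difference approximation to g'(t)."""
    return (g(t + eps) - g(t - eps)) / (2 * eps)

# Coordinates of the tangent vector of h at t = 0.
tangent = (derivative(lambda t: h(t)[0], 0.0),
           derivative(lambda t: h(t)[1], 0.0))

# Coordinates of the gradient of f at the origin.
gradient = (derivative(lambda x: f(x, 0.0), 0.0),
            derivative(lambda y: f(0.0, y), 0.0))

# Directional derivative two ways: directly, and as gradient paired with tangent.
direct = derivative(lambda t: f(*h(t)), 0.0)
paired = gradient[0] * tangent[0] + gradient[1] * tangent[1]
print(direct, paired)
```

For these sample choices the gradient at the origin is (1, 3) and the tangent vector is (2, 1), so both numbers should be close to 5.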
This last remark suggests how one can define tangent vectors (or
cotangent vectors) without any use of coordinates. So consider all smooth
paths through the origin and all smooth quantities f there. Under addition
and multiplication by reals a (real scalars $a \in R$) these quantities do
form a (high-dimensional) vector space. Without using coordinates, each
smooth composite function fh has a derivative
$D_0(fh) = \left.\frac{d(fh)}{dt}\right|_{t=0}$ (5)
and call it the "tangent vector" defined by the path h at 0. In much the
same way, one may call two quantities $f,g: S \to R$ "cotangent" at 0 (in
symbols, $f \equiv_0 g$) when $D_0(fh) = D_0(gh)$ for all smooth paths through 0.
This relation is again reflexive, symmetric, and transitive, so we may introduce
for each f the class
and call it the gradient (or the differential or the cotangent vector) for f
at 0.
Now $f_1 \equiv_0 g_1$ and $f_2 \equiv_0 g_2$ imply that
(8)
(9)
This idea of a cotangent bundle also arises from mechanics (Chapter IX).
The tangent bundle $T_\bullet S$ is similarly described; its points are all the
tangent vectors at all the points of the surface S; they form a four dimensional
smooth manifold with the evident coordinates, and with a smooth
projection $T_\bullet S \to S$. A smooth cross section of this projection is called a
vector field on S. Such a field assigns to each point of S a tangent vector
at that point, and this in a smooth way. For example, if S is the plane
with coordinates x,y, a pair of simultaneous differential equations
$\frac{dx}{dt} = f(x,y), \quad \frac{dy}{dt} = g(x,y)$
$x_1^2 + x_2^2 + \cdots + x_n^2 = 1.$
where E, F, and G are smooth functions. More generally, one may express
such a Riemann metric in terms of local coordinates $x_i$ as a "quadratic
differential form"
$ds^2 = \sum_{i,j=1}^{2} g_{ij}(x_1,x_2)\,dx_i\,dx_j,$ (1)
where the $g_{ij}$ are smooth functions of the local coordinates and are symmetric
($g_{ij} = g_{ji}$) and positive definite. The latter means, as usual, that
$(a_1,a_2) \ne (0,0)$ implies $\sum g_{ij}(x_1,x_2)\,a_i a_j > 0$. A Riemann metric on a surface
means a metric (1) on each chart, such that the metrics agree (under
change of coordinates) on each overlap of charts.
To interpret this, consider a path $h: I \to S$ on the surface for which the
image h(I) lies in the domain of the chart involved. We have already seen
that each differential form
$\omega(T_0 h) = g^1(x_1,x_2)\frac{dh_1}{dt} + g^2(x_1,x_2)\frac{dh_2}{dt},$
(2)
(3)
11. Sheaves
Continuous functions on a topological space X can be described just in
terms of the open subsets of X, since a function f: X ~ Y is continuous
when the inverse image under f of each open subset of Y is open in X.
Smooth functions on a manifold, on the other hand, require more for
$W_i \cap U_j = W_j \cap U_i$
for all i and j, the union $W = \bigcup W_i$ is the unique open subset of U with
$W \cap U_i = W_i$ for each i. Hence Q is a sheaf.
The matching-piecing condition applies also to smooth functions.
Thus for an open set U on a smooth manifold and a covering $U = \bigcup U_i$,
a function $f: U \to R$ such that its restrictions $f_i: U_i \to R$ are all smooth is
necessarily smooth itself, because the derivative of f at a point of U
depends only on the values of f in any specific neighborhood of the point.
Hence for any smooth manifold M the sets
Mechanics
1. Kepler's Laws
Fascination with the motions of the planets and the stars is endemic. For
the Greeks, a precise description of planetary motion was provided by
Ptolemaic astronomy. Given that the circle was for Greek geometry the
dominant representation of repetitive or periodic motion, there was an initial
inclination to think that the planets move in circular orbits with the
earth as center. However, such an orbit did not suffice to explain the
appearances-in particular the observation that at times the planets
seemed to move backwards, in a retrograde motion. To account for this,
Ptolemy provided epicycles (small circles superimposed on the original
circular orbits), and with enough epicycles most of the motions (perhaps
almost all) could be explained. Such an explanation required a consider-
able number of epicycles; it served until Copernicus introduced a
heliocentric theory. It was then Kepler who used many careful measure-
ments of the positions of the planets (i.e., their orbits) relative to the sun.
After elaborate calculations he was able to propose his three laws about
these orbits of the planets. His laws read
(1) The planets describe orbits in a plane containing the sun, in such a
way that the areas swept out in equal times are equal. (This refers to
the area swept out by the radius vector from the sun to the planet.)
(2) Each planetary orbit is an ellipse, with the sun at one focus.
(3) The square of the period of each planet in its orbit is proportional to
the cube of the length of the major axis of the ellipse.
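The third law can be tested against observation in a few lines. In the usual statement the square of the period T is proportional to the cube of the semi-major axis a (half the major axis, so the proportionality is the same). The sketch below uses rounded, commonly quoted values for five planets, with a in astronomical units and T in years; these figures and the helper name are our additions, not data from the text. In these units the ratio $T^2/a^3$ should come out close to 1 for every planet.

```python
# Rounded, commonly quoted orbital data:
# (semi-major axis in astronomical units, period in years).
PLANETS = {
    "Mercury": (0.387, 0.241),
    "Venus": (0.723, 0.615),
    "Earth": (1.000, 1.000),
    "Mars": (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
}

def kepler_ratio(semi_major_axis, period):
    """The ratio T^2 / a^3, which the third law asserts is the same for all planets."""
    return period ** 2 / semi_major_axis ** 3

ratios = {name: kepler_ratio(a, T) for name, (a, T) in PLANETS.items()}
for name, ratio in ratios.items():
    print(f"{name:8s} T^2/a^3 = {ratio:.4f}")
```

The small deviations from 1 reflect only the rounding of the input figures.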
In this form, Kepler's laws are a summary (in geometrical terms) of
facts of observation. Newton, using the calculus for this purpose, was able
to deduce these laws from more basic principles of motion. In particular,
Newton's second law of motion states that the force on a particle is its
mass m times its acceleration. Here the force F and the acceleration a are
three dimensional vectors, written with bold face letters as is the custom
in theoretical physics. Thus Newton's second law takes the form of a vec-
tor equation
ma = F, (1)
(2)
of planets and stars, and to the more local motions of bodies (such as pro-
jectiles) in the earth's gravitational field. Newton presented both kinds of
applications in his famous "Mathematical Principles of Natural Philoso-
phy" (1687), but the presentation there did not explicitly use the calculus.
The full development of the subject of "Newtonian Mechanics", with its
use for extended bodies, fluid mechanics, and the like, required more than
a century and was advanced by many other noted mathematicians, especially
Leonhard Euler (1707-1783).
Newton did deduce Kepler's laws from his; let us summarize the pro-
cess. Consider the motion of a planet of mass m relative to the sun. We
neglect the gravitational forces from the other planets, so that we have a
two-body problem, with sun and planet as the two bodies. At a chosen ini-
tial time, the vector force on the planet is directed toward the sun, and the
planet has some initial vector velocity. If this velocity happens to be
directed toward (or away from) the sun, all the subsequent motion (by
Newton's second law) must be along the line joining the planet to the sun.
Leaving aside this exceptional case, the initial velocity and the line from
planet to sun together determine a plane containing both the velocity and
acceleration vectors. Hence, by Newton's second law, all the subsequent
motion takes place in that plane. This is the derivation of that part of
Kepler's first law which asserts that the orbit is planar. The proof did not
use the inverse square law.
To get the shape of the orbit, we describe the plane of the orbit by rectangular
coordinates x and y relative to the sun O as origin. The gravitational
force F on the planet P is then proportional to $1/r^2$ and directed
from P to the origin O, where r is the distance OP. This suggests the introduction
of polar coordinates $(r,\theta)$, as in Figure 1. These polar coordinates
convert to rectangular coordinates (x,y) by the familiar equations
$x = r\cos\theta, \quad y = r\sin\theta,$ (3)
which embody the definitions of sine and cosine. Therefore the force F is
a vector of magnitude F with components $-F\cos\theta$ and $-F\sin\theta$ along
the x and y axes. Thus Newton's second law, written in components,
becomes
$m\ddot x = -F\cos\theta, \quad m\ddot y = -F\sin\theta;$ (4)
here the double dots represent the second derivatives of the (variable)
coordinates with respect to time. These two differential equations can be
combined to give the equation
$m(x\ddot y - y\ddot x) = 0,$
which states that the time derivative of $\dot y x - \dot x y$ is zero, so that
$\dot y x - \dot x y = k$ (5)
is a constant k.
This formal deduction has substance. The quantity $\dot y x - \dot x y$ on the left
of (5), when multiplied by the mass m, is called the angular momentum of
the planet P about the point O. So the proof has shown that the angular
momentum about O is constant when the force is directed toward (or away
from) O. In polar coordinates, the equation (5) for angular momentum
becomes, by (3),
$r^2\dot\theta = k.$ (6)
$(\dot x,\dot y,0)$. Their vector product (§IV.14) is then $(0,0,\dot y x - \dot x y)$; multiplied
by m, it is the angular momentum, considered as a vector through O perpendicular
to the plane of the motion.
This is also related to the area A = A(t) swept out by the radius vector
OP in time t; Figure 2 indicates that the increment in area A(t) due to an
increment $\Delta\theta$ in the polar angle is approximately the area $\tfrac{1}{2}r^2\Delta\theta$ of a circular
sector of radius r and angle $\Delta\theta$. Hence, by the usual limiting procedures,
the rate of change of area is $\dot A = (1/2)r^2\dot\theta$. Thus the equation (6)
contains that part of Kepler's first law which states that area is swept out
at a constant rate; again the conclusion does not depend on the inverse
square law!
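That the constancy of $\dot y x - \dot x y$ needs only a central force, and not the inverse square law, can be seen in a numerical experiment. The sketch below is an illustration under our own assumptions: unit mass, arbitrary initial data, and a leapfrog (velocity Verlet) stepping scheme chosen by us because each of its substeps leaves $x\dot y - y\dot x$ unchanged for a force along the radius. It runs the same initial conditions under an inverse-square attraction and under an attraction proportional to r.

```python
import math

def central_acceleration(x, y, magnitude):
    """Acceleration of size magnitude(r), directed from (x, y) toward the origin O."""
    r = math.hypot(x, y)
    a = magnitude(r)
    return (-a * x / r, -a * y / r)

def angular_momentum_history(magnitude, x, y, vx, vy, dt=1e-3, steps=5000):
    """Integrate the motion by leapfrog steps; record x*vy - y*vx after each step."""
    history = []
    ax, ay = central_acceleration(x, y, magnitude)
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick (along the radius)
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift (along the velocity)
        y += dt * vy
        ax, ay = central_acceleration(x, y, magnitude)
        vx += 0.5 * dt * ax          # second half kick
        vy += 0.5 * dt * ay
        history.append(x * vy - y * vx)
    return history

# Same initial position (1, 0) and velocity (0, 0.8) under two force laws.
inverse_square = angular_momentum_history(lambda r: 1.0 / r ** 2, 1.0, 0.0, 0.0, 0.8)
linear = angular_momentum_history(lambda r: r, 1.0, 0.0, 0.0, 0.8)

print(max(abs(k - 0.8) for k in inverse_square))
print(max(abs(k - 0.8) for k in linear))
```

Both printed deviations from the initial value k = 0.8 should be at the level of rounding error, although the two orbits themselves are quite different curves.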
To get the equation of the orbit, we must use the inverse square law in
the form $F = m\mu/r^2$, where $\mu$ is some constant. By (6), this can be written
$F = (m\mu/k)\dot\theta$; then the second order differential equations (4) become
(7)
By putting in the polar coordinate values (3) of x and y, this is the equa-
tion of the orbit written in polar coordinates. More directly, the right
hand side in (7) is proportional to the distance from the line whose equation
is that right hand side set equal to 0. It is easier to see this by choosing
the initial conditions at a point where $\theta = 0$ and $\dot x$ is zero. Then
A = 0 and the equation (7) reads
(8)
Now $k^2/MB - x$ on the right is clearly the horizontal distance from the
planet P to the (vertical) line $x = k^2/MB$. Then equation (8) states that
the planet moves so that its distance r from the sun at O is always a fixed
proportion B of its distance from a vertical line. But a conic section (Fig-
ure 3) can be defined, in terms of a point 0 as focus and a line D as direc-
trix, as the locus of all those points P with distances from 0 and D so that
PO = ePD, where the constant e is the eccentricity of the conic. The orbit
must then be an ellipse (e < 1), a parabola (e = 1), or a hyperbola
(e > 1); in the case of the planets we know by observation of their
recurrence that the orbit must be an ellipse (with the sun at one focus). As
one knows, an ellipse can also be defined in terms of two foci, as the locus
of points P such that the sum of the distances from P to the two foci is a
constant 2a; then 2a is the length of the "major axis" of the ellipse. A
straightforward analysis will then give Kepler's third law about the square
of the periods of different planets. But we have already established the
essential point: When velocity and acceleration are represented by deriva-
Figure 3. An ellipse as a locus.
(1)
2. Momentum, Work, and Energy 265
$W = \int_u (F_1\,dx^1 + F_2\,dx^2 + F_3\,dx^3) = \int_0^{t_1} (F_1\dot x^1 + F_2\dot x^2 + F_3\dot x^3)\,dt.$ (2)
This definition of work as an integral does include the special case (of a
constant force), as introduced above. What we call a "differential form" in
(1) is in physics texts usually called the differential $\delta W$ of work, but the
idea is the same: it is the thing which is to be integrated (along paths) to
get the work done along these paths; the symbol $\delta W$ does not necessarily
mean that there is a function W of position with this differential.
Now use Newton's second law for a particle of mass m moving under
this force field F along this path u, given by smooth functions $x^i(t)$. Then
$F_i\dot x^i = m\ddot x^i\dot x^i,$ i = 1,2,3.
The indefinite integral of this product is $(1/2)m(\dot x^i)^2$, so that the work
done along the path is
$\int_u \omega = (1/2)m\left[(\dot x^1)^2 + (\dot x^2)^2 + (\dot x^3)^2\right]_0^{t_1},$ (3)
where the notation at the right means the difference of the values of the
bracketed expression at $t_1$ and 0. This leads to the definition of the kinetic
energy T of the particle as
(4)
and to the theorem that the work done along the path is equal to the
change (3) in the kinetic energy along that path. In brief, if v is the magnitude
of the velocity, then the kinetic energy is
$T = (1/2)mv^2.$
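The theorem that work equals the change in kinetic energy can be checked numerically along a sample path. In the sketch below the mass, the path (given through its velocity and acceleration), and the use of Simpson's rule for the integral are all our own illustrative choices. The force along the path is taken from Newton's second law as mass times acceleration, and the work integral of force times velocity is compared with the difference of $(1/2)mv^2$ at the two ends.

```python
import math

M = 2.0  # mass of the particle (arbitrary sample value)

# A sample smooth path x(t) = (cos t, sin 2t, t^2), through its derivatives.
def velocity(t):
    return (-math.sin(t), 2.0 * math.cos(2.0 * t), 2.0 * t)

def acceleration(t):
    return (-math.cos(t), -4.0 * math.sin(2.0 * t), 2.0)

def power(t):
    """Integrand of the work: sum of F_i * xdot^i, with F_i = m * xddot^i."""
    return sum(M * a * v for a, v in zip(acceleration(t), velocity(t)))

def kinetic_energy(t):
    """T = (1/2) m v^2 at time t."""
    return 0.5 * M * sum(v * v for v in velocity(t))

def simpson(g, a, b, n=1000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3

t1 = 1.5
work = simpson(power, 0.0, t1)
change = kinetic_energy(t1) - kinetic_energy(0.0)
print(work, change)
```

The two printed numbers should agree up to the small error of the numerical integration.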
Sometimes the differential form $\omega = \delta W$ of work may be the
differential $\omega = -dV$ of an actual function $-V$ of position. This means
that the force $F_i$ in each direction $x^i$ is the partial derivative
$F_i = -\frac{\partial V}{\partial x^i},$ i = 1,2,3. (5)
This implies that the work done along any smooth path from a point b to
a point c is just the difference V(b) - V(c), and so is independent of the
$V = -\gamma m/r$; (6)
(7)
for some constant k. This force -kx may be derived from a potential
$kx^2/2 + C$, for any constant C (potential energy is determined only "up
to" an additive constant). The equation (7) has the well known integral
A cos kt + B sin kt or
circle. A point Q moves around the circle at uniform speed; its projection
P on the x axis is the point executing the original harmonic motion. Such
"harmonic oscillators" are important because general periodic motions
(oscillations) can be built up (via Fourier series) out of linear combina-
tions of these "simple" harmonic oscillations. The reappearance of the
geometrically defined trigonometric functions cos kt and sin kt in this
connection with mechanics is a remarkable (and classical) instance of the
interrelation of mathematical constructs.
The idea of using the phase plane with coordinates both position and
velocity is relevant, because the initial conditions for the motion (position
and velocity) fix a point in this phase plane. This point (and thus these
conditions) determine the constants of integration A and $\Phi$; geometrically,
this point determines the circle above. Put differently, this point is a posi-
tion on the real axis and a velocity there, hence is a point on the tangent
bundle for the x axis. In this form, as we will soon see, the idea general-
izes.
3. Lagrange's Equations
The calculation of planetary orbits in §1 started with the usual rectangular
coordinates x and y in the orbital plane of the planet, but soon switched
to polar coordinates in that plane. It would have been convenient to have
the equations of motion written directly in terms of polar coordinates.
This can be done, and done in a way to suggest the form of such equa-
tions in any other coordinate system.
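The transformation from rectangular to polar coordinates r and $\theta$ reads
$x = r\cos\theta, \quad y = r\sin\theta.$ (1)
Differentiating with respect to time gives
$\dot x = -r\sin\theta\,\dot\theta + \cos\theta\,\dot r,$
$\dot y = r\cos\theta\,\dot\theta + \sin\theta\,\dot r;$ (2)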
(3)
Differentiating the equations (2) again with respect to time gives the rectangular
components $\ddot x$ and $\ddot y$ of the acceleration vector a. The radial and
angular components are then (Figure 1)
The right hand side at first looks mysterious; for example, the term $-r\dot\theta^2$
on the right in the first equation is often shifted to the left side, where it is
called the centrifugal force due to the angular velocity $\dot\theta$. (Swing a horse
chestnut on a rope and you can feel the centrifugal force!) However, this
right hand side can be explained more systematically in terms of the partial
derivatives of the polar coordinate kinetic energy function T of (3).
This T is a function of r, $\dot r$, and $\dot\theta$; calculating its partial derivatives, one
finds
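$F_r = \frac{d}{dt}\left[\frac{\partial T}{\partial \dot r}\right] - \frac{\partial T}{\partial r},$ (6)
$T_\theta = \frac{d}{dt}\left[\frac{\partial T}{\partial \dot\theta}\right].$ (7)
We can even make these two equations look alike by subtracting $\partial T/\partial\theta$
on the right of (7), a zero term because the kinetic energy T is in fact
independent of $\theta$.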
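The radial equation can be checked numerically along a sample path. The sketch below rests on our own assumptions: the polar kinetic energy is taken in its standard form $T = (1/2)m(\dot r^2 + r^2\dot\theta^2)$, the path r(t), $\theta(t)$ and the mass are arbitrary sample choices, and all derivatives are approximated by finite differences. It compares the Lagrange expression $\frac{d}{dt}[\partial T/\partial\dot r] - \partial T/\partial r$ with the radial component of mass times acceleration, the latter computed from the rectangular coordinates.

```python
import math

M = 3.0      # mass (arbitrary sample value)
EPS = 1e-4   # finite-difference step

# A sample smooth path in polar coordinates, with its exact velocities.
def r(t): return 2.0 + math.sin(t)
def theta(t): return t * t
def rdot(t): return math.cos(t)
def thetadot(t): return 2.0 * t

def T(r_, rdot_, thetadot_):
    """Kinetic energy in polar coordinates (standard form, assumed here)."""
    return 0.5 * M * (rdot_ ** 2 + (r_ * thetadot_) ** 2)

def ddt(g, t):
    """Central-difference approximation to the time derivative of g."""
    return (g(t + EPS) - g(t - EPS)) / (2 * EPS)

def lagrange_radial(t):
    """d/dt (dT/d rdot) - dT/dr, evaluated along the path."""
    def dT_drdot(s):
        return (T(r(s), rdot(s) + EPS, thetadot(s))
                - T(r(s), rdot(s) - EPS, thetadot(s))) / (2 * EPS)
    dT_dr = (T(r(t) + EPS, rdot(t), thetadot(t))
             - T(r(t) - EPS, rdot(t), thetadot(t))) / (2 * EPS)
    return ddt(dT_drdot, t) - dT_dr

def newton_radial(t):
    """Radial component of m*a, with a taken from rectangular coordinates."""
    x = lambda s: r(s) * math.cos(theta(s))
    y = lambda s: r(s) * math.sin(theta(s))
    ax = (x(t + EPS) - 2 * x(t) + x(t - EPS)) / EPS ** 2
    ay = (y(t + EPS) - 2 * y(t) + y(t - EPS)) / EPS ** 2
    return M * (ax * math.cos(theta(t)) + ay * math.sin(theta(t)))

t0 = 0.7
print(lagrange_radial(t0), newton_radial(t0))
```

Both numbers should also agree with $m(\ddot r - r\dot\theta^2)$, which for this path is M * (-sin t - r(t) * (2t)^2) at t = 0.7.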
$\frac{d}{dt}\left[\frac{\partial T}{\partial \dot q}\right] - \frac{\partial T}{\partial q} = Q_q,$ (8)
$\frac{d}{dt}\left[\frac{\partial T}{\partial \dot x^j}\right] = F_j,$ j = 1,...,k. (11)
Since T does not depend on the $x^j$, this is a special case of the Lagrange
equation (8).
Now replace the $x^j$ by new coordinates. These new coordinates are
traditionally denoted by q's, so that the $x^j$ are given by smooth functions
$h^j$,
$x^j = h^j(q^1,\ldots,q^n),$ j = 1,...,k, (12)
in terms of the new coordinates. The partial derivatives $\partial h^j/\partial q^i$ are traditionally
written as $\partial x^j/\partial q^i$, thereby avoiding the use of a letter h for the
function, and using the letter $x^j$ to denote both the quantity $x^j$ (the coordinate)
and the function $h^j$ of the q's. The components of velocity
$dx^j/dt = \dot x^j$ along a path are then, by the chain rule,
$\dot x^j = \frac{\partial x^j}{\partial q^1}\dot q^1 + \cdots + \frac{\partial x^j}{\partial q^n}\dot q^n,$ j = 1,...,k. (13)
Consequently the kinetic energy T becomes a function of both the $q^i$'s and
the $\dot q^i$'s. In these coordinates we will show that Newton's second law takes
the form of the Lagrange equations
$I \xrightarrow{\;u\;} C \xrightarrow{\;\phi\;} D \xrightarrow{\;f\;} R,$ (15)
j = 1,...,k.
(Here $\partial x^j/\partial q^i$ are just the partial derivatives $\partial h^j/\partial q^i$ of (13).) By these
equations, the differential form $\omega = \sum F_j\,dx^j$ for work $\omega$ on the original
manifold D becomes a differential form
$\phi^*\omega = \sum_j F_j\,dx^j = \sum_j \sum_i F_j \frac{\partial x^j}{\partial q^i}\,dq^i,$
$\int_{\phi\cdot u} \omega = \int_u \phi^*\omega.$ (17)
(18)
If I is a time interval, with parameter t, then along each path $u: I \to C$ all
the coordinates $q^i$, $\dot q^i$ and $x^j$ become, by composition with u or with $\phi\cdot u$,
functions on I (i.e., functions of t). By the chain rule for differentiating a
composite,
Since $\dot q^i = dq^i/dt$, the right hand side is the same as in the second equation
of (18). Hence
(19)
valid along any path u. This means that the partial derivatives here are
those of functions of $q^1,\ldots,q^n,\dot q^1,\ldots,\dot q^n$; they then become functions
of t alone via u.
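$\frac{d}{dt}\left[\frac{\partial T}{\partial\dot x^j}\right] - \frac{\partial T}{\partial x^j} = F_j,$ j = 1,...,k
$Q_i = \sum_j F_j \frac{\partial x^j}{\partial q^i} = \sum_j \cdots$
$\frac{\partial T}{\partial \dot q^i} = \sum_j \left[\frac{\partial T}{\partial x^j}\frac{\partial x^j}{\partial\dot q^i} + \frac{\partial T}{\partial\dot x^j}\frac{\partial\dot x^j}{\partial\dot q^i}\right];$
here $\partial x^j/\partial\dot q^i = 0$, while $\partial\dot x^j/\partial\dot q^i$ is $\partial x^j/\partial q^i$ by (18) so that the whole
becomes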
tude and longitude. Notice here that the dimension 2 of C is smaller than
that of D (hence $k \ne n$ in the formulas above) and that the configuration
space C is a manifold, included in the Euclidean space D by the inclusion
map $\phi: C \to D$. Moreover the coordinates $q^i$ must be regarded as local
coordinates, valid in some chart: the longitude is not defined at the north
pole of a sphere!
In such a case there are forces $F_j^c$ of "constraint" which hold the particle
to the submanifold, as well as "external" forces $F_j^e$. Thus in the
Euclidean space the j th component of the total force is $F_j = F_j^c + F_j^e$.
But the forces of constraint act orthogonally to the submanifold C; as a
result the pullback to C of the differential form for work done by the constraints
is $\phi^*(\sum F_j^c\,dx^j) = 0$. More simply, "the forces of constraint do no
work". What this means is that the required generalized forces $Q_i$ on C
may be calculated by pulling back only the external forces $F_j^e$.
Lagrange's equations apply also to more general ("non-holonomic")
constraints, such as the case of a ball rolling on a rough table. At any one
time, the ball is constrained to move on a three dimensional manifold,
with local coordinates the two rectangular coordinates of the point of con-
tact in the plane (the table) plus one angle (the angle of rotation of the
sphere about the diameter through this point of contact). However, by rol-
ling suitably forward and backwards, one may see that the ball can reach
any position in a four dimensional manifold (two coordinates in the plane
and two for rotation of the sphere). The Lagrange equation in such a case
may be derived much as in the argument above, replacing the holonomic
constraint $\phi: C \to D$ by a map $\phi: C \times I \to D$ which at each time $t \in I$
gives the position $\phi_t = \phi(-,t)$ of the constraint to C at the time t. The
generalized forces $Q_i$ for the Lagrange equations of motion on C are then
obtained by pulling back the work $\omega$ along $\phi_t^*$; this is classically called a
"virtual" displacement.
Lagrange's equations have a simpler form when the forces are conser-
vative. There is then a potential V, which is a quantity defined on D such
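that the forces $F_j$ are given by the partial derivatives $-\partial V/\partial x^j$. This
potential is then also a quantity defined on C (as the composite $V\cdot\phi$). The
generalized forces $Q_i$ on C then are
$Q_i = \sum_j F_j\frac{\partial x^j}{\partial q^i} = -\sum_j \frac{\partial V}{\partial x^j}\frac{\partial x^j}{\partial q^i} = -\frac{\partial V}{\partial q^i}.$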
In other words, they are also derivatives of the potential V in the new
coordinates. One may then introduce L = T - V, a quantity defined on
D and on C, and called the Lagrangian; the Lagrange equations for con-
servative forces then take the form, for coordinates qi on the configuration
space C,
$V = -mgr\cos\theta.$
The two Lagrange equations in the coordinates $\theta$ and $\phi$ then are
The second one integrates once to give $\dot\phi = h/\sin^2\theta$, for h a constant of
integration. The first equation then takes the form
$\frac{d}{dt}\left[\frac{\partial T}{\partial\dot q^i}\right] - \frac{\partial T}{\partial q^i} = Q_i,$ i = 1,...,n.
4. Velocities and Tangent Bundles 275
(1)
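$\sum_{j=1}^{n}\left[\frac{\partial^2 T}{\partial\dot q^i\,\partial\dot q^j}\frac{d\dot q^j}{dt} + \frac{\partial^2 T}{\partial\dot q^i\,\partial q^j}\frac{dq^j}{dt}\right] - \frac{\partial T}{\partial q^i} = Q_i.$ (2)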
Since $\dot q^i$ is now a coordinate and not a derivative, we must add the equations
$\frac{dq^i}{dt} = \dot q^i,$ i = 1,...,n. (3)
The matrix with coefficients $\partial^2 T/\partial\dot q^i\,\partial\dot q^j$ is usually non-singular;
when this is the case, the equations (2) can be solved explicitly
for the derivatives $d\dot q^j/dt$. If we label the m = 2n variables $y^1,\ldots,y^m$,
the equations (2) and (3) then have the general form
$\frac{dy^k}{dt} = G_k(y^1,\ldots,y^m),$ k = 1,...,m (4)
asserts that such a smooth system always has, for each given initial point,
a smooth solution defined for some sufficiently small interval of time, so
solutions exist, even though they may not be expressed in terms of
elementary functions.
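Even when no formula in elementary functions is available, a first order system of this kind can be followed step by step on a machine. The sketch below is an illustration, not part of the theory: it uses the classical fourth-order Runge-Kutta step (our choice of method) and, as a test case with a known answer, the two-variable system obtained from a harmonic oscillator with position q and velocity $\dot q$ as coordinates, taking the oscillator equation in the normalized form $\ddot q = -q$ (an assumption made for the example).

```python
import math

def rk4_step(G, y, dt):
    """One classical Runge-Kutta step for the system dy/dt = G(y)."""
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]
    k1 = G(y)
    k2 = G(shifted(y, k1, dt / 2))
    k3 = G(shifted(y, k2, dt / 2))
    k4 = G(shifted(y, k3, dt))
    return [yi + dt / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]

def integrate(G, y0, dt, steps):
    """Follow the solution from the initial point y0 for the given number of steps."""
    y = list(y0)
    for _ in range(steps):
        y = rk4_step(G, y, dt)
    return y

def oscillator(y):
    """First order form of the test equation: (q, qdot)' = (qdot, -q)."""
    q, qdot = y
    return [qdot, -q]

# Start at q = 1 with velocity 0 and integrate up to t = 1.
q, qdot = integrate(oscillator, [1.0, 0.0], 0.001, 1000)
print(q, math.cos(1.0))
print(qdot, -math.sin(1.0))
```

For this test case the exact solution is q = cos t, so each printed pair should agree to many decimal places.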
The phase space for simple harmonic motion (§2) exhibited both position
and velocity. Similarly the quantities $q^i$ (position) and $\dot q^i$ (velocity)
are the coordinates for the phase space of the motion. A point in this
space is a point in the configuration space C plus a tangent vector at that
point. In other words, it is a point on the tangent bundle $B_\bullet C$ to C, which
we write with the letter B, to avoid confusion with the use of the letter T
for kinetic energy. Thus the device of thinking of a point moving in such
a phase space is really equivalent to the geometric idea of using a tangent
bundle to a manifold.
Our proof that the Lagrange form of the equations is preserved under
change of base can be illuminated in these terms. Each smooth map
$\phi: C \to D$ of manifolds carries tangent vectors (to paths) in C into tangent
vectors to D and hence induces a map $B_\bullet\phi: B_\bullet C \to B_\bullet D$ on the tangent
bundles. On each tangent space, it is the linear map given in coordinates
by the equations (3.13). The determinant of this linear map, with entries
$\partial x^j/\partial q^i$ depending on the q's, is called the Jacobian of the
transformation $\phi$.
In these terms, the diagram (3.15) can be expanded to represent all the
coordinates involved in the change of base (qi coordinates on C, x j on D):
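[Diagram (5): the tangent bundles $B_\bullet C$ and $B_\bullet D$ with the map
$B_\bullet\phi: B_\bullet C \to B_\bullet D$ and the projections $\pi$ onto C and D; the map
$\phi: C \to D$; the coordinates $q^i$ on C and $x^j$ on D; and the kinetic energy
T as a function on $B_\bullet D$ to R.]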
Here the map $\pi$ is the projection of each tangent bundle (phase space)
onto its base, and the coordinates $q^i$ (or $x^j$) on the base become, by
composition with $\pi$, coordinates $q^i\cdot\pi$ on the bundle. This diagram also
displays the other functions involved. Thus the kinetic energy T, originally
given as a quantity on the initial phase space (i.e., a function T on $B_\bullet D$ to
the reals) becomes by composition with $B_\bullet\phi$ a quantity on the new phase
space. By convention, the letter T usually stands for either (or both) of
these functions. The whole proof of §3 can then be reformulated in these
terms.
For iterated constraints one has successive smooth maps
5. Mechanics in Mathematics
We pause to emphasize the many and remarkable exchanges of ideas
from mechanics to Mathematics and back. To begin with, the very notion
of the calculus was found by Newton in order to formulate the mechanics
of planetary motion. The idea of rate of change (derivative) was needed
in order to get at the ideas of velocity and acceleration necessary for
mechanics. This in turn led to the consideration of (ordinary) differential
equations and their solutions (exact or approximate) subject to initial con-
ditions. Since these differential equations are typically of second order,
their initial conditions involved initial position and velocity and so led to
the use of first order differential equations on tangent bundles (phase
spaces) where the location of a point is determined by position and
velocity. For several particles, the configuration spaces often required
spaces of more than three dimensions. Both the configuration spaces and
the phase spaces were originally described just in terms of coordinates,
but the effective formulation of the differential equations of motion called
attention to the need for some invariance (say in the form used for the
Lagrange equations) under changes of coordinates. Once this is con-
sidered, one is really dealing with a conceptually described idea of a
smooth differentiable manifold. Although the explicit general definition
of such a manifold was not formulated till the 1930's, it was implicitly
present for at least a century before that in the minds of both geometers
and physicists.
278 IX. Mechanics
This is not all. The physicist's notion of the "differential of work" is the
Mathematician's "differential form". The total work in a process is
obtained by summing up differentials, and this amounts to the definition
of line integrals. Other such connections of ideas will appear later in this
chapter. They are discussed here as examples of the very many cases of
ideas connected from theoretical Physics to Mathematics. They cannot all
be listed in any one book, but we can mention the recent exciting parallels
which have appeared between gauge theories (the Yang-Mills equations
in quantum field theory) and the Mathematical study of connections in
fiber bundles. The interaction between tensor analysis and relativity
theory is another such case (Weyl [1923]), as is the use of matrices and of
Hilbert spaces in quantum mechanics.
In such cases, a particular idea may arise first in Physics or first in
Mathematics or often with apparent independence in both. At issue is not
the question of which comes first, but the remarkable fact that both
come-that ideas from physical problems and from apparently pure
Mathematical speculation come together.
6. Hamilton's Principle
The form of the Lagrange equations in a configuration space C is
independent of the choice of coordinates in that space. This fact needs
explanation, and has one: Hamilton's principle asserts that the solutions
of Lagrange's equations are exactly those paths in C which "minimize"
the integral of the Lagrangian function L along the path. We consider a
conservative system, and a Lagrangian L which depends not only on posi-
tion and velocity, but also on time t. Such a "time dependent" Lagrangian
is thus a smooth function $L: B_\bullet C \times I \to R$, where I is the time interval,
say all t with $0 \le t \le 1$. Thus for each choice of coordinates $q^1, \dots, q^n$
on C, L appears as a function $L(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n, t)$. Given two
fixed points a and b in the space C, we consider smooth paths from a to
b; that is, smooth functions $u: I \to C$ with u(0) = a and u(1) = b. Each
point of the path has a tangent vector at that point, so u(t) together with
these tangent vectors defines a path $u_B: I \to B_\bullet C$ in the tangent bundle;
we say that $u_B$ lifts u over $\pi$ because the projection $\pi: B_\bullet C \to C$ will again
give u as the composite $u = \pi\cdot u_B$. Also, sending each time t to itself
yields, by $t \mapsto (u_B t, t)$, a map $u^\#: I \to B_\bullet C \times I$.
The integral J to be minimized is the integral of L along this path,
$$J(u) = \int_0^1 L(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n, t)\,dt = \int_0^1 (L\cdot u^\#)\,dt;$$
the second version of this formula indicates that one is simply integrating
the composite function $L\cdot u^\#$ on I. We want to compare this integral J(u)
with the same integral along other smooth paths in C with the same
endpoints (at the same starting and ending times t = 0 and t = 1); we aim
to find those paths u for which J(u) is a minimum or at least is
"stationary" (see below).
Now to minimize (or to maximize) a smooth function f of a real vari-
able x one first finds where the derivative of f vanishes. Let us say that f
is stationary at some value $x_0$ if df/dx = 0 for $x = x_0$. Thus a function
is stationary both at a maximum and at a minimum-and also at a hor-
izontal inflection of the curve y = f(x); to distinguish these cases one
needs more than just the vanishing of the first derivative. Now we wish to
find similar stationary positions not for a function like f(x) of a number
x, but for a function J(u) of a curve u; now u can vary over the set of all
curves from a to b. Such a problem is said to belong to the calculus of
variations (§VI.11). Here "variation" refers to the variation of the path;
this we interpret as an embedding of the given path u(t) in a
one-parameter family $U(t,\epsilon)$ of paths, where $\epsilon$ is a real parameter, say $\epsilon \in I$.
Thus U is a smooth function $U: I \times I \to C$, such that $U(t,0) = u(t)$ is the
given path, while all the "varied" paths have the same endpoints a and b,
so that $U(0,\epsilon) = a$ and $U(1,\epsilon) = b$ for all $\epsilon$ in I. The situation may be
pictured as in Figure 1 (for a two dimensional space C). For each such
family U, one can again form the integral J(U); it will be a smooth
function of $\epsilon$. We say that the original integral J(u) is stationary if $dJ(U)/d\epsilon$ is
zero for $\epsilon = 0$ whenever u is embedded in a smooth family U of paths.
[Figure 1: a family U of varied paths from a to b around the given path u.]
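
The definition of "stationary" can be tried out numerically. The following is a minimal sketch, assuming the sample Lagrangian $L = \dot q^2/2 - q^2/2$ in one coordinate (an assumption made here, not an example from the text), the endpoints q(0) = 0 and q(1) = 1, and the variation $U(t,\epsilon) = q(t) + \epsilon\,\eta(t)$ with $\eta(t) = \sin \pi t$. The helper names `action` and `dJ_deps` are hypothetical.

```python
import math


def action(q, qdot, n=2000):
    """Midpoint-rule approximation of J, the integral over [0, 1] of L(q, qdot) dt,
    for the sample Lagrangian L = qdot**2 / 2 - q**2 / 2."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += (0.5 * qdot(t) ** 2 - 0.5 * q(t) ** 2) * h
    return total


def dJ_deps(q, qdot, eta, etadot, eps=1e-4):
    """Central-difference estimate of dJ/d(eps) at eps = 0 for the family q + eps * eta."""
    def J(e):
        return action(lambda t: q(t) + e * eta(t),
                      lambda t: qdot(t) + e * etadot(t))
    return (J(eps) - J(-eps)) / (2.0 * eps)


# A variation vanishing at both endpoints, so all varied paths run from a to b.
def eta(t):
    return math.sin(math.pi * t)


def etadot(t):
    return math.pi * math.cos(math.pi * t)


# The path sin(t)/sin(1) solves Lagrange's equation for this L (here q'' + q = 0).
slope_solution = dJ_deps(lambda t: math.sin(t) / math.sin(1.0),
                         lambda t: math.cos(t) / math.sin(1.0), eta, etadot)

# The straight path q = t has the same endpoints but is not a solution.
slope_line = dJ_deps(lambda t: t, lambda t: 1.0, eta, etadot)
```

For the solution path the estimated derivative is zero up to discretization error; for the straight path it comes out close to $-1/\pi$, so that path is not stationary.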
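$$\frac{dJ}{d\epsilon} = \int_0^1 \frac{dL}{d\epsilon}\,dt = \int_0^1\left[\frac{\partial L}{\partial \dot q}\,\frac{d\dot q}{d\epsilon} + \frac{\partial L}{\partial q}\,\frac{dq}{d\epsilon}\right]dt.$$
Since $\dot q = dq/dt$ and the order of the differentiations in t and $\epsilon$ may be
interchanged,
$$\frac{d\dot q}{d\epsilon} = \frac{d}{dt}\left[\frac{dq}{d\epsilon}\right].$$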
With this substitution, the first term in the integral above can be
integrated by parts according to the familiar formula for two functions v
and w of t:
$$\int_0^1 v\,\frac{dw}{dt}\,dt = -\int_0^1 \frac{dv}{dt}\,w\,dt + (vw)_{t=1} - (vw)_{t=0}. \tag{2}$$
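In the present case $v = \partial L/\partial \dot q$ and $w = dq/d\epsilon$; also all the varied paths
have the same endpoints, so that $w = dq/d\epsilon$ is zero at t = 0 and at
t = 1. Thus we get
$$\frac{dJ}{d\epsilon} = \int_0^1\left[-\frac{d}{dt}\left[\frac{\partial L}{\partial \dot q}\right] + \frac{\partial L}{\partial q}\right]\frac{dq}{d\epsilon}\,dt. \tag{3}$$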
Now set $\epsilon = 0$ in L. The term in the large brackets is then, up to sign, the
left-hand side of Lagrange's equation for the function L, so its vanishing
(for all t along the path) does imply that $dJ/d\epsilon = 0$ and hence that J is
stationary, as desired.
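For the converse, assume that J is stationary for every family U, pick a
smooth function $\eta: I \to R$ with $\eta(0) = \eta(1) = 0$ and use this function to
construct a variation of the path as $U(t,\epsilon) = u(t) + \epsilon\,\eta(t)$ (in coordinates),
so that $dq/d\epsilon = \eta(t)$. Then (3), at $\epsilon = 0$, becomes
$$0 = \int_0^1\left[-\frac{d}{dt}\left[\frac{\partial L}{\partial \dot q}\right] + \frac{\partial L}{\partial q}\right]\eta(t)\,dt. \tag{4}$$
This holds for every such smooth function $\eta$. The following lemma then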
shows that the term in the large brackets must vanish-in other words,
Lagrange's equation must hold.
LEMMA. If $M: I \to R$ is a smooth function such that
$$\int_0^1 M\eta\,dt = 0 \tag{5}$$
for every smooth $\eta: I \to R$ with $\eta(0) = \eta(1) = 0$, then M is identically zero.
PROOF. Suppose not, so that $M(t_3) \ne 0$, say $M(t_3) > 0$, for some $t_3$ in
the interval I. We can assume that $t_3 \ne 0$ and $t_3 \ne 1$. Since M is
continuous, one must then have M(t) > 0 in some small interval about $t_3$. Now
choose b to be a bump function $b: I \to R$: smooth, zero outside this small
interval, positive inside the interval and equal to 1 at $t_3$. Next choose the
variation to be $\eta = bM$. Then
$$\int_0^1 M\eta\,dt = \int_0^1 bM^2\,dt > 0,$$
a contradiction.
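
The bump function used in this proof can be written down explicitly. The sketch below uses one standard choice, $\exp(1 - 1/(1 - s^2))$ for $|s| < 1$ and zero elsewhere; the sample function M, the point $t_3 = 0.2$ and the radius 0.1 are assumptions made purely for illustration.

```python
import math


def bump(t, center, radius):
    """A smooth bump: positive for |t - center| < radius, zero outside, equal to 1 at center."""
    s = (t - center) / radius
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(1.0 - 1.0 / (1.0 - s * s))


def M(t):
    """A sample smooth function that is positive on the interval [0.1, 0.3]."""
    return math.cos(3.0 * t)


T3, RADIUS = 0.2, 0.1  # hypothetical point t3 and size of the small interval about it


def integral_of_M_eta(n=10000):
    """Midpoint-rule value of the integral over [0, 1] of M * eta dt with eta = b * M."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        eta = bump(t, T3, RADIUS) * M(t)
        total += M(t) * eta * h
    return total


value = integral_of_M_eta()
```

Since the integrand $bM^2$ is positive inside the small interval and zero outside it, the computed value is positive, as the proof requires.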
The methods used in this argument are not limited to mechanics; they
also apply when the Lagrangian is replaced by other smooth functions K.
In the plane with coordinates x and y, consider all smooth curves
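y = y(x) from a point $(x_0,y_0)$ to a point $(x_1,y_1)$. One wishes a curve for
which an integral
$$\int_{x_0}^{x_1} K(y,y',x)\,dx \tag{6}$$
is stationary. By the argument above, such a curve must satisfy
$$\frac{d}{dx}\,\frac{\partial K}{\partial y'} - \frac{\partial K}{\partial y} = 0, \tag{7}$$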
which for K = L is Lagrange's equation. Such problems arise early in
many connections-for example, in the problem of finding the path of
quickest descent in a vertical plane (the brachistochrone) from the point
$(x_0,y_0)$ to the point $(x_1,y_1)$. For this purpose, one really wishes not just a
curve where the integral (6) is stationary, but one where it is actually a
minimum (or a maximum) among suitable comparison curves. This study
gave rise to a remarkable array of rigorous methods in the Calculus of
Variations-including methods which would apply when the curves used
are allowed to have corners or when the minimizing curve is subjected to
various kinds of "side conditions". On the other hand, in mechanics the
idea of characterizing trajectories as those paths which minimize a
suitable integral has several different forms, with different choices of integral
and "side conditions". Several of these forms go by the name "Principle
of least action". For our form, the "action" can be defined as the integral
over the trajectory of the difference between kinetic and potential energy
(the Lagrangian L = T - V). This is in contrast to Newton's second law,
which describes the same trajectory in terms of a local property (a second
order differential equation). For an eloquent description of the physics of
the principle of least action we refer to Feynman, Leighton, and Sands,
vol. II, lecture 19.
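
The problem of quickest descent mentioned above gives a concrete comparison of two curves. The sketch below is an illustration under stated assumptions: y is measured downward, the particle starts at rest at (0, 0) and ends at $(\pi, 2)$, and the two comparison curves are the straight line and the cycloid through these points. The function `descent_time` is a hypothetical helper that sums chord length divided by speed.

```python
import math

G = 9.81  # acceleration of gravity; y is measured downward from the starting point


def descent_time(curve, n=20000):
    """Approximate time to slide along curve(s), 0 <= s <= 1, starting at rest.
    Each small chord is traversed at the speed sqrt(2*G*y) reached at its midpoint parameter."""
    total = 0.0
    x_prev, y_prev = curve(0.0)
    for k in range(1, n + 1):
        x, y = curve(k / n)
        _, y_mid = curve((k - 0.5) / n)
        chord = math.hypot(x - x_prev, y - y_prev)
        total += chord / math.sqrt(2.0 * G * y_mid)
        x_prev, y_prev = x, y
    return total


def line(s):
    """Straight line from (0, 0) to (pi, 2)."""
    return (math.pi * s, 2.0 * s)


def cycloid(s):
    """Cycloid arch (radius 1) from (0, 0) to (pi, 2)."""
    theta = math.pi * s
    return (theta - math.sin(theta), 1.0 - math.cos(theta))


t_line = descent_time(line)
t_cycloid = descent_time(cycloid)
```

The cycloid gives the shorter time of the two. This only compares two particular curves; it does not by itself show that the cycloid is a minimum among all comparison curves.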
To summarize: The explanation of the invariant form of Lagrange's
equations depends on minimizing a suitable integral over families of
curves, and is intimately tied to the development of the Calculus of
Variations-which in recent times has reappeared as the theory of optimal
control.
7. Hamilton's Equations
Another, more invariant form of the equations of motion is provided by
Hamilton's equations. They use as coordinates not positions and velocities
but positions and the corresponding momenta; they assume that the
forces are conservative and so are derived from a potential function V. In
place of the Lagrangian, they use the total energy H
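$$H = T + V$$
(the Hamiltonian), considered as a function of position and momentum.
Consider first the case of N particles, with position determined by
3N = k coordinates $x^i$ (or $q^i$) and the trajectories given by Newton's laws
in the form (3.9)
$$\frac{dp_i}{dt} = -\frac{\partial V}{\partial q^i}, \tag{2}$$
where $p_i = m_i\,dx^i/dt$ is the momentum. The kinetic energy is
$$T = \frac{1}{2}\sum_i m_i\left[\frac{dx^i}{dt}\right]^2 = \frac{1}{2}\sum_i \frac{p_i^2}{m_i},$$
so that the velocities $dq^i/dt$ may be written
as $p_i/m_i = \partial T/\partial p_i$, so that, since V does not depend on the momenta,
$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}. \tag{3}$$
In other words, in the 2k dimensional momentum phase space, with
k = n and with cartesian coordinates $q^1, \dots, q^n$ (for position) and
$p_1, \dots, p_n$ (for momenta) the trajectories are the solutions of the 2n first
order differential equations
$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i}, \qquad i = 1, \dots, n.$$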
In the cartesian coordinate case this formula does give exactly the usual
components of momentum. Moreover, as in that case, we will assume that
the matrix with entries
$$\dot q^j = \dot q^j(q^1, \dots, q^n, p_1, \dots, p_n) \tag{7}$$
(we will also use the partial derivatives of these functions). Thus L, origi-
nally given as a function on the velocity phase space,
$$L = L(q^1, \dots, q^n, \dot q^1, \dots, \dot q^n) \tag{8}$$
can then also be expressed as a function of the p's and q's on the momen-
tum phase space. (Mind your p's and q's!)
Now define the Hamiltonian as the quantity
$$H = \sum_{j=1}^{n} p_j\dot q^j - L, \tag{9}$$
to be considered via (7) as a function of the p's and q's. Its partial deriva-
tives relative to the p's are then found by the usual chain rule for the
derivative of a composite as
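$$\frac{\partial H}{\partial p_i} = \dot q^i + \sum_{j=1}^{n} p_j\,\frac{\partial \dot q^j}{\partial p_i} - \sum_{j=1}^{n} \frac{\partial L}{\partial \dot q^j}\,\frac{\partial \dot q^j}{\partial p_i};$$
in the partial derivatives on the right, the other variables to be held
constant are those indicated in the functional dependencies (7) and (8). But
because of the definition (6) of the $p_i$, the two sums here cancel; with the
second set of Lagrange's equations (5) the result becomes just
$$\frac{\partial H}{\partial p_i} = \frac{dq^i}{dt}.$$
Similarly,
$$\frac{\partial H}{\partial q^i} = \sum_{j=1}^{n} p_j\,\frac{\partial \dot q^j}{\partial q^i} - \frac{\partial L}{\partial q^i} - \sum_{j=1}^{n} \frac{\partial L}{\partial \dot q^j}\,\frac{\partial \dot q^j}{\partial q^i}.$$
By the definition of the $p_i$'s, the two sums again cancel; with the first of
Lagrange's equations, it becomes
$$\frac{\partial H}{\partial q^i} = -\frac{dp_i}{dt}.$$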
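[Diagram (10): the tangent space W to C at a point x in C, its dual space W*,
and the transformation $\theta: W \to W^*$.]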
that point w. The usual dual coordinates $p_1, \dots, p_n$ of this point in the
dual space are then, as in (11), the quantities
$$p_i = \frac{\partial L_x}{\partial \dot q^i}$$
exactly as in (6). The transformation now appears in the figure (10) as the
arrow $\theta: W \to W^*$; it is classically called the Legendre transformation for
the quantity L.
Now that we have the transformation $\theta$ we would like it to have an
inverse $\theta^{-1}$, and it would be handy to have that inverse given as the
Legendre transformation for some suitable quantity K on the cotangent
bundle $B^\bullet C$. This means that we would like to have $\dot q^i = \partial K/\partial p_i$, so as a
first guess we might set $K = \sum_j p_j\dot q^j$. But this won't be quite right, because
the $\dot q^j$, via $\theta^{-1}$, will be functions of the p's and so the partial derivatives
will be
$$\frac{\partial K}{\partial p_i} = \dot q^i + \sum_j p_j\,\frac{\partial \dot q^j}{\partial p_i} = \dot q^i + \sum_j \frac{\partial L}{\partial \dot q^j}\,\frac{\partial \dot q^j}{\partial p_i} = \dot q^i + \frac{\partial L}{\partial p_i}.$$
To remove the extra term we subtract L, setting $K = \sum_j p_j\dot q^j - L$.
This is the formula for the Hamiltonian, previously pulled out of the air
at (9). Now it is no wonder that the differential equations become simpler.
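
The Legendre transformation and its inverse can be illustrated numerically. The sketch assumes a one-coordinate Lagrangian $L = \tfrac12 M v^2 - \tfrac12 K q^2$ with hypothetical constants M and K; the momentum is computed as a difference quotient of L, and the inverse $\theta^{-1}$ is found by bisection on the assumption that $\partial L/\partial v$ increases with v. None of these helper functions comes from the text.

```python
M, K = 2.0, 3.0  # hypothetical mass and spring constant


def lagrangian(q, v):
    """Sample Lagrangian L = T - V for one coordinate q with velocity v."""
    return 0.5 * M * v * v - 0.5 * K * q * q


def momentum(q, v, h=1e-6):
    """p = dL/dv by a central difference: the transformation theta from velocity to momentum."""
    return (lagrangian(q, v + h) - lagrangian(q, v - h)) / (2.0 * h)


def velocity(q, p):
    """The inverse of theta, found by bisection (assumes dL/dv is increasing in v)."""
    lo, hi = -1e3, 1e3
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if momentum(q, mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hamiltonian(q, p):
    """H = p*v - L, with v expressed through p via the inverse transformation."""
    v = velocity(q, p)
    return p * v - lagrangian(q, v)


q0, p0 = 0.7, 1.2
H_value = hamiltonian(q0, p0)
# dH/dp should recover the velocity: H gives the inverse transformation.
dH_dp = (hamiltonian(q0, p0 + 1e-4) - hamiltonian(q0, p0 - 1e-4)) / 2e-4
```

For this quadratic L the result agrees with $p^2/2M + Kq^2/2$, the total energy, and the derivative of H with respect to p returns the velocity p/M.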
The full picture (for all the points x of the configuration space) is
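[Diagram: the Legendre transformation $\theta: B_\bullet C \to B^\bullet C$ from the tangent
bundle to the cotangent bundle, with coordinate functions to R on either side.]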
There is still another twist: In the standard case, the kinetic energy is a
Riemann metric on the configuration space. In more detail, the kinetic
energy in the N particle case is a positive definite quadratic form in the
velocities, so that
$$2T = \sum_i \dot q^i\,\frac{\partial T}{\partial \dot q^i} = \sum_i \dot q^i p_i,$$
$$H = \sum_i p_i\dot q^i - L = 2T - (T - V) = T + V. \tag{12}$$
In other words, in these cases the Hamiltonian is exactly the total energy
T + V.
Moreover an inner product $\langle -,- \rangle$ in a finite dimensional vector
space W determines a canonical isomorphism $\phi$ of W to its dual; indeed,
as in §VII.5, $\phi$ sends each vector v in W to the linear function
$\phi v = \langle v, - \rangle: W \to R$. In the present case the inner product of vectors
$v = (v^i)$ and $w = (w^i)$ is written as $\sum g_{ij}v^iw^j$, where the entries $g_{ij}$ make
up a positive definite symmetric matrix. Thus the linear function $\langle v, - \rangle$,
inner product with v, has as its jth coordinate $\sum_i g_{ij}v^i$. But, except for a
factor 2 and a change of notation ($v^i$ for $\dot q^i$) this is just the quantity
$$p_j = \frac{\partial T}{\partial \dot q^j} = \frac{\partial}{\partial \dot q^j}\sum_{i,k} g_{ik}\,\dot q^i\dot q^k.$$
8. Tricks versus Ideas
longer but more revealing. First, the new coordinates $p_i$ are physically the
components of momentum, defined by $\partial L/\partial \dot q^i$. Taken together, they are
the components of a differential, the differential $dL_x$ of the Lagrangian
along the tangent space at a point x. (It is sometimes therefore called the
"fiber derivative" of L.) The change from the $\dot q^i$ to the $p_i$ is not just a
juggling of variables; it is thus a transformation $\theta$ from the tangent space to
the cotangent space, as determined by L. The inverse transformation is
then given by a different function on the cotangent space, and this
function is the Hamiltonian. Therefore the equation $\partial H/\partial p_i = dq^i/dt$ (from
Hamilton's equations) just is a statement that H does give the inverse
transformation. Moreover, the kinetic energy is a quadratic form and
therefore an inner product on the tangent space W, and the
transformation $\theta$ is really just the standard isomorphism of such a space to its dual,
the cotangent space.
These remarks help to understand what is happening and they serve to
connect this development with all sorts of other mathematical ideas, in
particular, ideas from linear algebra and manifold theory. They surely
would not have been formulated, at least in this language, at the time
Hamilton first set up his equations. However, many of the ideas involved,
though not these technical formulations, might well be in the backs of the
minds of people who present the calculation simply as a quick trick.
This case is of considerable interest because of the wide further
development of Hamiltonian dynamics, hinted at below. It also implicitly
raises the question of the possible conceptual background of other tricks.
Analysis is full of ingenious changes of coordinates, clever substitutions,
and astute manipulations. In some of these cases, one can find a concep-
tual background. When so, the ideas so revealed help us to understand
what's what. We submit that this aim of understanding is a vital aspect of
Mathematics.
Understanding is not easy, as I may perhaps indicate by anecdote. It
has taken me over fifty years to understand the derivation of Hamilton's
equations. I first saw them in 1929 in a book on Theoretical Mechanics by
Sir James Jeans, a noted British applied mathematician. That came in a
course taught by Professor E. W. Brown, an expert on celestial dynamics.
He must have felt that the formal presentation by Jeans was inadequate,
for he turned away from his previous steady reliance on Jeans' text and
lectured to us from his own handwritten notes. I don't really recall under-
standing the lectures, because I was more impressed with the yellowed
and dog-eared condition of those notes. The very next year I heard the
subject again, this time from Leigh Page, a professor of physics. I can tell
now what he said, because it was all much as in his book Introduction to
Theoretical Physics. He plays the tricks and pulls the Hamiltonian H out
of the air (and a touch of the Calculus of Variations); there was of course
no mention of tangent or cotangent spaces. Nearly 40 years later, I took to
lecturing myself on the subject. My then students took and published
careful notes of my lectures. In these notes the trick is duly decked out
with tangent and cotangent spaces. Evidently that didn't satisfy, because
there is then the sentence "We want to understand better how the scheme
produced these equations". There follow two pages of attempts at
understanding ending "The Hamiltonian function arises from asking the
question: When is $\theta$ invertible? (This is probably not the way Hamilton found
it.)". Two years later I found this measure of understanding inadequate, so
tried again in an article "Hamiltonian Mechanics and Geometry" in the
American Mathematical Monthly Vol. 77 (1970), pp. 570-586. Then after
a twelve year respite the understanding there purveyed seemed formal
(e.g., in the identification of the tangent spaces to W with W itself) so I
tried again ... to get the presentation above.
The point of this cautionary tale is not the individual events, but the
difficulty of getting to the bottom of it all. Effective or tricky formal
manipulations are introduced by Mathematicians who doubtless have a
guiding idea, but it is easier to state the manipulations than it is to formulate
the idea in words. Just as the same idea can be realized in different forms,
so can the same formal success be understood by a variety of ideas. A
perspicuous exposition of a piece of Mathematics would let the ideas
shine through the display of manipulations.
9. The Principal Function
$$W(a,b,t_1) = \int_0^{t_1} L(q,\dot q,t)\,dt. \tag{2}$$
Now Hamilton's principle compares paths with the same starting and
arrival points (not starting velocities b). Hence we wish in (2) to replace
the initial velocities b by the final positions $q_1$. Indeed, the equations (1)
for the trajectories do determine (for $t = t_1$) the coordinates $q_1^i$ of the
point of arrival. The implicit function theorem, under suitable conditions,
states that these equations (with $t = t_1$) can be solved for the initial
velocities $b^i$ as functions
i = 1, ..., n (3)
of initial position, final position, and final time (implicitly, also functions
of initial time). Because of the form of Hamilton's principle, we wish to
express "everything" in terms of these quantities $a^i$, $q_1^i$, and $t_1$, regarded
as 2n + 1 coordinates on the manifold $C \times C \times I$. For example, by
substituting the solution (3) in (1), the trajectories themselves have coordi-
nates given as functions
(4)
of these quantities and time t. Here t is the time "along" the trajectory, so
that the components of velocity are
(5)
In the remaining integral the term in brackets is just the ith term of the
Lagrange equations; hence the integral vanishes. At the start
(t = 0), $g^i = a^i$ is constant, so the partial $\partial g^i/\partial q_1^j$ vanishes; at the end
($t = t_1$), $g^i = q_1^i$, so $\partial g^i/\partial q_1^j = \delta^i_j$ and the final result is simply
Hence, holding the other variables in W constant, the chain rule gives
But the partial derivative on the left is the derivative of the integral of L
with respect to its upper limit, so is just L. Also $\partial S/\partial q_1^i$ is $p_i$ by (7).
Hence, solving for $\partial S/\partial t_1$,
The quantity on the right is by definition the negative of the value of the
Hamiltonian H at the endpoint $t = t_1$. Thus
$$\frac{\partial S}{\partial t} + H(q,p,t) = 0$$
holds at the endpoint $t_1$ of each trajectory. Since any point on the
trajectory is the endpoint (of a shorter trajectory) it holds for all t. Moreover,
$p_i = \partial S/\partial q^i$ by (7), so the last equation reads
$$\frac{\partial S}{\partial t} + H\left(q, \frac{\partial S}{\partial q}, t\right) = 0 \tag{2}$$
will have many solutions for S as a function of ql , ... , qn, and t. We may
not be able to identify which one is the principal function of our mechani-
cal system. Instead we search for a solution S which is like the principal
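function, in that it depends on n parameters $a_i$, as a function
$S(a_1, \dots, a_n, q^1, \dots, q^n, t)$, and this in such a way that the n × n matrix
with the entries
$$\frac{\partial^2 S}{\partial a_j\,\partial q^i} \tag{3}$$
is non-singular. The trajectories are then to be found by taking n new
constants $c^i$ and solving the equations
$$\frac{\partial S}{\partial a_i} = -c^i, \qquad i = 1, \dots, n \tag{4}$$
for the coordinates, with the momenta given by
$$\frac{\partial S}{\partial q^i} = p_i, \qquad i = 1, \dots, n. \tag{5}$$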
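The same second partial of S (except for the order of the two
differentiations) can also be found by differentiating (4) with respect to t.
Comparing the two results gives
$$\sum_{i=1}^{n} \frac{\partial^2 S}{\partial a_j\,\partial q^i}\left[\frac{\partial H}{\partial p_i} - \frac{dq^i}{dt}\right] = 0, \qquad j = 1, \dots, n.$$
But we assumed above that the matrix (3) of coefficients here is
non-singular. Hence, for each i,
$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad i = 1, \dots, n.$$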
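$$\frac{\partial S}{\partial t} + H\left[q, \frac{\partial S}{\partial q}\right] = 0. \tag{8}$$
$$\frac{d}{dt}H(q,p) = \sum_{i=1}^{n}\left[\frac{\partial H}{\partial q^i}\,\frac{dq^i}{dt} + \frac{\partial H}{\partial p_i}\,\frac{dp_i}{dt}\right] = 0,$$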
[Figure 1: the Euler angles of the top, with the line of nodes.]
top (with its point fixed). These positions can be described in terms of the
angles needed to move from rectangular axes $x_0$, $y_0$, $z_0$ fixed in space
(origin O at the point of the top) to axes x, y, and z fixed in the top, with
the same origin. To describe the position of these axes, think of the point c
at which the axis of the top meets a large sphere centered at the origin O.
Let $\theta$ be the latitude (measured from the north pole) and $\phi$ the longitude
of this point. These two angles $\theta$ and $\phi$ suffice to describe the positions of
the top axis; we need a third angle $\psi$ to specify how much it has rotated
about this axis. Indeed, the top can be brought from an original upright
position to its final position in the following three successive steps (Figure
1):
First rotate the top by the longitude $\phi$ about the vertical axis $Oz_0$ in
space;
Next rotate the top by the latitude $\theta$ about the top's new Ox axis;
Finally rotate the top about its new Oz axis by an angle $\psi$.
These three angles
$0 < \phi < 2\pi$, $0 < \theta < \pi$ and $0 < \psi < 2\pi$
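
The three successive rotations can be composed as matrices. The following is a sketch under one common convention, assumed here rather than taken from the text: rotations are counterclockwise 3 × 3 matrices, and a rotation about an axis carried along with the top multiplies on the right. The helper names are hypothetical.

```python
import math


def rot_z(a):
    """Rotation by the angle a about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_x(a):
    """Rotation by the angle a about the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def matmul(A, B):
    """Product of two 3 x 3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_rotation(phi, theta, psi):
    """Rotate by phi about the space z-axis, then by theta about the new x-axis,
    then by psi about the new z-axis; the columns are the top's axes in space coordinates."""
    return matmul(matmul(rot_z(phi), rot_x(theta)), rot_z(psi))


phi, theta, psi = 0.4, 1.1, 2.3  # arbitrary sample angles
R = euler_rotation(phi, theta, psi)
top_axis = [R[0][2], R[1][2], R[2][2]]  # direction of the top's z-axis in space
```

The third column of R is the direction of the axis of the top; its vertical component is $\cos\theta$ and does not depend on $\psi$, in agreement with the description of $\theta$ as the angle measured from the north pole.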
The total kinetic energy of the plate is then a sum (better an integral) of
all these contributions $\tfrac12 mr^2\omega^2$. The appropriate integral of the $mr^2$ is
known as the moment of inertia I of the plate, and the kinetic energy is
$$T = (1/2)I\omega^2$$
(by the way, this provides another occasion for the use of integrals). In the
case of the symmetrical top there are three such moments of inertia, $I_x$, $I_y$
and $I_z$, about the respective axes fixed in the top. Also at any one instant
the top has instantaneous angular velocities $\omega_x$, $\omega_y$ and $\omega_z$ about these
three axes. Because of the symmetry of the top, these quantities suffice to
determine the whole kinetic energy of the top as
$$T = \tfrac12\left(I_x\omega_x^2 + I_y\omega_y^2 + I_z\omega_z^2\right) \tag{2}$$
(without symmetry, there could be cross terms $I_{xy}\omega_x\omega_y$, etc.). In any case,
symmetry also gives $I_x = I_y$.
Angular velocities may be represented by vectors. Specifically, a
rotation about an axis A with angular velocity $\omega$ may be represented by a
vector along the line of A, suitably directed, and with magnitude $\omega$. This
representation is used because it is effective in combining two angular
velocities; by calculating the composite of two rotations one can prove
that the effect of combining two such angular velocities is represented by
the sum of the two corresponding vectors. In particular, if the angular
velocities $\dot\phi$, $\dot\theta$, and $\dot\psi$ for the Euler angles are represented by vectors, one
can add these three vectors by taking their components along the axes
x, y, and z fixed in the top to get (use Figure 1)
$$\omega_x = \dot\phi\sin\theta\cos\psi - \dot\theta\sin\psi,$$
$$\omega_y = \dot\phi\sin\theta\sin\psi + \dot\theta\cos\psi,$$
$$\omega_z = \dot\phi\cos\theta + \dot\psi.$$
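$$p_\theta = I_x\dot\theta, \qquad p_\psi = I_z(\dot\phi\cos\theta + \dot\psi).$$
$$H = (1/2)\left[\frac{p_\theta^2}{I_x} + \frac{p_\psi^2}{I_z} + \frac{1}{I_x}\left[\frac{p_\phi - p_\psi\cos\theta}{\sin\theta}\right]^2\right] + Mgl\cos\theta. \tag{5}$$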
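Here $V = Mgl\cos\theta$, where M is the mass of the top and l is the distance
from the point of the top along its axis to the center of mass, while g is
the acceleration of gravity. The Hamilton-Jacobi P.D.E. then has the
form
$$\frac{1}{I_x}\left[\frac{\partial S}{\partial\theta}\right]^2 + \frac{1}{I_z}\left[\frac{\partial S}{\partial\psi}\right]^2 + \frac{1}{I_x\sin^2\theta}\left[\frac{\partial S}{\partial\phi} - \frac{\partial S}{\partial\psi}\cos\theta\right]^2 = -2\left[\frac{\partial S}{\partial t} + Mgl\cos\theta\right]. \tag{6}$$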
To get a complete solution of this P.D.E. for $S(t,\theta,\phi,\psi)$, we use two
devices. The first is "separation of variables": We try to find a solution
which is the sum of four separate functions of one variable ($t,\theta,\phi,\psi$)
each. Second, we note that of these variables only $\theta$ appears explicitly (as
$\cos\theta$) in the P.D.E. (6), while the others (t, $\phi$ and $\psi$) turn up only in the
denominators of partial derivatives. Such variables are said to be
"ignorable"; the corresponding trick is to make these partial derivatives
constant. Thus, all told, we try a solution of the form
(7)
involving three constants ai and a function R. It will satisfy the P.D.E. (6)
if
II. The Spinning Top 299
Thus the separation of variables has replaced the P.D.E. (6) by a first
order ordinary differential equation in three parameters $a_i$ for the function
$R(\theta)$. In this equation the denominator $\sin^2\theta$ on the right can be rewritten
as $1 - \cos^2\theta$. The $\theta$ appears only in terms of $u = \cos\theta$, so it is natural to
use u as a new variable. Now this equation (8) has the general form
$$(dR/d\theta)^2 = F(u)/(1 - u^2)$$
(10)
In the first equation, the integral is now that of the inverse of a square
root of a cubic polynomial in u. It is a so-called elliptic integral-not
expressible in the usual tables of integrals in terms of the elementary
functions, but much studied in classical analysis, in particular in complex
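variable theory (Chapter X). Since R does not depend on $\phi$ and $\psi$, the
matrix of second derivatives described in (10.3) is essentially
[matrix of second partial derivatives of S, with entries such as
$\partial^2 S/\partial a_2\,\partial\psi$ and $\partial^2 S/\partial a_3\,\partial\psi$]
it thus follows that we have a complete solution in the three parameters
$a_i$.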
We are then instructed to find the trajectories (for all initial conditions)
by taking three new constants $c^i$ and solving the equations $\partial S/\partial a_i = -c^i$
for the coordinates $\theta$, $\psi$ and $\phi$ as functions of time. The first of these
equations reads
$$t = -c^1 - I_x\int \frac{du}{\sqrt{F(u)}}. \tag{11}$$
Knowing the elliptic integral, this will determine $\theta$ in terms of time t, the
constants $a_1$, $a_2$ and $a_3$ and $c^1$; indeed $-c^1$ can be read as the initial
value of t when $\theta = 0$. The second of these equations will then determine
$\phi$ in terms of t, while the third gives $\psi$ in terms of $\theta$, $\phi$ and t and thus
ultimately in terms of t. The first two equations are of most interest; for
the axis of the top they provide the latitude $\theta$ and the longitude $\phi$ in
terms of time. A full discussion of the consequences has been presented
by Klein and Sommerfeld [1965] in a famous four-volume book
[1897-1910].
One may briefly examine some of the qualitative properties of this
solution. Since $F(u) = ku^3 + \cdots$ is a cubic polynomial, with a positive
leading coefficient k, the values of F(u) range from $-\infty$ to $+\infty$;
moreover, F(u) = 0 generally has three roots, with two roots
$u_1$ and $u_2$ ($u_1 < u_2$) between -1 and +1 (Figure 2). These are the only
roots of physical interest, since the substitution $u = \cos\theta$ makes
$-1 \le u \le 1$. Now $dt/du = \pm I_x(F(u))^{-1/2}$ as above in (11) makes
$$\left[\frac{du}{dt}\right]^2 = F(u)/I_x^2,$$
so the zeros of F(u) are the points where du/dt = 0 (i.e., where $\dot\theta = 0$). If
one uses the positive square root of F(u) in this (ordinary) differential
equation, one gets a solution for u increasing from $u_1$ to $u_2$ in some time
interval. Using the negative square root of F extends this solution, by
reflection, to one decreasing from $u_2$ to $u_1$. All told, this gives a solution
for u of the general form indicated in Figure 3. The function u (and hence
the angle $\theta$) is periodic in t. This is the often observed situation where the
axis of the top bobs up and down in latitude $\theta$ as the top spins (i.e., as $\phi$
increases).
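For the other two angular coordinates $\phi$ and $\psi$ the momenta $p_\phi$ and
$p_\psi$ are given by Hamilton's equations as
$$\frac{dp_\phi}{dt} = -\frac{\partial H}{\partial\phi}, \qquad \frac{dp_\psi}{dt} = -\frac{\partial H}{\partial\psi}.$$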
[Figure 2: graph of the cubic f(u), with u = -1 marked.]
12. The Form of Mechanics 301
[Figure 3: $\cos\theta = u$ as a function of t.]
But the Hamiltonian H does not depend on $\phi$ or on $\psi$, so the right hand
sides here are zero. Therefore the (angular) momenta $p_\phi$ and $p_\psi$ are
constants $m_\phi$ and $m_\psi$. In other words, without friction at the point of the top
(and our formulation has indeed neglected such friction) the top, once
well started, will spin on forever.
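
The bobbing of the axis can also be seen by integrating Hamilton's equations for $\theta$ directly, rather than through the elliptic integral. The sketch below assumes the Hamiltonian (5) with the momenta $p_\phi$ and $p_\psi$ held at constant values; all numerical values (moments of inertia, Mgl, momenta, initial angle, step size) are hypothetical, and $\partial H/\partial\theta$ is approximated by a difference quotient instead of being derived by hand.

```python
import math

I_X, I_Z, MGL = 1.0, 0.5, 1.0  # hypothetical moments of inertia and the product M*g*l
P_PHI, P_PSI = 1.0, 2.0        # the constant momenta, chosen arbitrarily


def hamiltonian(theta, p_theta):
    """H of equation (5), with p_phi and p_psi held at their constant values."""
    a = (P_PHI - P_PSI * math.cos(theta)) / math.sin(theta)
    return (0.5 * (p_theta ** 2 / I_X + P_PSI ** 2 / I_Z + a * a / I_X)
            + MGL * math.cos(theta))


def rhs(state):
    """Hamilton's equations: d(theta)/dt = dH/dp_theta, d(p_theta)/dt = -dH/d(theta)."""
    theta, p_theta = state
    h = 1e-6
    dH_dtheta = (hamiltonian(theta + h, p_theta)
                 - hamiltonian(theta - h, p_theta)) / (2.0 * h)
    return (p_theta / I_X, -dH_dtheta)


def rk4_step(state, dt):
    """One classical Runge-Kutta step for the pair (theta, p_theta)."""
    def shifted(s, k, c):
        return (s[0] + c * k[0], s[1] + c * k[1])
    k1 = rhs(state)
    k2 = rhs(shifted(state, k1, 0.5 * dt))
    k3 = rhs(shifted(state, k2, 0.5 * dt))
    k4 = rhs(shifted(state, k3, dt))
    return (state[0] + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0,
            state[1] + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0)


state = (1.0, 0.0)  # initial theta and p_theta
h0 = hamiltonian(*state)
thetas, max_drift = [], 0.0
for _ in range(5000):
    state = rk4_step(state, 0.001)
    thetas.append(state[0])
    max_drift = max(max_drift, abs(hamiltonian(*state) - h0))
```

In this run the angle $\theta$ stays strictly between 0 and $\pi$ and moves back and forth between two limits, while the value of H stays constant up to numerical error.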
The constant momenta $m_\phi$ and $m_\psi$ depend on the a's and the c's.
However, without studying this dependence one can use these momenta to
solve the equation (4) above for the angular velocities $\dot\phi$ and $\dot\psi$. One
finds, for example, that
$$\dot\phi = \frac{m_\phi - m_\psi\cos\theta}{I_x\sin^2\theta}.$$
This gives the speed of precession of the axis of the top around the vertical
axis: As the axis of the top bobs up and down (i.e., as $\cos\theta = u$ varies
between limits) the speed of precession varies according to this formula.
The magic of mechanics is that it works! Necessarily complex calculations
yield realistic results.
lying ideas. They are there and they are well worth the space they would
require. We did note briefly that the use of partial differential equations is
suggested by an analogy with optics, where the light rays are also grouped
into wave fronts.
We have introduced the phase space coordinates p's and q's without
much minding their meaning. This meaning centers on the understanding
of the formal term $\sum p_j\dot q^j$ which cropped up in the definition of the
Hamiltonian. This term turns out to represent a differential form $\theta = \sum p_i\,dq^i$;
a form such as this is present on any even-dimensional manifold which,
like the phase space, is a cotangent bundle, and one can give a
coordinate-free description of this form. For subsequent purposes, it is
better to use the exterior derivative $\omega = d\theta = \sum dp_i \wedge dq^i$ of this 1-form $\theta$.
This $\omega$ is a 2-form (a differential form of degree 2) with a number of
useful properties. A manifold with such a form is called a symplectic
manifold. Much of Hamiltonian mechanics can be developed most clearly on
such a manifold, with a greater freedom in the choice of coordinates and
with considerable use of certain Poisson brackets of functions defined on
the manifold. This "simplectic" approach frees mechanics from its
apparent dependence on particular coordinates p and q for momenta and
position and explains a classical process of "canonical transformation" by
which one may replace the original p's and q's by new coordinates P and
Q which no longer need represent momenta and position, but which may
be better suited to the problem at hand. This in turn is connected to the
study of differential forms and of Lie groups acting on manifolds. Here a
Lie group means a "continuous" group-such as the group of all rotations
about a point in 3-space, as parametrized, say, by those three Euler
angles. Formally, a Lie group G is a set which is both a group and a man-
ifold, in such a way that the group operations (product and inverse) are
smooth. These operations are therefore differentiable, and this leads from
each Lie group to its associated Lie algebra. This is but one of the many
connections of mechanics with "abstract" mathematics; it illustrates the
way in which abstraction is tied into application. It is to be regretted that
there is no really satisfactory modern exposition of the conceptual
development of the ideas of classical mechanics. A splendid traditional
presentation is given in L.A. Pars [1965].
(1)
By the above convention, this turns into an operator which sends each
function $\psi$ into
It turns out that these eigenvalues agree with the observed values of the
spectrum of hydrogen.
This equation is clearly connected to the classical wave equation; this is
one of the reasons that some early versions of quantum mechanics were
known as wave mechanics. The relationship is actually deeper, and is
subtly connected with the classical wave equations at the origin of the
Hamilton-Jacobi theorem. Indeed, if one replaces the function $\psi$ in (4)
by
(for S as in the principal function above), substitutes and takes the limit
as $h \to 0$, the result is essentially the Hamilton-Jacobi partial differential
equation (try it with just one space coordinate). This hints at the precise
sense in which quantum mechanics has classical mechanics as a limit (as
$h \to 0$).
For more details on this very summary sketch refer to Mackey [1978].
[Figure: network of the ideas of mechanics, linking Planets, Projectiles, Oscillations, Rotations, Spinning Top, Newton's Laws, Angular Momenta, Conserve Momentum, Conserve Energy, Diff. Equations, Vector Fields, Coordinates, Light, Hamilton-Jacobi, PDE, Quantum Mechanics, Electrons, Matrices, Linear Algebra]
CHAPTER X
Complex Analysis and Topology
Since the square of a non-zero real number is always positive, there can
be no real square root of $-1$. Inventing such a square root i and adjoining
it to the real numbers, as in §IV.10, leads to extensive and important
developments. On the one hand, the resulting complex numbers x + iy
represent well the properties of the Euclidean x-y plane and derive part
of their "reality" from the geometric reality of the plane. On the other
hand, well behaved functions f of such a complex number z = x + iy
are those functions f which have a complex derivative, and the properties
of these functions are truly remarkable. The resulting study of "complex
variables", that is, of differentiable functions of a complex number z,
leads to deep mathematical theorems with unexpected practical connec-
tions, for example to electrostatic potential and to the steady flow of fluids
as well as to aerodynamics. This chapter will introduce these concepts of
differentiation and the corresponding integrals and will indicate some of
these connections, all with a view to seeing how the apparently simple
algebraic device of inventing a "number" i with $i^2 = -1$ has both
geometric and analytic consequences-all a striking instance of the
remarkable interconnections of formal ideas. Just as many geometric ideas
first become apparent in plane geometry (Chapter III), so it is that basic
aspects of differentiation and integration are best exemplified by the com-
plex numbers represented in the plane.
(1)
w = f(z)/g(z) (2)
defined for all complex z except for the (finitely many) zeros of the
denominator g. For example, if a,b,c, and d are complex constants with
$ad - bc \neq 0$, then
w = (az + b)/(cz + d) (3)
(4)
In fact, this is why the natural logarithm, $\log_e$, defined as the inverse of
the exponential function, turns multiplication into addition, as in the use
of logarithms for the calculation of products. This formula (4) suggests
that a complex exponential should have $e^z = e^{x+iy} = e^x e^{iy}$. Since we
already know the real exponential $e^x$, this leads us to find the imaginary
exponential $e^{iy}$; in other words, to find a complex-valued function of a
real argument y which turns addition in y into multiplication. But this is
precisely what the two addition formulas for cos y and sin y will yield, as
in
$$e^{x+iy} = e^x(\cos y + i \sin y). \qquad (5)$$
This has the desired property that $e^z e^w = e^{z+w}$; it turns out to have all
the other desirable properties; in particular, $e^z$ is its own (complex)
derivative.
By this definition, $e^z$ is never zero. (How could it be; were $e^z = 0$, then
$e^{z+w} = 0 \cdot e^w = 0$ would be always zero!) Also $z \mapsto e^z$ maps the infinite
horizontal strip $\{x + iy \mid 0 \le y < 2\pi\}$ of width $2\pi$ in the z-plane onto the
whole of the w-plane, omitting only the point w = 0. Each horizontal line
(y = const.) becomes a ray ($\theta$ = const.) from the origin in the w-plane;
each vertical interval (x = const.) becomes a circle. All told, each point
$w \neq 0$ is covered infinitely often (once from each horizontal strip of width
$2\pi$) by points from the z-plane. Therefore the logarithm function, defined
as the inverse to the exponential, must be "many-valued". Explicitly, write
$w \neq 0$ in the polar form $w = s(\cos\varphi + i \sin\varphi)$ and observe that the
positive real number s has a real logarithm $\log_e s$, while $\cos\varphi$ and $\sin\varphi$
have period $2\pi$. Therefore, for any integer k the definition (5) gives
$$\log_e z = \log_e r + (\theta + 2\pi k)i, \qquad r = |z|, \quad \theta = \arg z. \qquad (7)$$
and hence
(9)
By their derivation, the functions so defined agree with the usual ones
when z is real. Moreover, one may verify that cos z and sin z so con-
structed have all the basic properties of the real cosine and sine functions
(period $2\pi$, addition formulas, $\cos^2 + \sin^2 = 1$, etc). We shall presently
see, using the Taylor series, that these are the only definitions which
extend the real functions sin x and cos x to "good" functions of a com-
plex argument.
It remains to explicate which functions of a complex variable are
"good" -and to show that such "good" functions have convergent Taylor
series. Indeed, the familiar Maclaurin series for $e^x$,
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots,$$
will also work when x is a complex number z. The new definition fits!
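As a purely numerical illustration (a sketch, not part of the formal development), one may compare the definition (5) of $e^z$ with partial sums of the Maclaurin series at a complex argument. The function names and the choice of 60 terms are our own assumptions for this example; 60 terms is ample only for arguments of modest size.

```python
import math

def exp_by_definition(z: complex) -> complex:
    """e^z = e^x (cos y + i sin y), following definition (5)."""
    return math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))

def exp_by_series(z: complex, terms: int = 60) -> complex:
    """Partial sum 1 + z + z^2/2! + ... of the Maclaurin series (terms is an assumed cutoff)."""
    total = 0j
    term = 1 + 0j            # current term z^n / n!, starting with n = 0
    for n in range(terms):
        total += term
        term *= z / (n + 1)  # pass from z^n/n! to z^(n+1)/(n+1)!
    return total

z = complex(0.5, 2.0)
w = complex(-1.0, 0.75)

# The two descriptions of e^z agree to rounding error.
series_gap = abs(exp_by_definition(z) - exp_by_series(z))
# Addition in the exponent turns into multiplication.
product_gap = abs(exp_by_definition(z) * exp_by_definition(w) - exp_by_definition(z + w))
# Shifting z by 2*pi*i does not change e^z, which is why the logarithm is many-valued.
period_gap = abs(exp_by_definition(z + 2j * math.pi) - exp_by_definition(z))
```

All three gaps come out at the level of floating-point rounding, which is consistent with (though of course no proof of) the claims in the text.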
For the moment, we do have the remarkable observation that the
geometric representation of complex numbers and the basic facts of trig-
onometry combine to replace the real variable x by a complex variable in
the functions $e^x$, cos x, sin x, and log x. Analysis, geometry, and trigonometry
have intertwined in a fashion which becomes clear only when
we use complex numbers.
There are many other interesting functions of complex z. For example,
the infinite series $\sum 1/n$ is known to diverge. However, if $s = \sigma + it$ is a
complex number with real part $\sigma > 1$, then the related series
$$\sum_{n=1}^{\infty} \frac{1}{n^s}, \qquad n^s = e^{s \log_e n}, \quad \log_e n \ \text{real} \qquad (10)$$
2. Pathological Functions
Formally, a function f on the reals is defined (Chapter V) to be any set F
of ordered pairs (x,y) of real numbers containing exactly one pair (x,y)
for each real x. This definition does provide a precise way of saying that
"y depends on x" without using any vague notion of "dependence". How-
ever, it allows for many more functions than might have been intended-
including some which are truly bizarre. For example, the definition
=1 when x is irrational
=0 when x = 0
$f(0) = 0, \qquad x \neq 0$
$$1/(1-x) = 1 + x + x^2 + x^3 + \cdots \qquad (1)$$
3. Complex Derivatives
As in the calculus, the derivative of a function w = f(z) of a complex
variable z at a point $z_0$ should be the limit
$$f'(z_0) = \lim_{h \to 0} \frac{f(z_0 + h) - f(z_0)}{h}. \qquad (1)$$
The idea behind this definition is the familiar one: the derivative $f'(z_0)$
where u and v are now two real-valued functions defined for points (x,y)
in the given open set U. If now we first take the change h in z to be real,
the existence of the limit (1) means that the expression
must have a limit as the real h approaches O. By the definition of the real
partial derivatives of the functions u and v, this limit must be
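$$\frac{\partial u}{\partial x}(x_0,y_0) + i\,\frac{\partial v}{\partial x}(x_0,y_0). \qquad (4)$$
Taking instead the change h to be purely imaginary, the same limit must equal
$$-i\,\frac{\partial u}{\partial y}(x_0,y_0) + \frac{\partial v}{\partial y}(x_0,y_0). \qquad (5)$$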
Since the desired complex derivative is given by either of the expressions
(4) or (5), these two expressions must agree. In other words, when
f = u + iv is holomorphic in an open set U, the real and imaginary
parts of f must both have partial derivatives in U which satisfy the equa-
tions
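$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}, \qquad (x,y) \in U. \qquad (6)$$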
$e^z = e^x(\cos y + i \sin y)$ and also for the functions sin z and cos z as we
have defined them. Hence these functions are indeed holomorphic (in the
whole z-plane).
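The Cauchy-Riemann equations (6) can be checked numerically by estimating the four real partial derivatives with central differences. The sketch below is only an illustration under our own assumptions: the helper names, the step size 1e-6, and the sample point are all hypothetical choices, and a small residual is evidence, not a proof, of holomorphy.

```python
import cmath

def partials(f, z0, h=1e-6):
    """Central-difference estimates of (u_x, u_y, v_x, v_y) at z0 for f = u + iv."""
    fx = (f(z0 + h) - f(z0 - h)) / (2 * h)            # change z in the real direction
    fy = (f(z0 + 1j * h) - f(z0 - 1j * h)) / (2 * h)  # change z in the imaginary direction
    return fx.real, fy.real, fx.imag, fy.imag

def cauchy_riemann_residual(f, z0):
    """|u_x - v_y| + |u_y + v_x|, which vanishes when equations (6) hold at z0."""
    ux, uy, vx, vy = partials(f, z0)
    return abs(ux - vy) + abs(uy + vx)

z0 = complex(0.3, 0.7)
residual_exp = cauchy_riemann_residual(cmath.exp, z0)
residual_sin = cauchy_riemann_residual(cmath.sin, z0)
# Complex conjugation z -> x - iy has u_x = 1 but v_y = -1, so (6) fails.
residual_conj = cauchy_riemann_residual(lambda z: z.conjugate(), z0)
```

For the exponential and the sine the residual is negligible, while for conjugation it is close to 2, matching the fact that conjugation is smooth in x and y but has no complex derivative.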
These equations have further consequences, both in physics and in
geometry. It will presently appear that the existence of the first derivative
$f'(z)$ for a holomorphic function f(z) necessarily implies the existence of
all higher complex derivatives $f^{(n)}(z)$ as well. This in turn means that the
real and imaginary parts u and v will have continuous partial derivatives
of all orders. Hence also $\partial^2 v/\partial x\,\partial y = \partial^2 v/\partial y\,\partial x$ and so, differentiating
both sides of the Cauchy-Riemann equations (6), one has
$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \qquad (7)$$
This is the Laplace equation $\Delta u = 0$, as in (3) of §VI.11. Generally, a
twice-differentiable function u(x,y) with
$\Delta u = 0$ is called a harmonic function (cf. §VI.11).
Such functions arise in theoretical physics as expressions of both gravi-
tational and electromagnetic potentials. Consider for example the electro-
static field arising from electric charges distributed uniformly along one or
several very long cylinders perpendicular to the (x,y )-plane. The resulting
potential u is then effectively constant in the coordinate perpendicular to
this plane, so it can be considered just as a function of x and y, which
then must satisfy the Laplace equation (7). In other words, every holo-
morphic function f yields such a potential u as its real part. The level
curves u( x,y) = constant are then the "equipotential" curves, while the
imaginary part v of the holomorphic function f gives curves v(x,y) =
constant which represent the "lines of force". At all points z = x + iy
where $f'(z) \neq 0$ these lines of force are orthogonal to the equipotentials.
For a given electrostatic potential of this type, finding the right potential
function means finding a holomorphic function of z for which the real
part u( x, y) has the desired boundary values.
Next consider the geometry, by considering a function w = f(z) holomorphic
in an open set U as a mapping of the set U by $z \mapsto f(z)$ into the
w-plane. Since f, as noted above, has all higher derivatives, this will be a
smooth mapping and so must carry smooth curves and their tangents in U
into smooth curves and their tangents in the w-plane. We assert that this
mapping, at any point $z_0 \in U$ where $f'(z_0) \neq 0$, is conformal in the sense
that it preserves angles between tangents to curves at $z_0$. For, consider
some smooth curve in the z-plane given in terms of a parameter t as
z = g(t) = x(t) + iy(t) and passing through the point $z_0 \in U$ when
$t = t_0$, so that $z_0 = g(t_0)$. The first derivative has g'(t) =
x'(t) + iy'(t), so when $g'(t_0) \neq 0$ the tangent line to the curve at $z_0$
makes the angle $\arg(g'(t_0))$ with the x-axis. The image curve in the w-plane
is given by the composite function w = h(t) = f(g(t)). By the
Chain Rule for the differentiation of composite functions (which can be
seen to hold here)
$h'(t) = f'(z)\,g'(t)$.
Thus if both $f'(z_0) \neq 0$ and $g'(t_0) \neq 0$, the arguments are both defined
and the argument of the product $f'g'$ is the sum
$$\arg h'(t_0) = \arg f'(z_0) + \arg g'(t_0).$$
In other words, under the mapping f the tangent line to the curve has
been rotated counterclockwise by the angle $\arg f'(z_0)$. Hence, given two
parametrized curves meeting at $z_0$, each tangent line is rotated by the
same angle $\arg f'(z_0)$. Therefore the angle between the curves is
preserved, so the map $z \mapsto f(z)$ is conformal, as claimed.
However, if $f'(z_0) = 0$ then this deduction gives $h'(t_0) = 0$, so the
first derivative $f'$ is not enough to determine the slope of the tangent line
to the image curve. At such a point $z_0$ the mapping $z \mapsto f(z)$ need not be
a conformal one. For example, the map $w = z^2$ doubles all angles
between curves meeting at the origin!
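The preservation of angles, and its failure where the derivative vanishes, can be observed numerically. The following sketch uses our own hypothetical helpers: it takes two rays leaving a point at an angle of $\pi/3$, maps a short piece of each ray by f, and measures the angle between the image directions. The step t = 1e-4 is an assumed small parameter, so the angles are approximate.

```python
import cmath
import math

def image_direction(f, z0, d, t=1e-4):
    """Unit complex number pointing along the image of the short ray z0 + t*d."""
    w = f(z0 + t * d) - f(z0)
    return w / abs(w)

def angle_between(a, b):
    """Counterclockwise angle from direction a to direction b, in (-pi, pi]."""
    return cmath.phase(b / a)

d1 = 1 + 0j                          # first ray: along the positive real direction
d2 = cmath.exp(1j * math.pi / 3)     # second ray: rotated by pi/3

# At a point where f'(z0) != 0 (here f = exp) the angle pi/3 is preserved.
z0 = complex(0.2, 0.1)
angle_exp = angle_between(image_direction(cmath.exp, z0, d1),
                          image_direction(cmath.exp, z0, d2))

# For w = z^2 at the origin, where the derivative vanishes, the angle doubles.
square = lambda z: z * z
angle_square = angle_between(image_direction(square, 0j, d1),
                             image_direction(square, 0j, d2))
```

Here `angle_exp` is close to $\pi/3$ and `angle_square` is $2\pi/3$, in line with the remark that $w = z^2$ doubles angles at the origin.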
One may also show that the stereographic projection from the
(Riemann) sphere to the complex plane is conformal, so that the
geometric viewpoint also works well on the sphere.
The preservation of angles at points where $f'(z_0) \neq 0$ should "really"
be viewed as a consequence of the Cauchy-Riemann equations. Indeed
the mapping $z \mapsto f(z)$, when written in real coordinates as
$(x,y) \mapsto (u(x,y),v(x,y))$, induces at each point (x,y) of the plane a linear
mapping of the tangent space with matrix the Jacobian
$$\begin{pmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{pmatrix}$$
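$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}.$$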
This discussion indicates that the use of the idea of tangent spaces,
coming from differential geometry, is a helpful adjunct to complex vari-
able theory-even though many texts on complex variables do not make
this connection.
Holomorphic functions also apply to fluid flow. Consider the flow of a
fluid in the plane (i.e., a "laminar" flow) which is steady in the sense that
the fluid velocity at each point (x,y) is independent of time. Write the
components of this velocity as P(x,y) and Q(x,y). The component of the
velocity along the "infinitesimal" vector (dx,dy) is just the inner product
Pdx + Qdy. Hence the line integral
$$\int_C P(x,y)\,dx + Q(x,y)\,dy$$
around a closed curve C represents a physical quantity called the "rotation"
of the fluid around that curve. By the Gauss lemma (VI.10J) this
equals the integral of $\partial Q/\partial x - \partial P/\partial y$ over the interior of C. We assume that
this integral is zero for all curves C, meaning that the flow is irrotational
(intuitively, there are no vortices). This assumption implies that
$\partial Q/\partial x = \partial P/\partial y$. Hence, under suitable continuity assumptions, there is a
function u(x,y) with the partial derivatives
$$\frac{\partial u}{\partial x} = P(x,y), \qquad \frac{\partial u}{\partial y} = Q(x,y). \qquad (8)$$
Also, as in §VI.10, the flow across a curve L is just the line integral of
Q dx - P dy along L. In particular, the total flow across a closed curve C
in the plane is
$$\int_C Q(x,y)\,dx - P(x,y)\,dy.$$
and in fluid flow. The deeper ideas of Mathematics do have varied con-
nections!
So much follows from the easy derivation of the Cauchy-Riemann
equations that one might be tempted to try to get even more from the
assumption that $z \mapsto f(z)$ has a complex derivative, for example, more
by differentiating along some slant lines. There is no more. A theorem of
Looman and Menchoff states that if two real valued functions u(x,y) and
v(x,y) are defined and continuous for (x,y) in an open set U and have
partial derivatives which satisfy the Cauchy-Riemann equations (6) everywhere
in U, then the function f(z) = u + iv is indeed holomorphic in
U.
All this is just the beginning of the remarkable properties of holo-
morphic functions f. The definition requires just the existence of a first
derivative, but this will imply the existence (and continuity) of derivatives
of all orders, as well as the presence of suitably convergent Taylor series.
In other words, holomorphic functions of a complex variable escape all
the pathologies attending ordinary functions f of a real variable. The
proof of these remarkable results depends essentially upon the use of
integration (a process already strongly suggested by the applications we
have noted). The integral of f( z) around a closed path in the complex
plane is zero, provided that f is holomorphic in an open set containing
the path and its "interior". This Cauchy integral theorem plays a central
role in complex analysis. To formulate it, we next consider integration and
then properties of the paths of integration (§5).
4. Complex Integration
The complex integral
$$\int_h f(z)\,dz = \int_h f\,dz \qquad (1)$$
The integral (1) is then defined to be the limit of this sum as n approaches
infinity and as the maximum of the $t_i - t_{i-1}$ approaches zero. This limit
(and hence this integral) exists if h has a continuous first derivative h'(t)
for each t in I (where the derivative at the endpoints of I is to be defined
by a one-sided limit). The choice of the parameter t for the path is not
important; all that really matters is that the path is smooth and is
"directed". This can be formulated by proving that the integral of f is
independent of the choice of parameter. A new choice would consist in
taking a smooth function $k: I \to I$ which is monotonic increasing (t < t'
implies k(t) < k(t')) and has k(0) = 0, k(1) = 1; then the integral of f
over the path h is identical to that over the path $h \cdot k: I \to U$. (One must
here use the fact that k, continuous on the compact set I, is uniformly
continuous.)
As we will soon see, it is also convenient to be able to integrate over the
path given by the boundary of a rectangle and over other paths with
corners. Such paths are said to be piecewise differentiable-they are
described exactly as the paths obtained by piecing together ("composing")
a finite number of differentiable (i.e., smooth) paths. For such paths h the
limit above and hence the integral exists.
Except for the condition that the function f(z) be holomorphic, this
integral is just the line integral introduced in §VI.10 and motivated there
by various physical applications (for example, to work). Explicitly, if we
write z = x + iy, dz = dx + idy, and f(z) = u(x,y) + iv(x,y) in terms of
real and imaginary parts, the integral just defined is the line integral
$$\int_h f(z)\,dz = \int_h (u\,dx - v\,dy) + i \int_h (v\,dx + u\,dy). \qquad (3)$$
Going further, one can substitute the function h(t) describing the path to
reduce this integral to ordinary real integrals in t, from 0 to 1. All these
formulas express the age old idea: The whole is the sum of its parts!
The size of the parts yields an evident upper bound for the size of the
integral. Specifically, the function $t \mapsto |f(h(t))|$ is continuous on a
compact set (to wit, the interval I), hence has a maximum value M there.
On the other hand, the path of integration has a length $L_h$, which can be
defined geometrically as the limit of the sum of the lengths of small
inscribed chords, or equivalently as the integral $\int_0^1 |h'(t)|\,dt$. Then,
looking at the sums (2) used to define the integral (1), one may prove,
much as in (VI.4.3), that
$$\left| \int_h f(z)\,dz \right| \le M L_h. \qquad (4)$$
This may seem to be just a rough upper estimate of size, but it turns out
to be exceedingly handy. We will see one example of its use (in §6) but
there are many more throughout analysis, which often requires judicious
estimates of the size of all sorts of integrals.
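To see the estimate at work, the following sketch approximates a complex integral by a finite sum over a subdivision of the parameter interval and compares its size with the product of the maximum M and the length of the path. Everything here is an illustrative assumption of ours: the function names, the midpoint sampling, the number n of subdivisions, and the test path (the upper half of the unit circle from 1 to -1).

```python
import cmath
import math

def path_integral(f, h, n=20000):
    """Finite sum of f(h(midpoint)) * (h(t_i) - h(t_{i-1})) over an even subdivision of [0, 1]."""
    total = 0j
    prev = h(0.0)
    for i in range(1, n + 1):
        cur = h(i / n)
        total += f(h((i - 0.5) / n)) * (cur - prev)
        prev = cur
    return total

def path_length(h, n=20000):
    """Sum of the lengths of small inscribed chords."""
    return sum(abs(h(i / n) - h((i - 1) / n)) for i in range(1, n + 1))

def max_modulus(f, h, n=20000):
    """Largest sampled value of |f(h(t))|; a stand-in for the true maximum M."""
    return max(abs(f(h(i / n))) for i in range(n + 1))

def half_circle(t):
    """Upper half of the unit circle, traversed from 1 to -1."""
    return cmath.exp(1j * math.pi * t)

value = path_integral(cmath.exp, half_circle)
bound = max_modulus(cmath.exp, half_circle) * path_length(half_circle)
# Since e^z is its own derivative, the exact value is e^{-1} - e^{1}.
exact = cmath.exp(-1) - cmath.exp(1)
```

The computed `value` agrees with `exact` to many decimal places, and its absolute value (about 2.35) lies well below `bound` (about 8.5), as the estimate requires.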
The integral also has useful algebraic properties. It is an additive opera-
tion on holomorphic functions, in the sense that
(6)
$$\int \frac{dz}{z} = \int \frac{d(re^{i\theta})}{re^{i\theta}} = \int i\,d\theta = 2\pi i. \qquad (8)$$
This is surely not zero. Here $dz/z = d(\log_e z)$, while the logarithm, as we
saw, is not a single-valued function. Indeed, if we follow a branch of
$\log_e z$ continuously around that (counterclockwise) circle, it changes to the
"next" branch by exactly $2\pi i$. Moreover, f(z) = 1/z is not defined at
z = 0; so is not a holomorphic function inside the circle.
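The value $2\pi i$ is easy to confirm numerically. The sketch below (our own hypothetical helper, with an assumed subdivision into n = 4000 arcs) sums f at the midpoint of each small arc times the chord across that arc, once around a counterclockwise circle about the origin.

```python
import cmath
import math

def circle_integral(f, radius=1.0, n=4000):
    """Approximate the integral of f once counterclockwise around |z| = radius."""
    total = 0j
    for i in range(n):
        t0 = 2 * math.pi * i / n
        t1 = 2 * math.pi * (i + 1) / n
        z0 = radius * cmath.exp(1j * t0)
        z1 = radius * cmath.exp(1j * t1)
        z_mid = radius * cmath.exp(1j * (t0 + t1) / 2)
        total += f(z_mid) * (z1 - z0)
    return total

reciprocal = lambda z: 1 / z
integral_unit = circle_integral(reciprocal)               # close to 2*pi*i
integral_big = circle_integral(reciprocal, radius=5.0)    # the radius does not matter
integral_square = circle_integral(lambda z: z * z)        # holomorphic inside: close to 0
```

The first two values are both close to $2\pi i$, independent of the radius, while the integral of $z^2$ is zero to rounding error.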
These calculations and many like them, suggest that the integral of a
holomorphic function f around a closed path h ought to be zero when
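$$\int_{\partial A} P\,dx + Q\,dy = \iint_A \left[ \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right] dx\,dy. \qquad (9)$$
In particular, the line integrals appearing in the formula (3) for $\int f\,dz$
become double integrals
$$\int_h (u\,dx - v\,dy) = -\iint_A \left[ \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right] dx\,dy, \qquad (10)$$
$$\int_h (v\,dx + u\,dy) = \iint_A \left[ \frac{\partial u}{\partial x} - \frac{\partial v}{\partial y} \right] dx\,dy,$$
$$\int_h f(z)\,dz = 0. \qquad (11)$$
$$\int_{h_1} f(z)\,dz = \int_{h_2} f(z)\,dz. \qquad (12)$$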
Figure 1
O<t<1. (2)
This is continuous, but it isn't quite an inverse for the composite defined
in (1), because the composite $h^{-1} \cdot h$ is a path that goes over from h(0) to
h(1) and back again at the same speed; it is not the identity path, which
ought to be the path staying put at h(0). At best, one can deform $h^{-1} \cdot h$
into this identity. Also the composite k·h is not really associative; if m is a
third path starting at the end k(1) of k, then the composite m·(k·h) follows
along m for $1/2 \le t \le 1$, while the associated composite (m·k)·h
follows m only for parameter values $3/4 \le t \le 1$.
What is needed is a process of deforming m·(k·h) to (m·k)·h. This is
provided by the formal concept of a homotopy. If $h_0, h_1: I \to S$ are two
paths in the subset $S \subset \mathbf{C}$ with the same endpoints $h_0(0) = h_1(0)$,
$h_0(1) = h_1(1)$, then a homotopy holding endpoints fixed (a continuous
deformation) from $h_0$ to $h_1$ is a continuous map $H: I \times I \to S$ with
$$H(0,t) = h_0(t), \quad H(1,t) = h_1(t), \quad H(s,0) = h_0(0), \quad H(s,1) = h_0(1).$$
In other words, this homotopy H maps the unit square with coordinates
s and t into S so as to give a "continuous family" of paths $h_s$, each starting
at $h_0(0)$ and ending at $h_0(1)$. The paths are parameterized by s, starting
from s = 0 with the first path $h_0$ and ending for s = 1 with $h_1$, as in
Figure 1. This figure pictures the deformation of the initial path $h_0$ into
the final one, $h_1$. For example, one may deform the closed path given by
the circle of radius 1 about the origin into another closed path given by a
circle of radius 2 by using the homotopy $H(s,t) = (1 + s)e^{2\pi i t}$, as in
Figure 2.
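The homotopy between the two circles can be written out directly. In this small sketch (the name `H` follows the text; the sampling grid is our own assumption) we check that the level s = 0 is the circle of radius 1, that the level s = 1 is the circle of radius 2, and that every intermediate path is closed and avoids the origin. Note that this is a deformation of closed paths in which the common starting and ending point 1 + s moves with s, rather than a homotopy holding endpoints fixed.

```python
import cmath
import math

def H(s, t):
    """The homotopy H(s, t) = (1 + s) e^{2 pi i t} between circles of radius 1 and 2."""
    return (1 + s) * cmath.exp(2j * math.pi * t)

grid = [i / 50 for i in range(51)]   # assumed sample values of s and t in [0, 1]

# Level s = 0 lies on the circle of radius 1, level s = 1 on the circle of radius 2.
radius_at_start = {round(abs(H(0.0, t)), 12) for t in grid}
radius_at_end = {round(abs(H(1.0, t)), 12) for t in grid}

# Each intermediate path h_s is closed: it ends where it began.
largest_closing_gap = max(abs(H(s, 0.0) - H(s, 1.0)) for s in grid)

# The whole deformation stays inside C - {0}.
smallest_modulus = min(abs(H(s, t)) for s in grid for t in grid)
```

Because the deformation never touches the origin, it takes place within the plane with the origin deleted, the setting in which the logarithm and the function 1/z live.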
A homotopy from $h_0$ to $h_1$ and a subsequent homotopy from $h_1$ to $h_2$
can clearly be "composed" to give a homotopy from $h_0$ to $h_2$, while the
"inverse" (replace s by 1-s) of a homotopy $h_0 \simeq h_1$ from $h_0$ to $h_1$ is a
homotopy from $h_1$ back to $h_0$. Therefore the relation "$h_0$ is homotopic to
$h_1$" is reflexive, symmetric, and transitive. Then taking the equivalence
classes for this relation produces for each path $h_0$ the homotopy class, call
it $[h_0]$, of all paths $h_1$ homotopic to $h_0$ with the same end points. Since
homotopies $h_0 \simeq h_1$ and $k_0 \simeq k_1$ will yield a homotopy $k_0 \cdot h_0 \simeq k_1 \cdot h_1$,
one can define the composite of two homotopy classes as the class
$[k] \cdot [h] = [k \cdot h]$. This composition of classes is now associative, in view of
a homotopy $m \cdot (k \cdot h) \simeq (m \cdot k) \cdot h$ which we can picture as in Figure 3. At
each horizontal level of the deformation, the paths follow h from t = 0 to
t = 1/4 + s/4, then k, then m from t = 1/2 + s/4 to t = 1.
Figure 1. A homotopy.
Figure 2
If we consider just the closed paths (the loops) starting and ending at
one and the same point $z_0$ these considerations will prove the
Theorem. For any subset S of C and for any point $z_0 \in S$, the homotopy
classes of closed paths in S from $z_0$ to $z_0$ form a group under composition.
This group is called the Poincaré fundamental group $\pi_1(S, z_0)$. Its identity
element is the class of the trivial path, constant at $z_0$, while the inverse
of any homotopy class [h] of a closed path h is the class $[h^{-1}]$, given by h
run backwards.
This fundamental group construction assigns an algebraic object, the
group $\pi_1(S)$, to a geometric object, the subset $S \subset \mathbf{C}$. For example, if S
is the unit circle $S^1$ in C while $z_0$ is the point 1 on that circle, the
corresponding fundamental group $\pi_1(S^1, 1)$ turns out to be the infinite
cyclic group Z. A generator of this infinite cyclic group is the path p going
once around the circle, in a counterclockwise direction. With some care,
we can prove that any closed path h from 1 to 1 for the circle $S^1$ is homotopic
to $p^n$, where the integer n is the net number of times h winds around
the circle in a counterclockwise direction. (This number is also called the
degree of h.) In other words, the algebraic isomorphism $\pi_1(S^1) \cong \mathbf{Z}$
expresses a geometric fact: That a continuous map $S^1 \to S^1$ is determined,
up to homotopy, by the net number of times the map winds the first circle
about the second one; i.e., by the "winding number". (Observe that a
closed path $h: I \to S^1$ really amounts to a map $S^1 \to S^1$, where the first circle
$S^1$ is obtained by identifying the ends of the unit interval I.)
Figure 3
If $\mathbf{C} - \{0\}$ is the entire complex plane with only the origin 0 deleted,
its fundamental group $\pi_1(\mathbf{C} - \{0\}, 1)$ is still the infinite cyclic group.
Indeed, an isomorphism $\pi_1(\mathbf{C} - \{0\}, 1) \cong \pi_1(S^1, 1)$ can be constructed by
taking any closed path h in C which misses the origin and deforming it by
radial projection from the origin into a path running along the unit circle
$S^1$. This same fundamental group also appears in the behavior of the
many-valued function $\log_e z = \log_e r + (\theta + 2\pi k)i$, which is defined
for all $z = re^{i\theta}$; i.e., defined in the set $\mathbf{C} - \{0\}$. Specifically, starting
with one branch k of this logarithm and following a closed path with
winding number n brings one to the branch k + n.
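The winding number of a closed path missing the origin can be computed by following the argument continuously along the path, which is exactly the process of following a branch of the logarithm. The sketch below is a numerical illustration under our own assumptions: the helper name is hypothetical, and the subdivision must be fine enough that the argument changes by less than $\pi$ on each small step.

```python
import cmath
import math

def winding_number(h, n=2000):
    """Net number of counterclockwise turns of the closed path h: [0,1] -> C - {0} about 0."""
    total_turn = 0.0
    prev = h(0.0)
    for i in range(1, n + 1):
        cur = h(i / n)
        total_turn += cmath.phase(cur / prev)  # small change of argument on this step
        prev = cur
    return round(total_turn / (2 * math.pi))

three_times = lambda t: cmath.exp(2j * math.pi * 3 * t)   # the path p^3 on the unit circle
backwards = lambda t: cmath.exp(-2j * math.pi * t)        # p run backwards
off_center = lambda t: 2 + cmath.exp(2j * math.pi * t)    # a circle not enclosing the origin

n_three = winding_number(three_times)
n_back = winding_number(backwards)
n_off = winding_number(off_center)
```

The accumulated change of argument divided by $2\pi$ is the integer n by which the branch index k of the logarithm shifts along the path.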
On the other hand, consider the set B in C consisting of two circles
tangent at one point (say the point 1). On B there are two closed paths h
and k running once smoothly in a counterclockwise direction around each
of these two circles (see Figure 4); they give elements x = [h] and
y = [k] in the fundamental group of B. A plausible process will deform
any closed path in B into a path running at a uniform rate first around one
circle and again around either the same circle or the other circle and so
on. In this way any element in the fundamental group can be written as a
composite $x^{m_1}y^{n_1}x^{m_2}y^{n_2} \cdots$ of products of integral powers of x and y.
Any group containing two elements x and y must contain all such products;
in the present case, two such products are equal only when they
become equal by formal cancellations; for this reason this fundamental
group is known as the free group on two generators x and y. The complex
plane $\mathbf{C} - \{0,1\}$ with two points deleted has the same fundamental
group, generated by paths x and y going once counterclockwise around
0 or 1, respectively. Then for a function f(z) such as 1/z(z - 1) holomorphic
in the set $\mathbf{C} - \{0,1\}$, the Cauchy integral theorem (in one form
to be discussed) will say for two closed paths h and k that
Figure 4
(7)
Figure 5
$$\int_h f(z)\,dz = 0. \qquad (1)$$
Indeed, introducing the real parameter t for the path h and using the
chain rule for differentiation gives
$$\int_h f(z)\,dz = \int_0^1 f(h(t))\,h'(t)\,dt = \int_0^1 \frac{d}{dt}F(h(t))\,dt = F(h(1)) - F(h(0)) = 0,$$
where zero results because the path is closed (h(1) = h(0)).
In particular, $z^n$ has a primitive $z^{n+1}/(n+1)$ for each positive n, so the
Cauchy theorem does hold when f(z) is a polynomial. However, this
result still does not tell us how to construct a primitive for more general
holomorphic functions f. In order to do this, we will consider first very
special closed curves: The boundaries of rectangles, taken counterclockwise.
$$\int_{\partial R} f(z)\,dz = 0. \qquad (2)$$
of the rectangle. If the intended integral is not zero, contemplate its absolute
value
$$A = \left| \int_{\partial R} f(z)\,dz \right|. \qquad (4)$$
(5)
Figure 1
with
Since the size of the rectangles approaches zero with k, there is an index k
such that all of the rectangle $R_k$ is in the neighborhood; thus on $\partial R_k$ we
have the estimate
$$f(z) - f(c) - f'(c)(z - c) = \epsilon(z), \qquad |\epsilon(z)| < \delta\,|z - c|. \qquad (7)$$
Therefore the integral is
$$\int_{\partial R_k} f(z)\,dz = \int_{\partial R_k} f(c)\,dz + \int_{\partial R_k} f'(c)(z - c)\,dz + \int_{\partial R_k} \epsilon(z)\,dz. \qquad (8)$$
On the right, the first two integrals are integrals of polynomials around
closed curves, hence are zero. The remaining integral of $\epsilon(z)$ can then be
bounded by the general formula (4.4) for the size of an integral. By (7),
$|\epsilon(z)| \le \delta\,(\text{diameter } R_k)$, so with (6) we have
$$\frac{1}{4^k} A \le \delta\,(\text{diameter } R_k)(\text{perimeter } R_k) = \delta\,\frac{1}{4^k}\,pd.$$
In other words, $A \le \delta\,pd$. Since all of this holds for each positive $\delta$, one
must have A = 0. This proves the theorem.
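The statement of the theorem can be tested numerically on a particular rectangle. In the sketch below (hypothetical helper name, an assumed n = 2000 sample points per edge, and a rectangle with corners 0, 2, 2 + i, i chosen by us) the boundary integral of the holomorphic function $e^z$ is zero to within the accuracy of the sum, while for the non-holomorphic function $\bar{z}$ one gets 2i times the area of the rectangle.

```python
import cmath

def rectangle_boundary_integral(f, x0, y0, x1, y1, n=2000):
    """Midpoint-rule sum for the integral of f counterclockwise around the rectangle's boundary."""
    corners = [complex(x0, y0), complex(x1, y0), complex(x1, y1), complex(x0, y1)]
    total = 0j
    for j in range(4):
        a, b = corners[j], corners[(j + 1) % 4]   # one edge, from a to b
        step = (b - a) / n
        for i in range(n):
            total += f(a + (i + 0.5) * step) * step
    return total

# Holomorphic integrand: the boundary integral vanishes.
integral_exp = rectangle_boundary_integral(cmath.exp, 0.0, 0.0, 2.0, 1.0)

# The conjugate is not holomorphic: the integral is 2i * area = 4i for this rectangle.
integral_conj = rectangle_boundary_integral(lambda z: z.conjugate(), 0.0, 0.0, 2.0, 1.0)
```

The contrast between the two results reflects the role of the complex linear approximation (7) in the proof: it is available for $e^z$ at every point, and for $\bar{z}$ at none.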
The point of this proof is that the nested rectangles close down on one
point c. At this point one can then use the basic idea of the differential
calculus: that the first derivative gives a linear approximation to the function
(here a complex linear approximation (7)). For this reason the proof
works assuming just the existence of the derivative $f'(z)$ and not its continuity
(as was needed in §4 in the argument by the Gauss lemma).
The proof also illustrates several central ideas of topology. Thus the
existence of that point c comes from the fact that in the plane any nested
sequence $C_1 \supset C_2 \supset C_3 \supset \cdots$ of non-empty closed and bounded sets has a non-empty
intersection. This basic fact, about the point-set topology of the plane, is a
statement that the points are all there (the plane is complete); it is com-
parable to the Dedekind cut axiom for the reals.
On the other hand, we have integrated over a sum
$h_1 + h_2 + h_3 + h_4$ of the four boundaries $h_i = \partial R_i$ of the four smaller
rectangles of Figure 1. Such sums form an abelian group, the group of
"chains"; similarly the Figure 1 suggests a 2-dimensional "chain"
$R_1 + R_2 + R_3 + R_4$. Since $h_1$ is the boundary $h_1 = \partial R_1$ one says that
$h_1$ is "homologous" to zero. Also we have subdivided each edge e (of the
rectangle) into two pieces e' and e''; one says that there is a "singular"
homology $e \sim e' + e''$ because e - e' - e'' is a boundary; indeed, a
triangle can be collapsed (vertically in Figure 2) so that its boundary is
e - e' - e''. This is the start of an idea of "singular" homology and of
simplicial (i.e., triangular) approximation. These ideas of homology and
chains are effective in all dimensions in handling geometric problems of
algebraic topology.
We return to exploit the Cauchy-Goursat theorem. First let f(z) be a
function holomorphic in an open disc
$$D = \{ z \mid |z - c| < r \} \qquad (9)$$
with radius r and center the complex number c. Then the Cauchy theorem
$$\int_h f(z)\,dz = 0 \qquad (10)$$
holds for any closed piecewise differentiable path h in the disc (9),
because we can construct a primitive F for f as the integral, for $w \in D$:
$$F(w) = \int_c^w f(z)\,dz$$
Figure 2
Figure 3
The result is actually more general, since the same argument evidently
works if the disc is replaced by an open set which is convex (i.e., which
contains the line segment joining any two of its points). Indeed, the same
argument can be made to work if the disc D is replaced by any simply
connected and connected open set U. One again wishes to compare two
rectilinear, axis parallel paths in U from c to w, much as in Figure 3.
Since U is simply connected, these paths are homotopic; the trick is to use
suitable approximations to make the homotopy take place through rec-
tangular changes in path.
Another, explicitly homotopy-theoretic version of the Cauchy theorem
asserts that the integrals
$$\int_h f(z)\,dz = \int_k f(z)\,dz$$
over the paths h and k with the same endpoints are equal when h is
homotopic to k (fixing the endpoints) in the open set U where f is holomorphic.
There is still another, more subtle version (harder to prove): If h
is a simple closed piecewise differentiable curve in C, while f is holomorphic
in the interior of h and continuous on h, then $\int_h f(z)\,dz = 0$.
All of these somewhat different formal statements are realizations of the
one idea underlying the Cauchy integral theorem. The care required for
the proof is one of the reasons why complex analysis has served as an
excellent training ground for mathematical rigor. Indeed, the proofs can
all be made rigorous (and most of them perspicuous as well). Historically,
this developed gradually, with Cauchy's foundations for the calculus,
Riemann's emphasis on geometrical insights, and the careful $\epsilon$-$\delta$
methods of Weierstrass in complex analysis, calculus of variations, and
elsewhere. The better understanding and analysis of these methods has in
its turn led to extensive developments in topology.
Though this book does not present many complete proofs, we
emphasize that there is revealed here an objective notion of a rigorous
proof: Each step in the proof is a logical consequence of prior steps or
theorems or axioms (for the real numbers, or for sets, as the case may be);
7. Uniform Convergence
The use of infinite series is the quintessence of the notion of infinitary
Mathematical operations. Thus
(1)
(2)
while for any real r < 1 an "infinite" long division yields the geometric
series which converges to $1/(1-r)$,
(5)
$s_n = a_0 + a_1 + \cdots + a_n$,
one for each n. The series (5) is then said to converge (§IV.4) to a number
s as sum when the sequence $s_n$ has a limit $\lim_{n \to \infty} s_n = s$. As in §IV.4, this
statement about a limit means that for each real $\epsilon > 0$ there exists a
natural number N such that, for all $n > N$, $|s_n - s| < \epsilon$; here the
absolute value is that of the real numbers or the complex numbers, as the
case may be. As in other cases, the idea that the limit s is something
approximated (better and better as N increases) by finite sums gets
expressed formally in terms of the logical quantifiers "for all $\epsilon$" and "there
exists N". In other words, convergence involves repeated finite approximations (up to $\epsilon$) to the non-existent infinite sum.
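As a minimal numerical sketch of this definition, one can search for an index beyond which the partial sums stay within $\epsilon$ of the sum. The function name `first_good_index` and the choice of the geometric series as test case are illustrative assumptions, not part of the text.

```python
def first_good_index(r, eps):
    """Return the first n with |s_n - s| < eps, where
    s_n = 1 + r + ... + r**n and s = 1/(1 - r), assuming |r| < 1.

    For the geometric series |s_n - s| = |r|**(n + 1) / |1 - r| decreases
    with n, so every later partial sum is within eps as well.
    """
    if not abs(r) < 1:
        raise ValueError("the geometric series converges only for |r| < 1")
    s = 1.0 / (1.0 - r)
    s_n, power, n = 1.0, 1.0, 0      # s_0 = 1
    while abs(s_n - s) >= eps:
        n += 1
        power *= r                    # power = r**n
        s_n += power                  # s_n = s_(n-1) + r**n
    return n


for eps in (1e-1, 1e-3, 1e-6):
    print(eps, first_good_index(0.5, eps))
```

A smaller $\epsilon$ forces a larger index, which is the "for all $\epsilon$ there exists N" pattern in computational form.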
The series (5) is said to converge absolutely when the corresponding
series of absolute values,
$|a_0| + |a_1| + |a_2| + \cdots$
is convergent. Then absolute convergence implies convergence, as one sees
by applying the triangle law $|z_1 + z_2| \le |z_1| + |z_2|$ for absolute
values to show that the sequence $s_n$ satisfies the Cauchy condition for convergence. On the other hand, there are many convergent series such as the
alternating series
This is just the formal assertion that the same N works for all z and w at
issue. If we put it in logistic notation, where $\forall$ is "for all" and $\exists$ stands
for "there exists", it reads
Here again what is crucial is the careful use of the logistic quantifiers $\forall$
and $\exists$, written in the right order (!). For later purposes (in the discussion of quantifiers in the next chapter) we observe that in each quantifier
the variable at issue ranges over a definite set: the variable $\epsilon$ is a positive
real, n is a natural number greater than or equal to N, z is in D and so on.
In the applications of this definition, the set D (and the set E) is usually a
closed set in the topology of C, because convergence over an open set is
less likely to be uniform (as with the set of r with r < 1 for the geometric
series).
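The contrast between a closed set and an open one can be sketched numerically for the geometric series with sum $1/(1-r)$, whose remainder after $n+1$ terms is $r^{n+1}/(1-r)$ for $0 \le r < 1$. The sample grids and the helper names below are assumptions chosen for illustration.

```python
def tail(r, n):
    """Remainder |s(r) - s_n(r)| of the geometric series for 0 <= r < 1."""
    return r ** (n + 1) / (1.0 - r)


def sup_tail(points, n):
    """Largest remainder over a finite sample of points."""
    return max(tail(r, n) for r in points)


# A grid on the closed interval [0, 0.9] and a few points creeping up to 1.
closed_grid = [0.9 * k / 1000 for k in range(1001)]
near_one = [1 - 10.0 ** (-k) for k in range(1, 9)]

for n in (10, 50, 200):
    print(n, sup_tail(closed_grid, n), sup_tail(near_one, n))
```

On the closed interval one N serves every r; on the sample approaching 1 the worst remainder stays large however far the series is summed.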
The idea of "uniformity" which is expressed here is really the same idea
as that which appeared in the foundations of the calculus in §VI.7 with
the notion of uniform continuity of a function. There, as here, the formal
expressions of the idea of "uniform approximation" depend on the use of
the quantifiers. This idea of uniformity can be further developed; there is
for example a notion of a "uniform space" (specialized from topological
space). The idea of uniform limits was once (early 1900's) regarded as one
of the most difficult in Mathematical pedagogy. The difficulty now seems
much less.
One point of uniform convergence is that it provides for smooth operations with infinite series. For example, if each function $f_n(z)$ is continuous
for z in some open set U and if $\sum f_n(z)$ converges uniformly in every
closed subset of U, then the sum $\sum f_n(z)$ is a function of z continuous in
U. Similarly, if in the uniformly convergent series (6) each term $f_n(z, w)$ is
complex differentiable in $z \in U$ for all w, so is the sum $s(z, w)$; moreover,
the derivative of this sum is the limit of the series of derivatives $\partial f_n/\partial z$. In brief, a uniformly convergent series of holomorphic functions may be differentiated
"term-by-term". Similarly, uniformly convergent series may be integrated
term-by-term; we do not enter into the detailed statement.
There are many tests of convergence. For example, uniform and abso-
lute convergence can be verified by comparison with a known convergent
series of positive terms, such as the geometric series (3). Specifically, in
(6), if there is some positive real r < 1 and some fixed M such that
8. Power Series
The whole subject of complex analysis can now be started over again
from the idea that a "good" function of a complex variable z should be a
function defined by a power series. Historically, this was the point of
view advocated by Weierstrass in his influential lectures in Berlin. One
may still see the contrast between the Weierstrass viewpoint and the more
geometric approach (Riemann and Klein) in the book Funktionentheorie
by A. Hurwitz and R. Courant. First consider the elementary functions.
For real x, $\sin x$, $e^x$, and $\cos x$ can be calculated from their Taylor series,
and the same series converge (uniformly in compact sets) to provide good
definitions of these functions for any complex argument z:
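The familiar Taylor series for $e^z$, $\sin z$ and $\cos z$ can be summed directly for a complex argument. The sketch below assumes 40 terms are enough for moderate $|z|$ and compares the results with Python's `cmath` functions; the function names are illustrative.

```python
import cmath


def exp_series(z, terms=40):
    """Sum of z**n / n! for n < terms."""
    total, term = 0j, 1 + 0j
    for n in range(terms):
        total += term
        term *= z / (n + 1)
    return total


def sin_series(z, terms=40):
    """Sum of (-1)**k z**(2k+1) / (2k+1)! for k < terms."""
    total, term = 0j, complex(z)
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 2) * (2 * k + 3))
    return total


def cos_series(z, terms=40):
    """Sum of (-1)**k z**(2k) / (2k)! for k < terms."""
    total, term = 0j, 1 + 0j
    for k in range(terms):
        total += term
        term *= -z * z / ((2 * k + 1) * (2 * k + 2))
    return total


z = 1 + 2j
print(abs(exp_series(z) - cmath.exp(z)))
print(abs(sin_series(z) - cmath.sin(z)))
print(abs(cos_series(z) - cmath.cos(z)))
```

The same sums also satisfy $e^{iz} = \cos z + i \sin z$ up to rounding, which is one way to see that the series definitions fit together.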
what about convergence? The essential fact is that for each such series (1)
there is a real number R, with $0 \le R \le \infty$, such that (1) converges for
all z with $|z - c| < R$ and diverges for $|z - c| > R$. (Of course if
$R = \infty$, R is not a real number, but the intent is still clear!) This number
R is called the radius of convergence.
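A small experiment makes the radius visible. The sketch assumes the particular series $\sum 2^n z^n$ (so c = 0 and R = 1/2); this choice and the helper names are illustrative only.

```python
def partial_sum(z, n_terms):
    """Sum of (2z)**n for n < n_terms: a power series with a_n = 2**n, c = 0."""
    return sum((2 * z) ** n for n in range(n_terms))


def term_bound(r, n_terms):
    """Largest value of |a_n| r**n = (2r)**n among the first n_terms terms."""
    return max((2 * r) ** n for n in range(n_terms))


z_in = 0.3 + 0.2j     # |z| is about 0.36, inside the circle of radius 1/2
z_out = 0.5 + 0.3j    # |z| is about 0.58, outside it

print(abs(partial_sum(z_in, 300) - 1 / (1 - 2 * z_in)))   # small: converges
print(abs(partial_sum(z_out, 200)))                        # huge: diverges
print(term_bound(0.4, 200), term_bound(0.6, 200))
```

Inside the circle the partial sums settle on $1/(1 - 2z)$; outside, already the individual terms are unbounded.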
To show this, it suffices to consider the case when c = 0. So call a real
number r > 0 "good" if all the terms $|a_n| r^n$ have a common bound;
that is, if there is some real number $M_r$ with $|a_n| r^n \le M_r$ for all
natural numbers n. Then take R to be the least upper bound of all the
"good" r (this of course might give an $R = \infty$). If $|z| < R$, then there
This is just the Taylor series for f(z) expanded about the point z = c.
This argument shows that the Taylor series (3) is indeed the only possible
convergent power series at z = c for a regular analytic function. It also
shows that regular analytic functions are holomorphic. Soon we will show
(by the Cauchy integral) that holomorphic functions are necessarily regu-
lar analytic.
The strict power series point of view can be carried out elegantly and
systematically (but not in this book, which argues rather for a variety of
approaches). From the intended perspective, every good function begins
life as a power series, and so is initially defined only inside its circle of
convergence. Thus, we might begin with the function g defined by the
geometric series
$g(z) = 1 + z + z^2 + z^3 + \cdots$ ,
but only for $|z| < 1$. How does one use series to recover all the rest of
the intended function $1/(1 - z)$? We will see in §11 that this question leads
to some fundamental ideas about "analytic continuation".
$\frac{1}{2\pi i}\int_h \frac{f(z)}{z-w}\,dz = \frac{1}{2\pi i}\int_h \frac{f(w)}{z-w}\,dz + \frac{1}{2\pi i}\int_h \frac{f(z)-f(w)}{z-w}\,dz .$
$\frac{1}{z-w} = \frac{1}{z}\left[1 + \frac{w}{z} + \frac{w^2}{z^2} + \cdots\right]$ (2)
convergent for $|w| < |z| \ne 0$. So choose some r' with 0 < r' < r,
where r is the radius of the circle h; then for all z on this circle and all w
with $|w| < r'$ this series is uniformly convergent because it converges
for all these w at least as fast as the geometric series $\sum (r'/r)^n$. Thus it
can be integrated over z term-by-term to give a convergent series in w
$f(w) = \sum_{n=0}^{\infty} a_n w^n$ , (3)
$a_n = \frac{1}{2\pi i} \int_h \frac{f(z)}{z^{n+1}}\,dz$ . (4)
Moreover, the argument shows that the series (3) converges in any circle,
with center at w, inside the given set U where f is holomorphic.
We have thus shown that a function f holomorphic in U is necessarily
regular analytic in U! In other words, the existence of a complex deriva-
tive for f at each point of U is equivalent to the existence of a convergent
power series at each point. Henceforth we thus can (and will) drop the
term "regular analytic" (which is anyhow old fashioned).
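The coefficient formula $a_n = \frac{1}{2\pi i}\int_h f(z)/z^{n+1}\,dz$ can be tried out numerically. The sketch below assumes h is a circle about the origin, approximates the integral by equally spaced sample points, and uses $f = \exp$ as a test function whose coefficients $1/n!$ are known; the name `cauchy_coefficient` is illustrative.

```python
import cmath
import math


def cauchy_coefficient(f, n, radius=1.0, samples=256):
    """Approximate (1/(2*pi*i)) * integral of f(z) / z**(n+1) dz over |z| = radius.

    With z = radius * e^{i theta} one has dz = i z d(theta), so the integral
    reduces to the average of f(z) / z**n over the sample points.
    """
    total = 0j
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        z = radius * cmath.exp(1j * theta)
        total += f(z) / z ** n
    return total / samples


for n in range(6):
    print(n, cauchy_coefficient(cmath.exp, n).real, 1 / math.factorial(n))
```

The two columns should agree to rounding error, and changing `radius` should not change the result, in keeping with the Cauchy theorem.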
Since the convergent series (3) is the unique power series for f around
w = 0, the nth coefficient an in (4) must be exactly the nth coefficient in
the Taylor series (8.3) for f at w = 0 (i.e., the Maclaurin series). Thus, in
(4), $a_n = f^{(n)}(0)/n!$. The corresponding result holds at every point c in U,
so that
The proof has also shown that the Taylor series for a holomorphic func-
tion f at a point z = c will converge inside any circle about c which is
contained in an open set U where f is holomorphic. This leads to the
slogan: The Taylor series for a holomorphic function f converges in the
circle out to the nearest singularity (i.e., the nearest point where f is
undefined or is defined but is not holomorphic). As stated, this is a slogan
and not a theorem, because the function might have been defined in a
different way, for example as the function f with $f(z) = z^2$ when
$|z| < 1$ and $f(z) = 1$ when $|z| \ge 1$. This function is holomorphic
in $|z| < 1$ but not in any larger circle about the origin (and not in any
open set containing z = 1). Nevertheless, the Taylor series of the function
at the origin is just $z^2$, so converges everywhere; the function "should
have been" $z^2$ to begin with. The slogan can (with a little trouble) be
reformulated to be rigorous, but the interest is already clear. For example,
the geometric series for $1/(1 - z)$ converges for $|z| < 1$ and for no
bigger circle about the origin precisely because z = 1 is the nearest singularity.
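The slogan can be checked on $1/(1 - z)$ expanded about other centers c, where the Taylor coefficients are $a_n = 1/(1 - c)^{n+1}$. The sketch assumes the ratio $|a_n / a_{n+1}|$ as an estimate of the radius of convergence (a standard test, not one used in the text) and compares it with the distance from c to the singularity z = 1.

```python
def taylor_coefficients(c, count):
    """Coefficients a_n = 1/(1 - c)**(n + 1) of 1/(1 - z) expanded about z = c."""
    return [1 / (1 - c) ** (n + 1) for n in range(count)]


def ratio_radius(coeffs):
    """Estimate the radius of convergence from the last two coefficients."""
    return abs(coeffs[-2] / coeffs[-1])


for c in (0, -1, 1j, 0.5 + 0.5j):
    print(c, ratio_radius(taylor_coefficients(c, 30)), abs(1 - c))
```

For every center the estimated radius equals the distance to z = 1, the only singularity of this function.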
There are many familiar functions holomorphic in the whole plane, for
example polynomials, $e^z$, $\sin z$, and $\cos z$. Such functions are commonly
called entire functions. In the cases just listed, however, each such entire
function (except the constant polynomials) is unbounded in the whole
plane. This is no accident; Liouville's theorem asserts that any function
f( z) holomorphic in the whole plane and bounded there is necessarily a
constant. For, consider the Taylor expansion (3) for such an f around the
origin; it converges in the whole plane. Its coefficients are given by the
formula (4). If M is a bound for f(z) in the plane, so that $|f(z)| \le M$
for all z, the formula (4) with the path h a circle of radius r and circumference $2\pi r$ yields the estimate
$|a_n| \le \frac{1}{2\pi} \cdot \frac{M}{r^{n+1}} \cdot 2\pi r = \frac{M}{r^n}$ .
For $n \ge 1$ we can let r approach infinity, to get $|a_n| = 0$ for all such n.
Thus $f(w) = a_0$ is indeed constant.
This proof illustrates again the remarkable utility of the basic estimate
(4.4) for the size of an integral.
From Liouville's theorem one readily derives a proof of the fundamental theorem of algebra (§IV.10). For let
10. Singularities
Consider a function f with an isolated singularity; it will suffice to consider such a singularity at the origin z = 0. This means that the function
f is holomorphic at least in an open set D of all complex z with
$0 < |z| < R$; such a set is a punctured disc of radius R. The Laurent
theorem for such a function states that there are then two convergent
series, one in powers of z and the other in powers of 1/z, so that f(z), for
any z in the punctured disc, is given as a sum
$f(z) = \sum_{n=0}^{\infty} a_n z^n + \sum_{m=1}^{\infty} a_{-m} z^{-m}$ . (1)
$f(w) = \frac{1}{2\pi i}\int_{|z|=s} \frac{f(z)}{z-w}\,dz - \frac{1}{2\pi i}\int_{|z|=r} \frac{f(z)}{z-w}\,dz .$
For the first integral, $|w/z| < 1$, so we can get from it a power series in
w, much as in (2) and (3) of §9. For the second integral, $|z/w| < 1$, so
this yields similarly a power series in $w^{-1}$; together these give (1), with z
there replaced by w. We can think of the formula (1) as a representation
of f(z) as the sum of two functions, the first holomorphic inside
$|z| < R$, the second holomorphic for $|z| > 0$.
An example is the function $e^{1/z}$. From the usual power series for $e^z$ one
has
$e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!\,z^2} + \cdots + \frac{1}{n!\,z^n} + \cdots$ . (2)
This is a series like (1) with infinitely many non-zero coefficients $a_{-m}$ with
negative index. Such a function is said to have an essential singularity at
the origin z = 0. When there are only a finite number of coefficients
$a_{-m} \ne 0$, the function f is said to have a pole. It is convenient to write
$V_0(f) = -m$ for the negative of the order of the pole at 0.
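The series for $e^{1/z}$ in powers of $1/z$ can be summed numerically at a point away from the origin. The sketch assumes 30 terms and the illustrative name `exp_inverse_laurent`, and compares the result with `cmath.exp(1 / z)`.

```python
import cmath
import math


def exp_inverse_laurent(z, terms=30):
    """Partial sum of 1 + 1/z + 1/(2! z**2) + ... + 1/(n! z**n) for n < terms."""
    return sum(1 / (math.factorial(n) * z ** n) for n in range(terms))


z = 0.5 + 0.5j
print(exp_inverse_laurent(z), cmath.exp(1 / z))
```

Every coefficient $1/n!$ of a negative power is non-zero, which is what makes the singularity at the origin essential rather than a pole.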
When there are no negative powers of z present in the Laurent expan-
sion the function f (if not already defined at z = 0) may be defined there
by setting $f(0) = a_0$; it then becomes holomorphic in $|z| < R$; it is
said to have had a removable singularity at 0. If f is defined (and holomorphic) near z = 0, it has a zero there if $f(0) = a_0 = 0$. The smallest n
with coefficient $a_n \ne 0$ is then the order of the zero at the origin; one
writes $V_0(f) = n$ for this order. The operator $V_0$ (order at z = 0) has for
functions f and g the formal properties
the "absolute value" Ilfll is "large" when the function f has a pole of high
order at 0, and "small" when f has a high order zero there.
One may also consider singularities of a function "at $\infty$", where the
point at $\infty$ corresponds under stereographic projection to the north pole
of the Riemann sphere. More explicitly, the point at $\infty$ can be brought to
the origin by the conformal transformation $z \mapsto 1/z$ which carries functions holomorphic near $\infty$ (i.e., outside some large circle) into functions
holomorphic in a circle about 0. Hence, by so transforming the Taylor
series at the origin, one sees, as in (2), that a function f holomorphic for
$|z| > R$ and at $\infty$ has a (unique) expansion
00
$\int_h \frac{a_{-1}}{z}\,dz = (2\pi i)\,a_{-1}$ ; (5)
it is completely determined by the residue $a_{-1}$ at the pole. On the other
hand $\int_h a_{-m} z^{-m}\,dz = 0$ when m > 1. For integrals around several poles
the contributions add. This observation can be made into a proof of the
$\int_h f(z)\,dz = 2\pi i \sum_{j=1}^{k} \mathrm{Res}_{c_j} f$ . (6)
In brief, the integral is $2\pi i$ times the sum of the residues at all poles
inside h. This famous result provides a convenient way to evaluate certain
definite integrals without finding antiderivatives. Often an integral of a
(real valued) function along some interval of the real axis can be made
part of a "contour integral" which can be evaluated by (6), usually so that
the integral along other (nonreal) parts of the contour h approaches zero.
The technique can be learned from the appropriate texts; it is another
illustration of the power of the Cauchy integral theorem.
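A numerical sketch of the residue calculation: the function $1/(1 + z^2)$ has a simple pole at z = i with residue $1/(2i)$, so its integral around a small circle about i should be $2\pi i \cdot 1/(2i) = \pi$, which is also the value of the real integral of $1/(1 + x^2)$ over the whole real axis. The circular contour, the sample count and the name `circle_integral` are assumptions made for illustration.

```python
import cmath
import math


def circle_integral(f, center, radius, samples=2000):
    """Approximate the integral of f(z) dz once counterclockwise around a circle."""
    total = 0j
    step = 2 * math.pi / samples
    for k in range(samples):
        unit = cmath.exp(1j * step * k)
        z = center + radius * unit
        dz = 1j * radius * unit * step      # derivative of z(theta) times d(theta)
        total += f(z) * dz
    return total


def integrand(z):
    return 1 / (1 + z * z)


value = circle_integral(integrand, center=1j, radius=1.0)
print(value, math.pi)
```

The circle encloses only the pole at i; the other pole at -i lies outside and contributes nothing.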
[Figure 1]
Another covering map is $(x, y) \mapsto (e^{2\pi i x}, e^{2\pi i y})$, which covers the torus
$S^1 \times S^1$ by the plane $R^2$. Each point on the torus has a small neighborhood which is the image of (denumerably many) small disjoint neighborhoods from the plane, as indicated in Figure 2. This suggests the general
definition of a covering space Y: A covering map $p: Y \to X$ between topological spaces is a continuous map such that each $x \in X$ has a neighborhood V for which the inverse image $p^{-1}V$ is the union of disjoint open
sets $V_j$, on each of which p restricts to a homeomorphism $V_j \to V$.
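The covering of the torus by the plane can be sketched by representing a point of $S^1 \times S^1$ as a pair of unit complex numbers; the names `torus_cover` and `close` and the tolerance are illustrative assumptions.

```python
import cmath
import math


def torus_cover(x, y):
    """The map (x, y) -> (e^{2 pi i x}, e^{2 pi i y}) from the plane to the torus."""
    return (cmath.exp(2j * math.pi * x), cmath.exp(2j * math.pi * y))


def close(p, q, tol=1e-9):
    """Compare two points of the torus coordinate by coordinate."""
    return all(abs(a - b) < tol for a, b in zip(p, q))


base = torus_cover(0.25, 0.7)
print(close(base, torus_cover(0.25 + 3, 0.7 - 5)))   # same point: integer shifts
print(close(base, torus_cover(0.75, 0.7)))           # different point
```

Points of the plane that differ by integers in each coordinate lie over the same point of the torus, which is why the fiber over each point is denumerable.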
Covering maps appear prominently with Riemann surfaces. The example of Figure 1 suggests the holomorphic function $w \mapsto w^5 = z$, which
maps the w-plane $C_w$ onto the z-plane $C_z$, multiplying the argument of
each complex number $w \ne 0$ by 5, and so covering the z-plane (except for
the origin) five times. If $C - \{0\}$ is the complex plane with the origin
removed, this map $(C_w - \{0\}) \to (C_z - \{0\})$ is a covering map in the
sense of our definition. Moreover, $w^5 = z$ means that $w = \sqrt[5]{z}$, so each
of the five fifth roots of $z \ne 0$ appears. Thus $C_w - \{0\}$ is the Riemann
surface on which the many-valued function $\sqrt[5]{z}$ becomes single-valued,
like the Riemann surface for $\sqrt{z}$ discussed in §VIII.4.
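The five sheets can be listed explicitly: every $z \ne 0$ has five distinct fifth roots, obtained by dividing its argument (taken with all its values) by 5. The helper `nth_roots` is an illustrative name, and the sample point is an arbitrary choice.

```python
import cmath
import math


def nth_roots(z, n):
    """All n solutions w of w**n = z, for z != 0."""
    r, theta = cmath.polar(z)
    return [cmath.rect(r ** (1 / n), (theta + 2 * math.pi * k) / n)
            for k in range(n)]


z = 3 + 4j
roots = nth_roots(z, 5)
for w in roots:
    print(w, w ** 5)
```

Each root maps back to the same z, so the five roots are the five points of $C_w - \{0\}$ lying over z.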
The punctured plane $C - \{0\}$ has other coverings, an n-fold covering
for $w^n = z$ and one with infinitely many sheets. Consider for instance the
exponential function
[Figure 2]
larly covers the whole z-plane, omitting 0. This exponential map is a covering map $C_w \to (C_z - \{0\})$, since each open disc about a point $z \ne 0$
and not containing the origin is covered by a denumerable number of disjoint open sets from the w-plane, one from each strip. Actually, this covering space $C_w$ is both connected and simply connected. One can prove
that any connected and simply connected covering space
$p: Y \to (C_z - \{0\})$ of the punctured plane is necessarily homeomorphic
(preserving the projection p) to this covering space $C_w$; for this reason $C_w$
is called the universal covering space of the punctured plane $C_z - \{0\}$.
Moreover, the vertical translations of the w-plane by $2\pi i k$ (sheets to
sheets) are denumerably many, and correspond to the elements of the
(infinite cyclic) fundamental group of the punctured plane. Other coverings of the punctured plane correspond to subgroups of the fundamental
group, so group theory enters here also.
The plane R2 is also, as above, the universal covering space of the torus.
The inverse of the function $w \mapsto e^w$ of (1) is the many-valued logarithm
function $w = \log z$; it is single-valued on the surface $C_w$, which is thus
the Riemann surface of $\log z$; it can be defined as the manifold of all
pairs (z, w) of complex numbers which satisfy $z = e^w$, so it is just the
manifold $C_w$ of all complex numbers w.
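The many values of the logarithm can be listed as the principal value plus integer multiples of $2\pi i$, one value on each sheet. The function name `log_branch` and the indexing of the sheets by an integer k are illustrative assumptions.

```python
import cmath
import math


def log_branch(z, k):
    """The value of log z on sheet k: principal logarithm plus 2*pi*i*k."""
    return cmath.log(z) + 2j * math.pi * k


z = -2 + 1j
for k in range(-2, 3):
    w = log_branch(z, k)
    print(k, w, cmath.exp(w))
```

All these w satisfy $e^w = z$; they differ by the vertical translations that carry sheets to sheets.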
This leads directly to the general definition of a Riemann surface-as a
special kind of surface-one with a complex structure. Recall from §VIII.7
that a 2-dimensional manifold (a surface) S is a topological space which is
the union of open sets U which form an atlas of charts, which are
homeomorphisms
into open sets of the plane $R^2$ such that each overlap of domains U and
U' yields a homeomorphism. Specifically, the charts $\phi$ and $\phi'$ overlap if
the intersection $U \cap U'$ of their domains is not empty; then restriction of
$\phi^{-1}$ and $\phi'$ to maps $\phi^{-1}$ on $\phi(U \cap U') = W$ and $\phi'$ on $U \cap U'$
compose to give the "overlap map"
11. Riemann Surfaces 347
$\phi' \cdot \phi^{-1} : W \to W'$ (3)
between two open subsets W and W' of the Euclidean plane $R^2$. Since this
plane can be identified with the complex plane, we define: A Riemann
surface is just a surface with an atlas in which all the overlap maps are
holomorphic (i.e., each $\phi' \cdot \phi^{-1}$ of (3) is a holomorphic function on the
open set $W \subset C$). In other words, to get a Riemann surface S take a
bunch $U_i$ of topological spaces, each identified by a bijection $\phi_i$ as in (2)
with some open set $V_i$ of C, and paste the $U_i$ together by suitable holomorphic identifications (3). Since the plane C has one complex dimension
(and real dimension 2), one may also say that a Riemann surface is just a
one-dimensional "complex manifold". Higher dimensional such manifolds
also occur, in algebraic geometry and analysis.
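A concrete overlap map can be computed on the unit sphere. The sketch assumes two particular charts, stereographic projection from the north pole and a complex-conjugated projection from the south pole; with this choice the overlap map is $z \mapsto 1/z$, which is holomorphic away from 0. The chart formulas and names are assumptions for illustration, not taken from the text.

```python
import math


def chart_north(p):
    """Stereographic projection from the north pole (0, 0, 1)."""
    x, y, z = p
    return complex(x, y) / (1 - z)


def chart_south(p):
    """Conjugated stereographic projection from the south pole (0, 0, -1)."""
    x, y, z = p
    return complex(x, -y) / (1 + z)


def sphere_point(theta, phi):
    """Point of the unit sphere with polar angle theta and azimuth phi."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


samples = [sphere_point(t, f) for t in (0.3, 1.0, 2.5) for f in (0.1, 2.0, 4.0)]
for p in samples:
    print(chart_south(p), 1 / chart_north(p))
```

The product of the two chart values is $(x^2 + y^2)/(1 - z^2) = 1$ on the sphere, so the two columns agree; without the conjugation the overlap map would be $z \mapsto 1/\bar{z}$, which is not holomorphic.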
Two different such atlases A and A' on the same space S describe the
same Riemann surface just when each overlap map (from a chart of A to
one of A') is holomorphic. Much as for smooth surfaces, one can then
describe the Riemann surface intrinsically by using the maximal such
atlas. Also, any open subset $S_0$ of a Riemann surface is itself a Riemann
surface (just restrict each chart to $U \cap S_0$). Often Riemann surfaces (in
particular $S_0$) are required to be connected.
The charts serve to replace open sets of the surface S by open sets of C.
For example, they can be used to define when a complex valued function
$f: S \to C$ on the surface is holomorphic on S: It is so when for each chart $\phi$
of S the composite
$C \supset V \xrightarrow{\ \phi^{-1}\ } U \xrightarrow{\ f|U\ } C$ (4)
differentiable manifold, as in (VIII.8.3). The same idea has still other for-
mulations, as for example in manifolds modelled on Banach spaces.
The complex plane, an open disc, any open set in the complex plane,
and the Riemann sphere (with charts given by stereographic projections)
are all Riemann surfaces. So is the torus-with suitable charts. A function
holomorphic on a surface S has a first derivative at each point with
respect to any uniformizing parameter at that point-although the deriva-
tive clearly depends on the choice of uniformizing parameter.
Riemann constructed his surfaces for particular functions f because
these surfaces could display the way such functions (even when "many
valued") depend essentially just on the location and character of the
singularities. To illustrate, we may, with Weierstrass, start with a function
f(z) defined just in some circle $|z| < R$ by a power series $\sum a_n z^n$ convergent there. At each point c' in the circle the power series determines
all the derivatives of f, and so gives the Taylor series $\sum f^{(n)}(c')(z - c')^n / n!$
which is convergent in some circle $|z - c'| < R'$ about c'; in part, the
circle may extend beyond the initial disc. Continuing from the second circle one gets a sequence of Taylor series in circular discs which can be
pasted together wherever they match. This gives a connected Riemann
surface S on which the evident extension of the original power series is
defined and holomorphic. Also each point on this surface S comes from a
point (say the point c') in the complex plane, so the surface is equipped
with a holomorphic function $p: S \to C$; this is usually a covering map (over
part of C). For example, the Riemann surface for $\sqrt{z}$, constructed as in
§VIII.4, covers $C - \{0\}$ twice.
This general process of extension, suitably formulated, is called analytic
continuation. For example, one might start with the power series
$\frac{1}{2 - z} = \frac{1}{2}\left[1 + \frac{z}{2} + \frac{z^2}{4} + \cdots + \frac{z^n}{2^n} + \cdots\right]$ (6)
$\zeta(s) = 1 + \frac{1}{2^s} + \frac{1}{3^s} + \cdots + \frac{1}{n^s} + \cdots$ (7)
does converge, where $n^s$ means $e^{s \log n}$ with real logarithm log n. If one
writes the complex number s as $\sigma + it$ for $\sigma$ and t real (this is the standard notation in connection with $\zeta$), the series converges absolutely and
uniformly in any closed half-plane $\sigma \ge c > 1$; hence $\zeta$ is holomorphic in
the whole half-plane $\sigma > 1$. It can be continued into the whole complex
plane with the exception of a pole (with residue 1) at the point s = 1;
$1 + \frac{1}{p^s} + \frac{1}{p^{2s}} + \cdots + \frac{1}{p^{ks}} + \cdots$
for all the primes p. This suggests Euler's formula for $\zeta(s)$:
$\zeta(s) = \prod_p \left[\frac{1}{1 - \frac{1}{p^s}}\right]$ (8)
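Euler's product can be compared numerically with the series for $\zeta(s)$ at a point with $\sigma > 1$. The sketch takes s = 2, where the known value is $\pi^2/6$; the truncation limits and the function names are assumptions chosen for illustration.

```python
import math


def zeta_partial(s, terms):
    """Partial sum 1 + 1/2**s + ... + 1/terms**s of the zeta series."""
    return sum(1 / n ** s for n in range(1, terms + 1))


def primes_up_to(limit):
    """Primes not exceeding limit, by the sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            for m in range(p * p, limit + 1, p):
                sieve[m] = False
    return [p for p, is_prime in enumerate(sieve) if is_prime]


def euler_product(s, limit):
    """Product of 1 / (1 - p**(-s)) over the primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1 / (1 - p ** (-s))
    return product


print(zeta_partial(2, 100000), euler_product(2, 10000), math.pi ** 2 / 6)
```

Both truncations approach the same value, the sum from below through the integers and the product from below through the primes.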
This ditty (to the tune of "Sweet Betsy from Pike") goes on to record
that Hardy has at least proved that there are infinitely many zeros on the
critical line. More seriously there are many notable applications of com-
plex analysis to number theory; they constitute the subject known as ana-
lytic number theory. For instance, this theory gives the most perspicuous
(but not the most elementary) proof of the prime number theorem: If $\pi(x)$
is the number of prime numbers less than or equal to x, then
Covering spaces arise with Riemann surfaces, but their utility extends
well beyond complex analysis. Any "well-behaved" connected space X (in
particular, any connected manifold) has a covering space $p: \tilde{X} \to X$ which is
connected and simply connected, while the elements h of the fundamental
group of the base space X act as continuous transformations $h: \tilde{X} \to \tilde{X}$ with
ph = p; they are then called covering transformations. This covering
space is universal, in the sense that for every other covering space $q: Y \to X$
there is a covering space $p': \tilde{X} \to Y$ with p = qp', as in
$\tilde{X} \xrightarrow{\ p'\ } Y$, with $p: \tilde{X} \to X$ and $q: Y \to X$.
The universal covering space $\tilde{X}$ can be constructed by taking its points to
be homotopy classes of paths from a fixed base point in X, with endpoints
of the path fixed under the homotopy. Both Massey [1967] and Pontryagin
[1939] give clear expositions of covering spaces.
The covering space serves to visualize the way the fundamental group
of a space "acts" on other objects associated with the space. It is just one
example of the interaction between the quite explicit constructions of
complex analysis and the general development of topology.
Here a cross section s of p means a map $s: V \to A$ such that the composite
$p \cdot s$ is the "identity"; that is, the inclusion of V in C as in the diagram
below.
[Diagram: the maps $s: V \to A$, $p: A \to C$ and $F: A \to C$, with $V \subset C$.]
PROOF. Given g, the only choice for s is the mapping with $s(c) = [g]_c$ for
each c in V. This definition provides a continuous cross section; moreover
for F as above the composite $F \cdot s$ is, as in the diagram, just the given holomorphic function g.
Conversely, let s be any cross section of p over an open set V in C. It
must send each point $c \in V$ to the germ $[f]_c$ at c of some function
$f = f_{s,c}$ holomorphic in some open set U containing c. The composite
$g = F \cdot s$ then sends c to $F(sc) = f(c)$; it remains to show g holomorphic.
But the continuity of s at c means that some neighborhood $V_0$ of c in C is
sent by s into the open set $W_{f,U}$ of germs of f. Hence, for c' in $V_0$,
$sc' = [f]_{c'}$ so that $gc' = fc'$. Therefore g, like f, is holomorphic.
It may at first seem strange that the mere requirement of continuity of a
cross section over V produces a function g on V which is not just continu-
ous, but even holomorphic. This results because the holomorphic charac-
ter is built into the construction of A: Each point of A is the germ of some
function which is holomorphic.
Holomorphic functions on C yield a presheaf H by
$H(V) = \{g \mid g: V \to C \text{ is holomorphic}\}$ .
[Figure 2. Airfoil.]
S( U) is the set of cross sections over U. In other words, any good class of
functions can be represented simply as continuous cross sections. This
idea has applications even in set theory, as we will note in the next
chapter.
CHAPTER XI
(1) for all x, does f determine the coefficients $a_n$ and $b_n$ uniquely? Georg
Cantor found that the answer was "yes"; indeed, he found that he did not
need to assume that the series (1) converged to f(x) for all x, but only for
a great many x. So he was led to the question: For what sets S of reals x
will
$\sum_n (a_n \cos nx + b_n \sin nx) = 0$
for all $x \in S$ imply that all the coefficients $a_n$ and $b_n$ are zero? It turned
out that some of these sets S required elaborate descriptions. This started
him thinking about general sets; so he became the founder of set theory.
Cantor developed set theory extensively, introducing the cardinal
number, card A, of a set A (§II.8) and the ordinal number of a well-ordered set (§II.9). Two cardinal numbers are compared by the rule that
Card A $\le$ Card B if there is an injection $A \to B$. The Schroeder-Bernstein
theorem asserts that Card A $\le$ Card B and Card B $\le$ Card A together
imply Card A = Card B. (For a proof, see Survey, Chapter XIII.) From
this and the axiom of choice it follows that the cardinal numbers are
linearly ordered. Using a "diagonal" argument, Cantor proved that the set
R of real numbers is not denumerable. This means that Card N <
Card R. He then raised the question: Is there any cardinal number prop-
erly between these two? The statement that there is no such is Cantor's
continuum hypothesis. (R is commonly called "the continuum".)
Bigger and bigger sets can be constructed by the power set operation P,
which assigns to each set A its power set P(A), consisting of all subsets of
A. For example, P(N) has the same cardinal as R. The diagonal argument
mentioned above shows that the cardinal of P(A) exceeds that of A. For if
$f: A \to P(A)$ were a bijection, one could define a subset $S \subset A$ as
$S = \{x \mid x \in A \text{ but not } x \in f(x)\}$; (2)
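For a small finite set the diagonal construction can be checked exhaustively: for every function f from A to P(A), the set S of (2) is never one of the values of f. The sketch assumes A = {0, 1, 2} and represents subsets as `frozenset`s; the function names are illustrative.

```python
from itertools import combinations, product


def power_set(a):
    """All subsets of the finite set a, as frozensets."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(a, f):
    """The set S = {x in a : x not in f(x)} for a function f given as a dict."""
    return frozenset(x for x in a if x not in f[x])


def diagonal_always_missed(a):
    """Check every function f: a -> P(a); S must never be a value of f."""
    elements = sorted(a)
    subsets = power_set(a)
    for images in product(subsets, repeat=len(elements)):
        f = dict(zip(elements, images))
        if diagonal_set(a, f) in images:
            return False
    return True


print(len(power_set({0, 1, 2})), diagonal_always_missed({0, 1, 2}))
```

Since S is always missed, no function from A to P(A) is onto, and in particular none is a bijection.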
sets all of whose elements are themselves sets: The set P(A) of all subsets
of A; the set $Z_m$ of all congruence classes of integers, modulo m, or the
cardinal number of A. This raises the possibility of considering only those
sets whose elements x are themselves sets, the elements t of those elements x again being themselves sets, and so on. This "and so on" will be
more persuasive if we start from the bottom with the empty set, written as
$\emptyset$. It has only one subset, itself, so the set $P(\emptyset)$ of all subsets of $\emptyset$ is just
the set $\{\emptyset\}$ whose only element is the empty set. Write these two sets as
Now the one-element set $\{\emptyset\}$ has two subsets, itself and the empty set, so
its power set is a two-element set
This set in its turn has four different subsets, so its power set is
(6)
here $\{\{\emptyset\}\}$ of course denotes the set whose only element is the set whose
only element is the empty set $\emptyset$. Iterating this process produces sets $R_n$,
each with $2^k$ elements when its predecessor has k elements. The union of all these sets $R_n$ is a denumerable set $R_\omega$;
here $\omega$ designates the first infinite ordinal number.
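The first few stages of this construction can be built explicitly. The sketch represents sets as nested `frozenset`s and stops at the stage with 16 elements, since the next one already has 65536; the names `power_set` and `hierarchy` are illustrative.

```python
from itertools import combinations


def power_set(s):
    """The set of all subsets of a finite set s, as a frozenset of frozensets."""
    items = list(s)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))


def hierarchy(levels):
    """Stages R_0 = empty set, R_(n+1) = P(R_n), for n < levels."""
    stages = [frozenset()]
    for _ in range(levels):
        stages.append(power_set(stages[-1]))
    return stages


stages = hierarchy(4)
print([len(stage) for stage in stages])
print(all(a <= b for a, b in zip(stages, stages[1:])))   # each stage lies in the next
```

Each stage is contained in the next, so the union of all the stages is an increasing union of finite sets.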
The construction started in (4), (5), and (6) can be formalized as a
recursion over the ordinal numbers. Here an ordinal $\gamma \ne 0$ which is not
the successor $\gamma = \alpha + 1$ of some ordinal $\alpha$ is called a limit ordinal; $\omega$ is
the first such. The definition of the sets $R_\alpha$ by recursion on the ordinal
number $\alpha$ then reads
$R_0 = \emptyset, \quad R_{\alpha+1} = P(R_\alpha), \quad R_\gamma = \bigcup_{\alpha < \gamma} R_\alpha$ , (7)
where the last clause applies when $\gamma$ is a limit ordinal. Since the ordinal
numbers $\alpha$ are well-ordered (i.e., each non-empty set of ordinals has a
first element), one can define the rank of any set x in this hierarchy (7) to
be the first ordinal $\alpha$ with $x \in R_\alpha$. All the elements of x then have
smaller rank, so that no set x can be a member of itself. Each set x in the
hierarchy has only sets as elements, and can be pictured as a "tree" which
presents the elements of its elements. ... For example, the set (6) above
is the following tree
[Tree diagram] (8)
362 XI. Sets, Logic, and Categories
where <=> is short for "if and only if" while => is short for
"implies". Then the equality of sets is defined by
With (1), this amounts to stating that two sets are equal if and only if they
have the same elements, a view basic to the idea of a set.
The Zermelo axioms now come in the following list:
Extensionality.
This states that equals may be substituted for equals on the left in any
statement $y \in z$. The definition (2) of equality has already provided for
substitution on the right in $t \in y$:
Pairing. For any sets x and y there is a set u so that, for all t,
By (2) again, any two such sets u are equal, so one writes $u = \{x, y\}$. This
axiom provides also for a singleton: A set $\{x\} = \{x, x\}$ with just one element x. It also produces ordered pairs $\langle x, y \rangle = \{\{x\}, \{x, y\}\}$ as
explained in (VAA).
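The defining property of the ordered pair $\{\{x\}, \{x, y\}\}$, that two pairs are equal only when their first and their second components agree, can be checked on a small domain. The sketch uses `frozenset`s for sets and integers as stand-ins for the elements; both choices and the name `pair` are assumptions for illustration.

```python
def pair(x, y):
    """The ordered pair <x, y> coded as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


domain = range(3)
ok = all((pair(x, y) == pair(u, v)) == (x == u and y == v)
         for x in domain for y in domain for u in domain for v in domain)
print(ok, pair(1, 2) == pair(2, 1), pair(1, 1))
```

When x = y the pair collapses to $\{\{x\}\}$, but the defining property still holds.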
Power set. For any set x there is a set u with, for all s,
This set u, again unique, is the union of all the members of x, usually
written $\bigcup x$.
In particular, this will yield all the finite ordinal numbers, as in the
definition suggested in (II.9.5):
Each ordinal is thus the set of all smaller ordinals, so the first infinite ordinal $\omega$ should be the set of all finite ordinals. Hence the existence of at
least one infinite set can be insured by requiring that there be a set containing this $\omega$; that is, a set containing this 0 and closed under the operator s of (7).
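The finite ordinals can be generated mechanically. The sketch assumes that 0 is the empty set and that the successor operator is $s(x) = x \cup \{x\}$ (the displayed definition is not reproduced here, so this is the usual von Neumann choice, stated as an assumption); sets are `frozenset`s and the names are illustrative.

```python
def successor(x):
    """The successor s(x) = x united with {x}."""
    return x | frozenset({x})


def ordinal(n):
    """The n-th finite ordinal, starting from the empty set."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x


four = ordinal(4)
print(len(four), four == frozenset(ordinal(k) for k in range(4)))
```

The check confirms that the ordinal 4 is exactly the set {0, 1, 2, 3} of all smaller ordinals.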
only for the construction of subsets of a given set u, and not, like R, for
subsets of the universe of all possible sets. This formulation is also in
accord with the way sets are constructed in the cumulative hierarchy of §1
above.
Actually, this axiom (9) is really an infinite family of axioms, one for
each property P; it is thus called an axiom scheme.
These axioms are essentially those formulated by Zermelo in 1908. The
last axiom (9) is not in satisfactory form, because it refers not just to the
one primitive notion $\in$, but also to a "property". With Skolem and
Fraenkel, one must then append here the explanation that a "property" P
of x means something specified by an explicit set-theoretic formula; in
other words, something formulated only in terms of x, membership and
the usual logical connectives. Thus for example, "For all t and s, $t \in x$
and $s \in x$ imply $t \subset s$" is a property of x.
To see what this axiom means, observe first that it shows that there can
be no set x with $x \in x$ (no set is a member of itself). For if this were so,
the singleton set {x} is non-empty, but its only element, to wit x, has an
element (x again) in common with {x}. Similarly, the axiom insures that
never $x \in y \in x$. For if this were so, {x, y} would be a non-empty set
which has elements (x or y) in common with each of its elements. Also,
once we have defined sequences $y_n$ of sets, the axiom of regularity will
insure that there is no infinite "descending" sequence with
for then the set $x = \{y_0, y_1, \ldots\}$ violates this axiom. For this reason, the
axiom of regularity is also called the axiom of foundation: Given a set $y_0$,
there is no infinite regress of elements of its elements.
The intuitive intent is that one should be able to "see" that all these
axioms appear to be valid for the cumulative hierarchy of sets $R_\alpha$
described in §1. For example, in the case of the regularity axiom, each
non-empty set x has some ordinal rank $\alpha$, while all the elements of x have
smaller rank. Among the ordinal ranks of elements $t \in x$ there is (by
well-ordering of ordinals) a first ordinal $\beta$. Any element $w \in x$ of this
rank $\beta$ has for x the property required in the regularity axiom (11). Put
differently, an infinite descending sequence of sets like (12) would produce
an infinite descending sequence of ordinals. This intuitive argument
depends on a prior intuitive notion of ordinal.
366 XL Sets, Logic, and Categories
0 < 1 < 2 < 3 < ⋯ < ω < ω+1 < ⋯
ω = {0, 1, 2, 3, 4, ...}.
On inspection, each of these sets is "transitive". Here a set t is called
transitive when x ∈ y ∈ t implies x ∈ t; in other words, every element y of
t is also a subset of t.
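The notion of a transitive set can be tried out on the finite ordinals. The following is only a minimal sketch, assuming that sets are modelled by Python frozensets; the helper names (successor, finite_ordinal, is_transitive, membership_is_linear) are hypothetical and chosen just for this illustration.

```python
def successor(a):
    """Von Neumann successor: the set a together with a itself as a new element."""
    return a | frozenset([a])

def finite_ordinal(n):
    """Build the finite ordinal n = {0, 1, ..., n-1}, starting from the empty set."""
    a = frozenset()
    for _ in range(n):
        a = successor(a)
    return a

def is_transitive(t):
    """A set t is transitive when every element y of t is also a subset of t."""
    return all(y <= t for y in t)

def membership_is_linear(t):
    """Any two distinct elements of t are comparable under membership."""
    return all(x == y or x in y or y in x for x in t for y in t)

three = finite_ordinal(3)
# The set {1} = {{0}} is not transitive: its element 1 contains 0, which is not in {1}.
not_transitive = frozenset([finite_ordinal(1)])
```

Here three is transitive and so is each of its elements, while not_transitive fails the test.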
Next the formal development. With von Neumann, one can now define
an ordinal to be a transitive set α such that every element t ∈ α is also
transitive. (This is the case for ω and the finite ordinals n.) It then follows
from the axioms that the elements of an ordinal α are linearly ordered by
the membership relation ∈. With the regularity axiom one can then
prove that this linearly ordered set α is well-ordered; i.e., that each
non-empty subset x ⊂ α has a first element in the order of elements of α. For,
by regularity, there is some element y ∈ x with no element in common
with x. In particular, z < y for some z ∈ α means that z ∈ y and so by
2. Axiomatic Set Theory 367
while P ≅ P' means that P and P' are ordinally isomorphic and V is
some "type" or "universe". (Some sort of universe is needed to make the
comprehension axiom apply here; we cannot simply take all sets P' with
P ≅ P'.) Whatever the universe, if it is big enough it will contain a von
Neumann ordinal α with P ≅ α, so the older ordinal, or P, is realized or
"represented" by α.
(One can describe a universe as a set V such that the elements of V by
themselves satisfy the Zermelo axioms, but that requires an added
axiom (there exists a universe) and might lead (and has led!) to the use of a
whole succession of universes!)
The naive definition of a cardinal number (§II.8) in terms of the rela-
tion of cardinal equivalence reads
of sets which, by the diagonal argument, get larger and larger. To get a
still larger set one would need to form the union of all these sets P^n ω for
all natural numbers n, and then apply P again. But one cannot form this
union by the axioms unless one knows that there is a set consisting of all
the sets P^n ω; in other words, that the image of ω under the function
n ↦ P^n ω, for n ∈ ω, is a set. The final axiom (replacement) does insure
that the image of any set under a function is again a set; here a "function"
x ↦ y is to be described by a property R(x,y) specified by a
formula, much as in the case of the comprehension axiom. This again is
an axiom scheme (one axiom for each property)!
Replacement. If R(x,y) is a formula stated in terms of the sets x,y and the
membership relation while u is a set such that for each x ∈ u there is
exactly one y with R(x,y), then there is a set consisting of exactly all these
y.
This set is of course the image of u under the function defined by R.
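For finite sets the content of the replacement axiom can be imitated directly. The sketch below is an illustration only: the function name replacement and the explicit finite list of candidates (needed so that a program can search for the y with R(x,y)) are assumptions of this example and not part of the axiom.

```python
def replacement(u, R, candidates):
    """Image of the finite set u under the function described by the relation R.

    For each x in u there must be exactly one y among the candidates with R(x, y);
    otherwise R does not describe a function on u and an error is raised.
    """
    image = set()
    for x in u:
        ys = [y for y in candidates if R(x, y)]
        if len(ys) != 1:
            raise ValueError(f"R is not functional at {x!r}")
        image.add(ys[0])
    return frozenset(image)

u = frozenset({0, 1, 2, 3})
# R(x, y) holds when y is the square of x; the image is the set of these squares.
squares = replacement(u, lambda x, y: y == x * x, range(20))
```

A relation such as "y·y = x" is rejected on this u, since for x = 2 there is no y at all.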
The replacement axiom scheme implies the comprehension axiom
scheme. When replacement is adjoined to the Zermelo axioms, one gets
the Zermelo-Fraenkel (ZF) axioms for set theory. Normally, one also
includes the axiom of choice, to get ZFC.
Consider a statement p, like the first two here, which involves just one
"variable" x. It corresponds to a set P; namely, to the set P of all those
3. The Propositional Calculus 369
These tables do reflect the intended meanings. From the third column
we can read off the formula p ∧ q as a function
on the product with itself of the two element set {±} of truth values;
⇒ and ∨ are also such functions, while ¬ is a function of one variable.
There are also composite functions, represented by "formulas".
Specifically, certain strings of letters p, q, r, connectives, and parentheses
are called formulas (sometimes well-formed formulas). They are defined by
recursion as follows
(i) A letter p, q, r, ... is a formula F;
(ii) If F and G are formulas, so are F ∧ G, F ∨ G and F ⇒ G;
(iii) If F is a formula, so is ¬F.
Formulas arise only in these ways. (Of course, we also insert parentheses,
as in F ⇒ (G ∧ H), to make a formula unambiguously readable.) By
using the truth tables, any formula F involving n letters p_1, ..., p_n is
represented by a composite function {±}^n → {±}. Then a formula F is
called a tautology when all the values of the corresponding function are
+. For example, the formula (p ∧ q) ⇒ (p ∨ q) is a tautology.
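The recursive definition of formulas and the test for a tautology can be sketched in a few lines. This is only an illustration under stated assumptions: formulas are encoded as nested tuples such as ("and", "p", "q"), the truth values + and - are played by True and False, and the names evaluate, letters and is_tautology are hypothetical.

```python
from itertools import product

def evaluate(formula, env):
    """Value of a formula under an assignment env of truth values to letters."""
    if isinstance(formula, str):          # clause (i): a letter
        return env[formula]
    op = formula[0]
    if op == "not":                       # clause (iii)
        return not evaluate(formula[1], env)
    a = evaluate(formula[1], env)         # clause (ii): binary connectives
    b = evaluate(formula[2], env)
    if op == "and":
        return a and b
    if op == "or":
        return a or b
    if op == "implies":
        return (not a) or b
    raise ValueError(f"unknown connective {op!r}")

def letters(formula):
    """The set of letters occurring in a formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(letters(sub) for sub in formula[1:]))

def is_tautology(formula):
    """True when the formula takes the value True under every assignment."""
    names = sorted(letters(formula))
    return all(evaluate(formula, dict(zip(names, values)))
               for values in product([True, False], repeat=len(names)))

example = ("implies", ("and", "p", "q"), ("or", "p", "q"))
```

With this encoding is_tautology(example) holds, while is_tautology(("implies", "p", "q")) does not.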
Since p ⇔ q can be defined in terms of the other connectives, we
can also treat formulas involving "⇔". Some useful tautologies are
The first is the famous definition of material implication: "p implies q"
means "either not p or q". In particular, a false statement implies any
other statement! Philosophers who find this conclusion uncomfortable
have introduced other versions of propositional logic, such as strict impli-
cation or relevance logic. As yet, these variants have not proved helpful in
the formulation of Mathematics.
Since r ⇔ s is true precisely when r and s are both true or when
r and s are both false, the tautology (1) states that p ⇒ q, as a function
of truth values, is identical to the function ¬p ∨ q; in other words, this
provides a possible definition of ⇒ in terms of ¬ and ∨. Similarly (2),
which is one of the de Morgan laws, can be used to define ∧ in terms of
¬ and ∨. These two latter connectives thus would suffice for this calculus.
(It is even possible to replace these two by a single binary connective, but
the result is not illuminating.)
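These reductions can be checked mechanically by running through all truth values. The sketch below makes two assumptions that go beyond the text: the de Morgan law in question is taken to be p ∧ q ⇔ ¬(¬p ∨ ¬q), and the single binary connective is taken to be the Sheffer stroke (NAND), which is one well-known choice.

```python
from itertools import product

BOOLS = [True, False]

# Truth table of material implication, written out independently of "not" and "or".
IMPLIES = {(True, True): True, (True, False): False,
           (False, True): True, (False, False): True}

def nand(p, q):
    """Sheffer stroke: false only when both arguments are true."""
    return not (p and q)

# (1): p => q has the same truth table as (not p) or q.
material_implication = all(IMPLIES[(p, q)] == ((not p) or q)
                           for p, q in product(BOOLS, repeat=2))

# Assumed form of (2): p and q has the same table as not((not p) or (not q)).
de_morgan = all((p and q) == (not ((not p) or (not q)))
                for p, q in product(BOOLS, repeat=2))

# Negation and disjunction expressed through the single connective nand.
not_from_nand = all((not p) == nand(p, p) for p in BOOLS)
or_from_nand = all((p or q) == nand(nand(p, p), nand(q, q))
                   for p, q in product(BOOLS, repeat=2))
```

All four checks come out true, which is the sense in which ¬ and ∨, or even the stroke alone, suffice.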
The propositional connectives correspond (according to the first table)
to Boolean operations on sets; more exactly, on subsets of some fixed set
V. In this correspondence, a tautology becomes a Boolean function with
value V for all arguments, while an equivalence (a tautology F ⇔ G)
becomes an identity in Boolean algebra. Any complete system of axioms
(there are many) for Boolean algebra can be turned into a system of
axioms (with rules of inference) for the propositional calculus. We do not
need such a system, since the consequences of the axioms can be
described directly and simply as "all tautologies".
Thus everything about propositions p turns on truth, but this is not yet
a complete explication of that weighty word!
ranging over the domain. Formulas of the language are built up from
variables and constants (0, 1, ...) using some primitive predicates and
primitive function symbols; we consider as an example the case of a
language L(B,+) with one binary predicate B(x,y) and one binary func-
tion symbol +. On this basis, one first defines terms:
Terms are obtained only in this way. From the terms one builds up for-
mulas by a similar recursion:
(iii) If B is a binary predicate symbol while s and t are terms, then B(s,t)
is a formula;
(iv) If F is a formula, so is ¬F;
(v) If F and G are formulas, so is F ∨ G;
(vi) If x is a variable while F is a formula, so are (∀x)F and (∃x)F.
Formulas are obtained only in these ways. The last clause (vi) is usually
employed when F is a formula involving the variable x. This variable is
said to be bound in the formulas (∀x)F and (∃x)F and in any further
formula built up from these. Variables which are not bound are said to be
free. The intent is that a formula F with free variables y, z, w, ... is a
statement of some property of the elements of the domain represented by
these free variables, while the property involves other "bound" variables
ranging over the whole domain. For example in a formula such as
[(∀x)B(x,y)] ∨ B(x,z) the first x is bound while the last is free; it is
usually good practice to rename the latter (say as w).
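The distinction between free and bound variables follows the recursive clauses for terms and formulas. The following minimal sketch assumes a tuple encoding invented for this illustration: variables are strings, constants are integers, ("+", s, t) is a sum of terms, and formulas are tagged "B", "not", "or", "forall" or "exists".

```python
def term_vars(t):
    """Variables occurring in a term."""
    if isinstance(t, str):        # a variable
        return {t}
    if isinstance(t, int):        # a constant
        return set()
    _, s, u = t                   # a sum ("+", s, u)
    return term_vars(s) | term_vars(u)

def free_vars(f):
    """Free variables of a formula, following clauses (iii) to (vi)."""
    tag = f[0]
    if tag == "B":
        return term_vars(f[1]) | term_vars(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag == "or":
        return free_vars(f[1]) | free_vars(f[2])
    if tag in ("forall", "exists"):
        return free_vars(f[2]) - {f[1]}   # the quantified variable becomes bound
    raise ValueError(f"unknown formula tag {tag!r}")

# [(forall x) B(x, y)] or B(x, z): x is bound in the first part, free in the second.
bound_part = ("forall", "x", ("B", "x", "y"))
example = ("or", bound_part, ("B", "x", "z"))
```

Here free_vars(bound_part) is {"y"}, while free_vars(example) still contains "x" because of its free occurrence in B(x,z).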
The formulas as defined above use only the two propositional connectives
¬ and ∨; others can be introduced by their definitions, as indicated
in the last section. We have described a language L(B,+) with just one
binary predicate B and one binary operation +; if B were <, it could be
called a "language of arithmetic". There are languages with several predi-
cates, unary, binary or n-ary; there can also be more function symbols,
unary, binary, ternary, ... or even none. It is a striking observation about
actual Mathematical systems that the primitive functions involved are
almost all unary or binary (successor or product), while the primitive
predicates B are usually unary or binary. There are occasional exceptions,
such as the ternary betweenness relation in the foundations of geometry
(Chapter III); even then one gets rid of "betweenness" as quickly as possi-
ble by defining it in terms of the "less than" relation for real number
coordinates. On the other hand, there is clearly no way in which binary
relations or functions could all be replaced by unary ones. In philosophical
terms, not everything can be reduced to properties (unary predicates)
of things, as was the apparent intent of Aristotelian logic.
In most Mathematical discussions (as the reader of this book has been
asked to note) the quantifiers used are all bounded ones. Quantifiers
without a bound crop up chiefly in the higher reaches of set theory (e.g.,
quantifiers over all ordinals).
With the exact description of the set-theoretic language L(∈), we now
have an explicit description of the idea of a "property", as this arose in
the Comprehension and Replacement axiom schemes of ZF. A property
of x and y means a formula of the language L(∈) with just two free
variables x and y. In using the comprehension axiom for ordinary parts
of Mathematics it usually suffices to consider only formulas with bounded
quantifiers. This may be stated as follows:
5. The Predicate Calculus 373
Bounded Comprehension (BQ). For any set u and any formula F(x) of the
language of set theory in which all quantifiers are bounded, there is a set s
with
For most Mathematics, the appropriate axioms for set theory seem to
be ZBQC: The Zermelo axioms with comprehension replaced by bounded
comprehension and with choice added. This approach (which has
nowhere been developed in detail) is a pragmatic choice of axioms: They
suffice (but not for definitions like (2.11)).
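For a finite set u, bounded comprehension can be imitated directly. The sketch below assumes that the set s in question consists of exactly those x ∈ u for which F(x) holds, and it models a bounded quantifier "there is a y ∈ u" by a search over u; the name bounded_comprehension is hypothetical.

```python
def bounded_comprehension(u, F):
    """The subset of u consisting of those x in u for which F(x) holds."""
    return frozenset(x for x in u if F(x))

u = frozenset(range(10))
# F(x): "there is a y in u with y + y = x"; the quantifier is bounded by u.
evens = bounded_comprehension(u, lambda x: any(y + y == x for y in u))
```

Every quantifier in F ranges only over the given set u, so the test of F(x) is a finite search.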
Some set theorists assert that they can "see" that the ZFC axioms are
"true" for the sets described in the cumulative hierarchy of §1. Note that
in this hierarchy at stage α + 1 each new set S ∈ P(R_α) is a subset of
R_α, so consists of elements already present in R_α. But in the description of
the hierarchy there is no real explanation of what is meant by a "subset".
If each subset of R_α is to be described by giving a property of its elements,
it seems plausible that the first order formula expressing this property
should refer only to those sets already at hand and hence should
involve only quantifiers over R_α. On this ground, bounded comprehension
is much easier to "see" in the hierarchy than unbounded comprehension.
(Much the same argument would apply for "bounded" replacement.) This
again illustrates the observation that an intuitive idea (here, the cumula-
tive hierarchy) has more than one formal realization.
Within ZFC set theory, the cumulative hierarchy can be formally con-
structed. Before axioms are at hand, the hierarchy is a Platonic myth,
clearly visible only to those with a sixth sense for sets.
    F    F ⇒ G
    -----------        (4)
         G
while the generalization rules are the figures (x not free in F)
The reader may wish to construct his own formal proof of the reverse
implication, using (ivb) and the other axiom scheme of (ii). These illustra-
tions may support the conviction that such formal proofs are always possi-
ble and always pedantic. There have been elaborate demonstrations (for
example, Principia Mathematica in three volumes by Whitehead and
Russell) which show that the usual proofs of Mathematics can be forced
into this mold.
A predicate calculus with equality is often used. This means that there
is a binary predicate "=" together with appropriate axioms: The
reflexive, symmetric, and transitive laws for equality, plus the axiom stat-
ing that "equals may be substituted for equals anywhere". It suffices to
require the latter for the primitive predicates and function symbols, as in
Peano axioms (i) and (ii) of §II.2, stating that 0 is a number and that the
successor of a number is a number. Next come the axioms
and the axiom scheme for induction: For every formula F(n) of the
language, with one free variable n,
In this case, the strength of the induction axiom depends on the set
theory; that is, on how many subsets T are provided there. With ZF or ZBQ set
theory one can then prove the recursion theorem of §II.2 and from this
show that any two such sets N are isomorphic. Located in this way within
a set theory, Peano arithmetic has no non-standard models.
6. Precision and Understanding 377
could be done, and has been done, by using triangles instead. However,
the rectangles are a clear vehicle for exposing the underlying reason: that
a holomorphic function has at each point the same derivative in every
direction; in particular, in the two directions exhibited along axis-parallel
rectangular paths used to construct a primitive for the given function. This
is the underlying idea, but it is not enough to wave one's hands about the
idea. Reduced to formal statement, it becomes a proof. Proofs serve both
to convince and to explain, and they should be so presented. Rigorous
proofs serve to avoid errors, an important function.
But proofs have their limitations, as we will now see.
¬, ∨, ∃, 0, +, ×, x, y, z
About T                                           Within T

Form(m) = "m is the code of a formula"            F(x)
Prf(n) = "n is the code of a proof"               P(y)
Dem(m,n) = Form(m) and Prf(n) and the proof       D(x,y)
    ends at the formula with code m
Sub(m,k) = the code of the following formula:     S(x,y) (a term)
    "Take the formula (if any) with code m;
    in it replace the variable w by the
    kth numeral"
Arg(m,n) = Dem(Sub(m,m),n)                        A(x,y) = D(S(x,x),y)
For all n, not Dem(Sub(h,h),n)                    (∀y)¬A(w,y)        code p
Now the last formula in the right-hand column, like every formula of the
system, has a code p; let p̄ be the corresponding numeral. Plug in p for h
and p̄ for w to continue the table as follows:
Call the last formula on the right G. It is obtained by taking the formula
just above it in the table, with code p, and substituting therein the
numeral p̄ for the variable w. Hence by the definition of Sub above, G is
the formula with code Sub(p,p). However, its translation on the left
states "There is no proof for Sub(p,p)"; that is, there is no proof for G. In
other words, G when translated back to the left says "There is no proof
for me" (in the system).
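The substitution trick behind G can be imitated with strings. The following toy sketch rests on several assumptions that are not Gödel's actual coding: the code of a formula is the integer spelled by its UTF-8 bytes, the numeral for k is its decimal expansion, and the formula is written as the string "forall y. not A(w, y)", in which the letter w occurs only as the variable to be replaced.

```python
def code(formula):
    """Toy code number of a formula: its UTF-8 bytes read as one big integer."""
    return int.from_bytes(formula.encode("utf-8"), "big")

def decode(n):
    """Recover the formula from its toy code number."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big").decode("utf-8")

def sub(m, k):
    """Code of the formula with code m after replacing the variable w by the numeral of k."""
    return code(decode(m).replace("w", str(k)))

template = "forall y. not A(w, y)"   # plays the role of the formula with code p
p = code(template)
G = decode(sub(p, p))                # the formula with code Sub(p, p)
```

The string G is the template with the numeral of its own code p substituted for w, so that the code of G is Sub(p,p), just as in the table.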
Our further examination of this startling situation will use the fact that
the translations above (left to right) are "good" in the following sense:
numbers m and n, there is in T a formal proof for the translation A(m̄,n̄).
Moreover, if Arg(m,n) fails, then there is a formal proof for ¬A(m̄,n̄).
This is to apply only to statements such as Arg(m,n) in which quantifiers
like (∀k) or (∃k) have numerical bounds; say all k < m.
Arg(p,k) = Dem(Sub(p,p),k)
does not hold.
For suppose it did. This displayed informal statement says that the
number k is the code for a proof of G, whose code is Sub(p,p). In other
words, there is a formal proof (code k) for G. But by the translation
principle applied to our informal statement there is also a formal proof for
A(p̄,k̄) and hence a proof for (∃y)A(p̄,y); namely, k is the y claimed to
exist. But (∃y)A(p̄,y) is (classically) equivalent to ¬(∀y)¬A(p̄,y),
which is ¬G. So there is also a proof of ¬G. Contradiction. Since the
system is consistent, there can be no such contradiction.
Given this lemma, suppose there is a proof for G. That proof has some
code k. Since G has the code Sub(p,p) this means that Dem(Sub(p,p),k).
By the lemma, this is not so.
On the other hand, suppose there were a proof for ¬G; that is, for
¬(∀y)¬A(p̄,y) and hence for its equivalent (∃y)A(p̄,y). Now according
to the lemma, for each number k, Dem(Sub(p,p),k) does not hold.
Hence, by the second half of the translation principle, there is a proof for
the negation of the translation; that is, for ¬A(p̄,k̄) where k̄ is any
numeral. In other words, for the numerals 0, 0', 0'', and so on we have
proofs of
¬A(p̄,0), ¬A(p̄,0'), ¬A(p̄,0''), ...,
while at the same time we have a proof for ¬G; that is of
(∃y)A(p̄,y).
In other words, this y, proved to exist, can be none of the numerals
0, 0', 0'', ... of the formal system. This would be a bizarre situation. It is
excluded if we assume that the formal system T is ω-consistent: That there
is no formula H(y) in one free variable and proof of all of
(The above case is that where H(y) = ¬A(p̄,y).) The "strong consistency"
mentioned above in our first discussion of Gödel's first incompleteness
theorem is precisely this notion of ω-consistency. (There is a
subtle refinement of the construction of G which renders this stronger
consistency assumption unnecessary.) This completes our sketch of Gödel's
first incompleteness theorem.
CT ⇒ G.
8. Independence Results
For Zermelo-Fraenkel set theory many sentences (including a number of
interesting ones) are independent of the axioms and so remain undecided
and undecidable. One such is the continuum hypothesis CH, already
S = {x | x ∈ B ∧ F(x)}
for some such formula F(x), with just one free variable x, in the enlarged
language. (Each such formula will involve only a finite number of the
added constants b'.) The collection D(B) of all these subsets S ⊂ B is the
set of sets definable from B. It is a subset of the power set P(B).
The constructible hierarchy L_α is now defined by recursion over the
ordinals α as
These primitive terms are subjected to two axioms. The associative law:
Given arrows
The identity law for the identity arrows asserts that for each f: A → B one
has
[Diagram of a small category (identities omitted).]
The homotopy classes of paths in a topological space (§X.5) form a
category under composition of paths. In this category every arrow is inver-
tible; that is, has a two-sided inverse. A category with this property is
called a groupoid.
Big categories arise with each new description of Mathematical objects.
Given a type of Mathematical structure, what are the corresponding
morphisms preserving these structures? They are the arrows of a category. The
slogan "What are the morphisms?" applies to categories themselves. A
C FC
(7)
have the same effect on each individual point of S^1 and so (as functions)
are the same sets of ordered pairs. However, if we apply the fundamental
group functor π_1 (with any chosen point of S^1 as base point) we get
π_1(S^1) = Z and
π_1(1_S) = 1_Z: π_1(S^1) → π_1(S^1) = Z,
the identity, while for the inclusion j, π_1(j) collapses Z to the unit element
since π_1(D) is the trivial one-point group. Thus the functor π_1 has
very different effects on the two arrows of (7); these two functions differ
only in their codomain, but that makes a difference of substantial topological
interest!
Some functors turn the arrows around. For a vector space W over a
field K the dual space (§VIA) W* consists of all the linear transformations
f: W → K. Hence if the arrow T: V → W is a linear transformation of
spaces, each vector f in W* yields by composition with T a vector
f·T: V → K in V*. We write T*f for this vector f·T; if S: U → V is
another linear transformation, then
(T·S)*f = (S*·T*)f,
and also (1_W)*f = f. Thus (T·S)* = S*·T*, so that composition of linear
transformations is reversed under the operation "take the dual". This
operation, written * or V ↦ V*, T ↦ T*, is thus what is called a
contravariant functor (on the category of vector spaces over K). A formal
definition of contravariance can be avoided by introducing to each
category C an opposite category C^op. This has the same objects as does C
and for each arrow f: A → B of C an arrow f^op: B → A in C^op, with the
same identities and a composition defined (when possible) by
f^op·g^op = (g·f)^op. A "contravariant" functor G on a category C to a
category D is then just an (ordinary or "covariant") functor C^op → D. For
example, the cotangent bundle construction T*M on manifolds M is a
contravariant functor, because a smooth mapping M → M' takes each
cotangent vector on M' (determined by some smooth function on M')
back into a cotangent vector on M.
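In coordinates the reversal (T·S)* = S*·T* is the familiar rule for the transpose of a product. The sketch below assumes finite-dimensional spaces with chosen bases, so that a linear transformation is a matrix (a list of rows) and its dual is represented by the transposed matrix; the particular matrices are arbitrary examples.

```python
def matmul(a, b):
    """Matrix product a·b, with matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    """Matrix of the dual transformation with respect to the dual bases."""
    return [list(col) for col in zip(*a)]

T = [[1, 2], [3, 4], [5, 6]]   # T: V -> W with dim V = 2, dim W = 3
S = [[1, 0, 2], [0, 1, 3]]     # S: U -> V with dim U = 3

dual_of_composite = transpose(matmul(T, S))                # (T·S)*
composite_of_duals = matmul(transpose(S), transpose(T))    # S*·T*
```

The two results agree, while the product transpose(T)·transpose(S) in the original order would here not even have the right shape.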
In any category C two objects A and B determine the collection
         κ_V
    V ---------> V**
    |             |
  T |             | T**        (2)
    v             v
    W ---------> W**
         κ_W
in other words, we get the same result when κ is applied before or after T.
We will adopt this property (2) as the general condition under which a
transformation such as κ between functors is a morphism of functors (in
the standard terminology, a "natural transformation").
If F, G are two functors C → D a natural transformation
τ: F → G
              τ_A
    A    FA -------> GA
    |     |           |
  f |  Ff |           | Gf        (3)
    v     v           v
    B    FB -------> GB
              τ_B
commutes for every arrow f: A → B of C; in other words, Gf·τ_A = τ_B·Ff
for all f. One can usefully enlarge the picture implicit in this diagram.
The given functors F and G carry each commutative diagram
(A → B → C → ...) in C into a commutative diagram in D.
(4)
The natural transformation τ "translates" the first picture on the left into
the second, on the right, so that the whole diagram is commutative.
For example, if H is the functor giving the set H(U) of all holomorphic
functions g on the open set U ⊂ C, then the operation g ↦ dg/dz, take
the first derivative, is a natural transformation H → H between functors on
the category open(C).
11. Universals
The language of categories allows one to give a common description of
certain standard constructions, such as products and pullbacks, which are
used in many different explicit categories. This common description usu-
ally has the form: A certain arrow to or from the constructed object is
"universal" among all possible arrows with some desired property.
First consider products. The "cartesian" product of two sets X and Y is
normally described as are the cartesian coordinates in the plane: It is the
set of all pairs <x,y> of elements x ∈ X and y ∈ Y. This description at
once yields two functions, the projections p: <x,y> ↦ x and
q: <x,y> ↦ y on the factors X and Y, as in the usual geometric figure
of the product of two intervals:
[Figure (1): the product X × Y of two intervals, with the projection p onto X.]
This also allows the construction of functions into the product X × Y
from separate functions into X and Y. Thus, given two functions
such that for each pair of arrows f: Z → X, g: Z → Y with a common
domain Z there is a unique arrow h: Z → P with p·h = f and q·h = g. In
other words, there is a unique h which makes the following diagram
commutative:
                Z
              / | \
           f /  | h \ g
            /   |    \              (4)
           v    v     v
          X <-- P --> Y
             p     q
The arrows p and q are the projections of the product, while a category is
said to have products if there is such a product diagram (3) for every pair
of objects of the category.
This definition has the form: Among the diagrams of the shape (3) for
given ends X and Y, the product diagram (3) is "universal" -every other
such diagram (with middle term Z) maps into the product diagram (3) via
a unique arrow h. As a result, this property determines the product P and
its projections p and q "up to an isomorphism". This means that if there
is in C another diagram
         p'        q'
    X <------ P' ------> Y        (5)
which is also a "product" of the given objects X and Y in this sense, then
there is an invertible arrow θ: P' → P with pθ = p' and qθ = q'. (The
universal property of P gives θ; that of P' gives a two sided inverse for θ.)
For the category of groups, we already noted in (V.S.2) that the usual
direct product of two groups is a product in this universal sense. The same
construction by pairs of elements yields the product for abelian groups.
Similarly the (categorical) product V × W of two vector spaces over the
same field consists of pairs <v,w> of vectors with termwise operations;
this product is commonly called a "direct sum". For two topological
spaces X and Y, the set-theoretical product X × Y has a topology in
which the open sets are arbitrary unions of products U × V with U and
V open in X and Y, respectively. This definition of the product topology is
calculated precisely to produce a topology in which both of the projec-
tions p and q are continuous. Thus Top has products.
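For small finite sets the universal property of the product can be verified by brute force. In the sketch below the sets X, Y, Z and the functions f, g are arbitrary examples chosen for the illustration, functions out of Z are stored as dictionaries, and every candidate arrow h: Z → X × Y is tested against the two equations p·h = f and q·h = g.

```python
from itertools import product

X, Y, Z = [0, 1], ["a", "b", "c"], ["u", "v"]
P = list(product(X, Y))            # the cartesian product X × Y as a list of pairs

def p(pair):
    """Projection onto the first factor."""
    return pair[0]

def q(pair):
    """Projection onto the second factor."""
    return pair[1]

f = {"u": 0, "v": 1}               # an example f: Z -> X
g = {"u": "c", "v": "a"}           # an example g: Z -> Y

def mediating_arrows(f, g):
    """All functions h: Z -> P with p·h = f and q·h = g, found by exhaustive search."""
    found = []
    for values in product(P, repeat=len(Z)):
        h = dict(zip(Z, values))
        if all(p(h[z]) == f[z] and q(h[z]) == g[z] for z in Z):
            found.append(h)
    return found

solutions = mediating_arrows(f, g)
```

Among the 36 candidate functions exactly one survives, namely z ↦ <f(z), g(z)>.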
              Y
              |
              | v
              v
    X ------> Z            (6)
         u
If the objects are sets, take the set P of all those pairs <x,y> with x ∈ X
and y ∈ Y for which u(x) = v(y); thus P has the evident projections into
X and Y. If the objects are spaces, take the same pairs <x,y>; they form
a subset of the product space X × Y and inherit from it the "subset"
topology. In either category, P fills out a commutative square
         q
    P ------> Y
    |         |
  p |         | v          (7)
    v         v
    X ------> Z
         u
with the following universal property: For every other commutative
square
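         g
    W ------> Y
    |         |
  f |         | v          (8)
    v         v
    X ------> Z
         u
[Diagram (9): the commutative square (8) on W, with its arrows f and g, mapped into the pullback square (7).]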
In other words, the object P is the universal way of completing the corner
(6) to a square. In any category, a P with this universal property is called
a pullback or a fibered product, written P = X ×_Z Y. (This notation is
incomplete, since P is not determined just by the objects X, Y, and Z, but
also by the arrows u and v.) One also says that W is the pullback of Y
along u, or is obtained by "changing the base" from Z to X.
Pullbacks are omnipresent. If u and v are inclusions of subsets of Z, the
pullback P is their intersection. If v: T*M → M is a tangent bundle, while
u: N → M is the inclusion of a submanifold, the pullback is the tangent
bundle to N.
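The description of the pullback of sets as a set of pairs translates directly into a set comprehension. The following is a minimal sketch; the function name pullback and the sample subsets A and B are assumptions made for this illustration.

```python
def pullback(X, Y, u, v):
    """All pairs (x, y) with x in X, y in Y and u(x) == v(y)."""
    return {(x, y) for x in X for y in Y if u(x) == v(y)}

def inclusion(z):
    """Inclusion of a subset into the ambient set Z: each element goes to itself."""
    return z

A = {1, 2, 3}
B = {2, 3, 4}

# For two inclusions the pullback consists of the pairs (z, z) with z in both subsets.
P = pullback(A, B, inclusion, inclusion)
intersection = {x for (x, _) in P}
```

Projecting P to its first coordinate recovers the intersection of A and B, as stated above for inclusions of subsets.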
An object T in a category C is called a terminal object if to every object
X of the category there is a unique arrow X → T. As for products, it follows
from this universal property that a terminal object T in C is unique,
up to an isomorphism. In the category of sets, any one-point set is terminal
(and of course any two such sets are isomorphic). Similarly the group
with only one element and the topological space with only one point are
terminal in their respective categories. Given a terminal object T, products
can be constructed from pullbacks, because the product X × Y is just the
pullback X ×_T Y formed from the necessarily unique arrows X → T, Y → T.
In fact, given the (binary) pullbacks P and a terminal object, one can con-
struct pullbacks P for all sorts of fancier finite diagrams
         i         j
    X ------> Q <------ Y        (10)

         i         j
    X ------> Q <------ Y
      \       |       /
     f \      | h    / g         (11)
        \     |     /
         v    v    v
              W
The arrows i and j are called the injections of the coproduct. The coproduct
of two sets S and T is their disjoint union, as described in (II.8.1). The
coproduct of two vector spaces V and W is their product (a direct sum)
with the evident inclusions. The coproduct of two groups is their free
product (§V.8). In particular, the free product of two infinite cyclic groups is
the free group on two generators, which appears in topology as the
fundamental group of the space consisting of two tangent circles (Fig. X.5.4).
The dual of a pullback can also be formed in these categories; it is
called a pushout or, in the case of groups, an amalgamated product.
We now turn to the construction of "exponentials". If m and n are
natural numbers, the power m^n can be described as the number of
different functions on a set of n things to a set of m things. Hence we
write Z^Y for the exponential set or the function set consisting of all
functions f: Y → Z:
    f: X × Y → Z
    --------------        (14)
    F: X → Z^Y
which is reversible: For each F as in the lower line one may construct the
corresponding f in the upper line by using the same formula (13), read
backwards.
In order to describe the bijection f ↦ F as a universal process, we need
to express formally the left hand side of (13) in which the function
t = F(x) is "evaluated" at the argument y. This uses the evaluation
function e,
                 e
    Z^Y × Y ---------> Z
       ^              ^
 F × 1 |            /
       |          / f            (16)
       |        /
      X × Y
commutative. As an equation, this reads f = e·(F × 1), with 1 = 1_Y the
identity arrow of Y. This equation is just a different statement of (13), not
using elements.
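In programming terms the passage from f to F is usually called currying, and the equation f = e·(F × 1) can be tested on examples. The sketch below is an illustration only: functions of two arguments are taken to act on pairs, the names curry, evaluate and times_identity are hypothetical, and the sample f is arbitrary. The last lines count the functions from a set of n things to a set of m things.

```python
from itertools import product

def curry(f):
    """From f: X × Y -> Z build F: X -> Z^Y, where F(x) is the function y |-> f(x, y)."""
    return lambda x: lambda y: f((x, y))

def evaluate(pair):
    """Evaluation e: Z^Y × Y -> Z, sending (t, y) to t(y)."""
    t, y = pair
    return t(y)

def times_identity(F):
    """The arrow F × 1: X × Y -> Z^Y × Y."""
    return lambda pair: (F(pair[0]), pair[1])

X, Y = range(3), range(4)
f = lambda pair: 10 * pair[0] + pair[1]    # an arbitrary example f: X × Y -> Z
F = curry(f)
recovered = lambda pair: evaluate(times_identity(F)(pair))   # e·(F × 1)

# Each function from an n-element set to an m-element set is a tuple of n values.
m, n = 3, 2
functions = list(product(range(m), repeat=n))
```

On every pair <x,y> the composite recovered agrees with f, and the list functions has m^n entries.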
This form of the description of exponentials applies to any category C
which has products. The category is said to "have exponents" if to every
pair of objects Z, Y there is an object Z^Y and an arrow e: Z^Y × Y → Z
such that, whatever the arrow f: X × Y → Z, there is a unique arrow F
which makes the diagram (16) commute. In other words, the evaluation
arrow e is universal among arrows f from a product.
Our description has also indicated that the exponential Z^Y, defined in
this way, exists for any two objects Z and Y in the category of sets. This
holds also in the category of vector spaces (or of abelian groups): Just
take Z^Y to be the vector space (or abelian group) of all linear
transformations (or of all homomorphisms) Y → Z. However, exponentials do not
exist for all spaces in the category Top; in other words, there is not always
a "good" way to define a topology on the function space Z^Y. This
observation has actually led to the suggestion of a suitable restriction (or
perhaps an enlargement) of the notion of a topological space, so as to pro-
vide a "convenient category" of such modified spaces, in which products
and the corresponding exponentials always exist.
The description (16) of the exponential has another formulation, this
time in terms of functors. First observe that in a category C with products
(of objects) one also has products f × g of arrows, because given
f: X → X' and g: Y → Y' there is by (14) a unique arrow f × g which
makes the following diagram commutative
          p               q
    X <------- X × Y -------> Y
    |            |            |
  f |            | f × g      | g        (17)
    v            v            v
    X' <------ X' × Y' -----> Y'
          p'              q'
In this way the product X × Y is really a functor (of two variables X and Y).
In particular, for each fixed Y, the operator - × Y is a functor of one
variable, the "product with Y", sending X to X × Y and g: X → X' to
g × 1: X × Y → X' × Y. Moreover, Z^Y for this fixed Y is a functor of
Z. The correspondence (14) above states that the functor X ↦ X × Y
uniquely determines the functor Z ↦ Z^Y by the bijection f ↦ F. Here
F: X → Z^Y is an element of the "hom set" hom(X, Z^Y), while
f ∈ hom(X × Y, Z). Thus the bijection (14), expressing f in terms of F,
can be written as a bijection of hom sets (cf. (9.8))
Both sides are functors (of the arguments X, Y and Z) to sets, and this
bijection is a natural isomorphism of functors. One says that it makes the
functor ( )^Y the right adjoint of the functor - × Y, while the latter functor
is a left adjoint (because it is on the left side of the hom set in (18)).
Generally a functor P: C → X is left adjoint to a functor G: X → C when
there is a natural bijection
This axiom provides for the number 1 and the product of two numbers,
since pullbacks and a terminal give products, as noted in §11. One could
also get zero and addition of numbers by an axiom requiring the existence
of an initial object and pushouts. We do not do this because these proper-
ties turn out to be consequences of the other axioms to follow.
x. (1)
s'
Equivalent subobjects of X will be represented by the same arrow from
X, by taking the arrow to be like a characteristic function ψ. These functions,
much used in probability theory, take only the values 0 and 1; ψx is
1 when x is in the subset S at issue and 0 otherwise; it thus expresses a
basic fact about a subset S: Either x ∈ S or not (x ∈ S). Formally, in
set theory the characteristic function ψ_S of a subset S ⊂ X is defined by
    ψ_S(x) = 1    if x ∈ S,
                                    (2)
           = 0    if x ∉ S.
Moreover from the characteristic function ψ one can reconstruct the subset
S: It consists of exactly those elements x ∈ X which land on 1 under
ψ. This means that S is the pullback of the inclusion i: {1} ⊂ {0, 1}
under ψ:
    S ---------> {1}
    |             |
    |             | i            (3)
    v             v
    X ---------> {0, 1}
          ψ
This description can be purged of all "elements" by using the "universal"
description of the pullback and by replacing the traditional set {0,1} of
two truth values by an arbitrary object Ω, as follows.
Axiom IV (Subobject Classifier). There is an object n and an arrow
T: 1 ~n from the terminal object I such that every monic m: S ~X is the
pullback of T along a unique arrow 1/;:
s------+- I
m r •
(4)
x -------n
-.jJ
In this axiom, the arrow $S \to 1$ must be the unique arrow from S to the
terminal 1. Also, since 1 is terminal, any arrow (such as T) from 1 must be
monic. Hence the axiom states: There is an arrow T from 1 which is
"universal" among monics, in the sense that every monic arrow is a
unique pullback of T. Observe that axioms II and III are also statements
about the existence of universals (i.e., of adjoints).
These four axioms are powerful; a category which satisfies them is
called an elementary topos (elementary because the axioms are first order;
a "topos" because every topological space carries such a topos (of
sheaves; as will appear)). For example, arrows $1 \to Y$ act "as if" they were
elements of Y (they would be, in the category of sets). Also X is the product
$X \times 1$, so there is an isomorphism $X \times 1 \cong X$. Hence, starting with
a monic we get successive one-one correspondences to other arrows:
$$\begin{array}{ll}
S \to X & \text{monic,} \\
X \to \Omega & \text{characteristic function,} \\
1 \times X \to \Omega & X \cong 1 \times X, \\
1 \to \Omega^X & \text{exponential law.}
\end{array}$$
In other words, equivalent monics $S \to X$ correspond to "elements" of $\Omega^X$.
Thus $\Omega^X$ is an object which is the "set" of all "subsets" of X, so that we
have here a categorical description of the power set $\Omega^X = PX$.
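In the category of finite sets this correspondence can be counted directly: the functions from X to {0, 1} are in bijection with the subsets of X. The sketch below assumes a hypothetical three-element set and uses illustrative function names; it simply enumerates all characteristic functions and collects the subsets they classify.

```python
from itertools import product


def all_characteristic_functions(universe):
    """Yield every function from the universe to {0, 1}, as a dict."""
    elements = sorted(universe)
    for values in product((0, 1), repeat=len(elements)):
        yield dict(zip(elements, values))


def subset_of(psi):
    """The subset classified by psi: the elements sent to 1."""
    return frozenset(x for x, value in psi.items() if value == 1)


X = {1, 2, 3}   # hypothetical sample set
functions = list(all_characteristic_functions(X))
subsets = {subset_of(psi) for psi in functions}
print(len(functions), len(subsets))
```

Both counts agree with the number of subsets of a three-element set, so distinct characteristic functions classify distinct subsets.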
12. Axioms on Functions 401
As in the axioms for set theory, one needs an axiom to assure the
existence of something infinite. In set theory, one axiomatized the proper-
ties of the successor function on finite ordinals. Here we axiomatize the
fact that the successor function, with 0, provides definitions of other
arrows by recursion:
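$$\begin{array}{ccccc}
1 & \xrightarrow{\;0\;} & N & \xrightarrow{\;s\;} & N \\
\| & & \downarrow{\scriptstyle f} & & \downarrow{\scriptstyle f} \\
1 & \xrightarrow{\;x\;} & X & \xrightarrow{\;h\;} & X
\end{array} \tag{5}$$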
This diagram (cf. (11.2.5)) is just that defining the function f from h by
the recursion
$$f \cdot 0 = x, \qquad f \cdot s = h \cdot f.$$
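For ordinary sets and functions, the recursion just stated can be carried out mechanically. The following sketch builds f from a starting value x and a function h; the name `recursor` and the doubling example are hypothetical, chosen only to illustrate the two equations.

```python
def recursor(x, h):
    """Return the function f with f(0) = x and f(n + 1) = h(f(n))."""
    def f(n):
        value = x
        for _ in range(n):      # apply h exactly n times to x
            value = h(value)
        return value
    return f


def double_value(v):
    return 2 * v


powers_of_two = recursor(1, double_value)   # f(n) = 2**n
print([powers_of_two(n) for n in range(6)])
```

The two defining equations correspond to the two halves of diagram (5): the left triangle says f(0) = x and the right square says f(n + 1) = h(f(n)).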
This axiom implies that all the algebras of subobjects are Boolean.
In this approach to Mathematics the objects X do not have elements. As
noted already, the arrows $1 \to X$ might be regarded as elements of X, but
there may not be "enough" of them to make all the usual arguments. In
set theory two functions $f, g: X \to Y$ are different only when there is an element
x with $fx \neq gx$ where they differ; here there may be two different
arrows $f, g: X \to Y$ but no arrow $x: 1 \to X$ where they differ; that is, where
$f \cdot x \neq g \cdot x$. One can require a topos to be more like classical set theory by
asking that it be "well-pointed" in the following sense (where 0 designates
an initial object, which can be constructed from the previous axioms).
(1)
13. Intuitionistic Logic 403
(we use the same letter t for the functions in the two objects X and Y).
Composition of these functions $f_i$ does define composition in D. One may
think of an object X of D as a set "varying with time t"; at the start it is a
set $X_0$ which changes in time to a set $X_1$; thereby different elements of $X_0$
may coalesce ($x_0 \neq x'_0$ but $tx_0 = tx'_0$) and new elements may appear.
This category is an elementary topos; that is, it satisfies axioms I-IV
above. For example, the product of two objects X and Y is just the object
$t \times t: X_0 \times Y_0 \to X_1 \times Y_1$, while an arrow $m: S \to X$ is monic in D if
and only if both $m_0: S_0 \to X_0$ and $m_1: S_1 \to X_1$ are injective, as in the vertical
arrows of the figure
$$\begin{array}{ccc}
S_0 & \xrightarrow{\;t\;} & S_1 \\
{\scriptstyle m_0}\downarrow & & \downarrow{\scriptstyle m_1} \\
X_0 & \xrightarrow{\;t\;} & X_1
\end{array} \tag{2}$$
this means that $S_0 \subset X_0$ and $S_1 \subset X_1$ are subsets, with $m_0$ and $m_1$ the
injections. So when does an element $x_0 \in X_0$ belong to the subobject S
of X? It may be that $x_0 \in S_0$ (that is, $x_0 = m_0 s_0$ for some element $s_0$);
when that is the case, then also $tx_0 \in S_1$. Or, it may be that $x_0$ is not in
$S_0$ but that $tx_0$ is in $S_1$. Or, it may be that neither $x_0 \in S_0$ nor
$tx_0 \in S_1$. Thus there are three possible truth values! We can define a
"characteristic function" $\psi$ for the subobject S as a function on $X_0$ to
$\{0, 1, \infty\}$ for $x_0 \in X_0$ as follows:
$\psi(x_0) = 0$ if $x_0 \in S_0$,
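The three cases described above can be tabulated for a small example. This sketch rests on an assumption not confirmed by the surviving text: that the second and third cases receive the values 1 and $\infty$, in the order in which the cases were listed. The sets, the map `t`, and the function name are all hypothetical.

```python
import math


def truth_value(x0, S0, S1, t):
    """Three-valued characteristic function on X0.

    Assumed values: 0 if x0 is in S0; 1 if x0 is not in S0 but t(x0) is
    in S1; infinity if t(x0) is not in S1.
    """
    if x0 in S0:
        return 0
    if t[x0] in S1:
        return 1
    return math.inf


# Hypothetical object X: a set X0 changing in time to X1 along t.
X0 = {"a", "b", "c"}
X1 = {"p", "q"}
t = {"a": "p", "b": "p", "c": "q"}   # a and b coalesce at time 1

# Hypothetical subobject S: t must carry S0 into S1.
S0 = {"a"}
S1 = {"p"}

values = {x0: truth_value(x0, S0, S1, t) for x0 in sorted(X0)}
print(values)
```

Here the element b is not in the subobject at the start but falls into it once it coalesces with a, which is the intermediate truth value.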
[Diagram (5) is not recoverable from the extraction; its surviving labels are $m$, $\{0\}$, $X_1$, and $\{0, \infty\}$.]
$P: (\mathrm{Open}\,X)^{op} \to \mathrm{Sets}$,
This category is a topos, but its logic is not classical, since its subobject
classifier $\Omega$ is a Heyting algebra and not Boolean. In this topos, take the
[Figure 1. Diagram not recoverable from the extraction; surviving labels: Equivalence Class, Probability, Combinatorics, Large Cardinals, Forcing, Infinity.]
CHAPTER XII
[Diagram not recoverable from the extraction; surviving labels: Human Activities, Scientific Questions, Problems, Ideas, Procedures, Rules, Definitions, Axioms, Proofs, Network of Formal Systems.]
1. The Formal
The presentation of Mathematics is formal: Calculations are done follow-
ing rules specified in advance; proofs are made from previous axioms and
follow predetermined rules of inference; new concepts recognized as
relevant are introduced by unambiguous definitions; errors and disagree-
ments are cleared up not by dispute but by appeal to the relevant rules.
It is characteristic of any formal procedure that it makes no reference to
the meaning or to the applications, but only to the form. The formalism
may be imperfect and sketchy, but it carries with it perpetually the possi-
bility of perfection. Because of these characteristics, Mathematics (within
its limits) is absolutely precise and independent of persons. The formal
can be communicated well, without ambiguity. It develops in many suc-
cessive stages.
The formal arises first in the rules for arithmetic questions. Given two
integers in base ten form, rules tell us how to form a "sum" or a "prod-
uct". The rules are unambiguous; we can tell when they have been carried
out correctly, whether by hand, by abacus, or by computer. Different cal-
culations of the same product, when done carefully, come to the same
answer. The rules make no mention of what the numbers might mean, or
what decimals are-although in a larger system, still formal, say in Peano
arithmetic, such rules can be derived from the Peano axioms, the recursive
definitions of sum and product, and the description of what decimals are.
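The purely formal character of such a rule can be seen by writing it out: the procedure below manipulates digit strings column by column and never asks what the digits mean. It is only a sketch of the familiar schoolbook rule for sums; the function name is hypothetical, and the single-digit additions stand in for a memorized addition table.

```python
def add_decimal(a: str, b: str) -> str:
    """Add two base-ten numerals given as digit strings, column by column."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)    # pad the shorter numeral with zeros
    carry = 0
    digits = []
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry    # one column: two digits and a carry
        digits.append(str(total % 10))       # the digit written down
        carry = total // 10                  # the digit carried to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


print(add_decimal("478", "964"))
```

Anyone who follows the rule, by hand or by machine, arrives at the same numeral, which is the sense in which the rule is unambiguous.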
This austerity of the rules of arithmetic is the basis of their applicabil-
ity. The use of arithmetic is governed by well understood practical opera-
tions. The operation of counting uses the successor of each decimal and
prescribes how to count, say, the number of matches in a pile. Then the
formal process of adding two decimal integers matches the practical pro-
cess of combining two piles. If the combined count comes out wrong, the
failure is never attributed to the rules, but to miscounts or missed
matches. The very generality of the rules and their manifold prior uses
means that accidental error is never laid at their doorstep. Similarly there
are common sense operations to determine areas, say of rectangles: Pro-
cedures to recognize when a shape is approximately a rectangle and then
ways to measure its length and breadth in feet. The rule for multiplication
then provides the area, which again can be checked by a measurement.
Thus an area of 96 square feet can be covered (more or less exactly) by 96
square tiles, 12 inches by 12 inches each. If it turns out that the tiles must
overlap, then some count or measurement must have slipped, or perhaps
the rectangle was really a parallelogram. It is never the rule for multipli-
cation of decimals which is at fault.
In brief, the formal rules of arithmetic are a firm background for the
occasionally faulty operation of counting.
The Peano axioms provide a second step in formalization. Now the
rules for calculation are deduced from axioms, but again without any
2. Ideas
Most of the formalizations in Mathematics are based on some underlying
idea-an intuitive notion which gives guidance and purpose. It is not easy
to give a precise description of the nature of an idea; indeed a deeper
idea may be almost impossible to communicate and so may be recognized
only after it has been embodied in some formalization. A number of the
more general ideas of Mathematics are stimulated more or less directly by
human activities. In §I.11, Table 1 we have given a sample of such ideas.
It was noted then (and in many subsequent examples) that an idea may
have several quite different formal realizations. Thus the idea of "mani-
fold" starts as a geometric locus of all the items or points of some concept.
It becomes formal as a topological manifold or as a differentiable manifold
($C^\infty$, $C^1$, or just smooth in some sense) or perhaps as a complex manifold.
In all these cases, every point of the manifold has a neighborhood
described by a "good" chart, but the original idea of manifold should also
allow in some way for manifolds with singularities (and I speculate that in
the future geometry will see more formal concepts of this sort). In any
event, the "idea" of manifold is nebulous enough that it has many formal-
izations; moreover these different formalizations allow us to study
separately different aspects of the idea.
Mathematical ideas arise not just from human activities or scientific
questions; they also arise out of the urge to understand prior pieces of
Mathematics. A "set" was initially a collection of points in the line or the
plane, and then any sort of "collection", considered as one thing and con-
sisting of well-defined Mathematical objects. Subsequently the "idea" of a
set was described more specifically in terms of the cumulative hierarchy.
Then we have an idea clear enough to be communicated but still fuzzy:
At each stage $R_\alpha$ of the hierarchy one next forms the set of "all" subsets
of $R_\alpha$, and there is here no formal description of what subsets are meant.
Ideas, as we use the term, are inherently vague. (They are not to be con-
fused with the Platonic idea of the ideal line or the ideal circle or the
ideal set.)
Since we are not able to give any precise description of the term "idea",
we may simply list examples (see Table 1), this time of ideas which arise
from prior parts of Mathematics; again each idea turns out to have
several formalizations. We have already seen other examples: The idea of
curvature in differential geometry, or the related ideas of eigenvalues and
spectrum for self-adjoint operators.
The development of an idea can be long and involved. Thus the study
of change and motion suggests the idea that change can be "smooth"
rather than sudden. Examples of such smooth changes appear both in
mechanics and in geometry. Eventually, this idea "smooth" divides into
the separate ideas of "continuous" and "differentiable." The difference
between these two, however, is clear only after both have been carefully
formulated by means of the descriptions of limits. Differentiability, so
416 XII. The Mathematical Network
described, then appears in the calculus and in the analysis of the local
structure of curves, surfaces, and manifolds; and then eventually (but not
until the 1930's) in the precise global definition of a smooth manifold.
Continuity, once formalized, is similarly seen to apply to functions of
several variables and to functions of curves, thus gradually leading to the
20th century notion of a metric space. Eventually it becomes clear that
the metric notion of distance can be replaced by the more qualitative
notion of neighborhood, as in the definition of a topological space. Thus it
is that the formal properties of open sets eventually serve to codify the
very idea of continuity.
There are also many less general ideas: An idea how one might prove
some desired theorem. Sometimes the idea fails, and sometimes its suc-
cessful execution involves complications or additional technical tricks.
Again, the proof may be just a routine realization of the idea. In any
event, when the formal proof is at hand the original idea (with supporting
detail) has been realized. Ideas require formalizations.
The idea of putting together separate pieces of a function leads on the
one hand to a concept of a sheaf (as in §VIII.11); on the other hand,
where the pieces are power series expansions, it becomes the concept of
analytic continuation for holomorphic functions.
Our view that Mathematics is and must be formal is supplemented by
the observation that each formalism rests on some underlying or leading
idea. The reverse can also happen, as when geometric ideas are formu-
lated so vaguely that they do not constitute a proof of the intended
results.
3. The Network
We cannot realistically constrain Mathematics to be a single formal sys-
tem; instead we view Mathematics as an elaborate tightly connected net-
work of formal systems, axiom systems, rules, and connections. The net-
work is tied to many sources in human activities and scientific questions.
We have already sketched some pieces of such a network, in §V.10 with a
table of functions, transformations and groups, in §VI.12 with a table of
the interactions of the concepts of the calculus, and in §IX.13 with the
table of the Interconnections of Mathematics and Mechanics.
In the full, dense network of parts of Mathematics, there are many out-
side ties. The most basic subjects (the subjects at the "edges" of the net-
work) are tied closely to connections reaching outside of Mathematics.
The primary ties are those to various human activities: Counting, measur-
ing, moving, observing, changing, and others as listed in §I.8. In some
cases, these ties can also be regarded as connections not to activities, but
to phenomena: Multitude (that which can be counted), extent (that which
can be measured), motion (which can be observed), change (which again
can be observed). By way of these ties and connections, mathematics is
grounded in "reality", at least in whatever reality is represented by these
activities and these phenomena.
The subjects of Mathematics are also tied or connected to other parts of
human knowledge, and most especially to the various sciences. Geometry
is tied to mensuration, architecture, surveying, navigation, and, on a more
sophisticated level, to space, time, and space-time as these enter into phys-
ics. Calculus is tied to mechanics, dynamics, and many other parts of
theoretical physics. Differential equations and Fourier analysis are like-
wise tied to physics-as is also vector analysis, complete with all the
sophistication of dual spaces and tensor products. Calculus is also tied to
economics, for example, in the use of marginal concepts in Mathematical
economics. The number of such connections between Mathematics and
science is very great-and often these connections go to subjects which are
in the "middle" of the network of Mathematics, and not just to the basic
subjects at the edge of the network.
These external connections of Mathematics are numerous and tight, but
they do not wholly describe or determine the Mathematical subjects. Basic
Mathematical concepts may be derived from human activity, but they are
not themselves such activity-nor are they the phenomena involved as the
background of such activity. The subjects of applied Mathematics are
with rational coefficients $a_i$. As in the Galois theory (§V.7) one considers
not just y, but all the rational combinations w of y and its powers; they
form a field K, called an algebraic number field. The degree of the irreducible
equation (1) for y is also the dimension of the field K, regarded as a
vector space over the ground field Q of rational numbers; this observation
establishes a useful connection with linear algebra. Every number w in the
field K satisfies a polynomial equation like (1) of degree at most n (use
linear algebra!). When there is such an equation with highest coefficient 1
(the equation is monic) and with all the other coefficients $a_i$ rational
integers, the number w is called an algebraic integer. The collection of all
the algebraic integers w in the field K forms a commutative ring $\mathcal{O}$; indeed
this important example (with polynomials) is a basic reason for introducing
that concept; see §VI.3. In this way the passage from integers to
rational numbers is extended; $\mathcal{O}$ passes to K:
$$\begin{array}{ccc}
Z & \subset & Q \\
\cap & & \cap \\
\mathcal{O} & \subset & K = Q(y).
\end{array} \tag{2}$$
This is the start of algebraic number theory and its connections with
Galois theory.
The quadratic case (degree n = 2) appears first. For example, if y = i,
this ring $\mathcal{O}$ is just the ring of all complex integers m + ni with $m, n \in Z$. In
this ring every integer can be factored, and essentially uniquely, into
prime (i.e. irreducible) complex integers. But there is no such unique fac-
torization for many other quadratic fields. And when y is a higher root of
unity (in which case K is called a cyclotomic field) this need not be the
case, as Kummer discovered in trying to handle the Fermat problem.
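A small computation makes the contrast concrete. The example below is a standard one, not taken from the text: in the ring of numbers $a + b\sqrt{-5}$ with a, b integers, the number 6 has two genuinely different factorizations. The sketch represents $a + b\sqrt{d}$ as a pair (a, b); the function names and the search bound are hypothetical.

```python
def mul(u, v, d):
    """Multiply a + b*sqrt(d) by c + e*sqrt(d), each given as an integer pair."""
    a, b = u
    c, e = v
    return (a * c + d * b * e, a * e + b * c)


def norm(u, d):
    """The norm a^2 - d*b^2, which is multiplicative."""
    a, b = u
    return a * a - d * b * b


def has_norm(target, d, bound=10):
    """Search a finite box for an element of the given norm."""
    box = range(-bound, bound + 1)
    return any(norm((a, b), d) == target for a in box for b in box)


# Complex integers (d = -1): 5 = (2 + i)(2 - i).
five = mul((2, 1), (2, -1), -1)

# d = -5: 6 = 2 * 3 = (1 + sqrt(-5)) * (1 - sqrt(-5)).
six_a = mul((2, 0), (3, 0), -5)
six_b = mul((1, 1), (1, -1), -5)

print(five, six_a, six_b, has_norm(2, -5), has_norm(3, -5))
```

The four factors of 6 have norms 4, 9, 6, and 6, so a proper factor of any of them would need norm 2 or 3. Since $a^2 + 5b^2$ never equals 2 or 3, none of the four can be factored further, and the two factorizations are essentially different.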
However, unique factorization can be restored, by factoring algebraic
integers not into these integers, but into "ideal" factors which are exactly
the ideals used in §7.10 as the kernels of ring homomorphisms; this means
that one must define a product AB of two ideals in $\mathcal{O}$, as the ideal generated
by all products ab for $a \in A$, $b \in B$. Also, each element w in $\mathcal{O}$ determines
an ideal, the set (w) of all its integral multiples. Now the algebraic
number w is prime in $\mathcal{O}$ when it cannot be factored there; that is, when
for all u, v in $\mathcal{O}$, $uv \in (w)$ implies $u \in (w)$ or $v \in (w)$. Similarly an ideal P in
$\mathcal{O}$ is called prime if $uv \in P$ implies $u \in P$ or $v \in P$. Now we can state the first
main theorem of algebraic number theory: Every algebraic integer w in $\mathcal{O}$
has a unique decomposition
(3)
$$|w| = 2^{-Vw}$$
$$c_0 + c_1 p + c_2 p^2 + \cdots \tag{7}$$
[Figure 1. Diagram not recoverable from the extraction; surviving labels: Calculus, Applied Mathematics, Geometry, Formulas, Algebra, Number Theory.]
4. Subjects, Specialties, and Subdivisions 423
[Figure 2. Diagram not recoverable from the extraction; surviving labels: Complex Analysis, (Curvature), Projective Geometry, Fields, Differential Geometry, Geometry, Algebraic Number Theory, (Manifolds), (Ideals), Analytic Number Theory.]
specific domains (R or C). Here the initial ideas of the calculus (rate of
change, the whole is the sum of its parts) have an extraordinary and for-
mal development. Through the ideas of tangent lines, curvature and tor-
sion there is a deep connection with geometry: Differential geometry as a
subject can be defined either as an application of calculus to geometry or,
more intrinsically, as a study of smoothly curved space-which need not
have a Euclidean context.
Our Figure 2 does not reflect the study of functions or the pervasive
aspects of group theory; this was pictured in part in the Table of §V.10. In
principle, group theory can be described formally as the study of all the
set-models of the (very simple) group axioms. This does not recognize the
variety of interconnections of group theory with other subjects; Figure 3 is
a start, beginning of course with the basic examples of symmetry. Group
theory as a subject quickly splits into different branches: Finite group
theory (which is close to number theoretical ideas) and infinite (combina-
torial) group theory. The addition of the commutative law produces the
subject of abelian groups, which has a quite different flavor and may not
be as "deep" as finite group theory. However abelian groups are extraor-
dinarily useful as topological invariants: they and modules lead to homo-
logical algebra (for example, in abelian categories). In a different direc-
tion, groups with a smooth structure (Lie groups) or with a topology are
closely tied to analysis. Finally the representations of groups by linear
transformations tie group theory back to some of its origins in geometry
and have remarkable further connections: As carriers of symmetry, such
representations can be used to predict the varieties of elementary particles
in Physics.
Figure 2 omitted one side: The connections of analysis with classical
applied Mathematics. Here the various fields are tightly connected to the
[Figure 3. Diagram not recoverable from the extraction; surviving labels: Linear Transformation, (Composition), (Multiplication), Differential Equations, Lie Groups, Groups, Abelian Groups, Quantum Mechanics.]
[Figure 4. Diagram not recoverable from the extraction; surviving labels: Analysis, Mechanics of Particles, Geometric Optics, Continuum Mechanics, Differential Geometry, Aerodynamics, Plasticity.]
[Figure not recoverable from the extraction; surviving labels: Genetics, Real Analysis, (Binomial/Gaussian), Random Process.]
here. Probability enters vitally into statistical estimates, but in statistics the
choice of methods (linear regression is overused) is much dependent upon
the application intended, while the foundations are still partially obscure
(Bayesian) or at least controversial. For these reasons we cannot count
statistics as now a part of Mathematics in our sense.
We have thus exhibited several diagrams of (part of) the network of
Mathematics with nodes the various specified "subjects" or "branches".
The description of a subject can be quite various. It is sometimes done by
a concept (holomorphic function) carefully defined in some context. It is
sometimes derived from (our ideas of) a form of experience-as in
geometry. But then the subject may split into more specific forms, some-
times those described by axioms (topological space) and sometimes by
types of formulas (algebraic geometry). Within algebra, different subjects
are often delimited by the appropriate axiom systems, but often the direc-
tion in which the axioms are exploited is suggested by the applications or
by the problems of special interest. When axioms are used, it can easily
happen that one subject is "contained" in another in a formal sense. Thus
a lattice is a particular kind of partially ordered set (one with greatest
lower and least upper bounds of pairs of elements), while a partially
ordered set in its turn is a particular kind of category (one in which, given
objects A and B, there is at most one arrow from A to B, and this when
$A \le B$). Despite these formal inclusions, the subjects in question really
count as different ones, because of the very different ways in which the
axioms are developed.
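These formal inclusions can be exhibited on one small example. The sketch below takes the divisors of 12 ordered by divisibility (a hypothetical choice, as are all the names): the pairs (a, b) with a dividing b serve as the arrows of a category with at most one arrow between two objects, and greatest common divisor and least common multiple supply the lattice operations.

```python
from math import gcd

divisors = [d for d in range(1, 13) if 12 % d == 0]

# One arrow a -> b exactly when a divides b, so at most one arrow per pair.
arrows = {(a, b) for a in divisors for b in divisors if b % a == 0}


def compose(f, g):
    """Compose f: a -> b with g: b -> c; this is just transitivity."""
    (a, b), (b2, c) = f, g
    if b != b2:
        raise ValueError("arrows are not composable")
    return (a, c)


def meet(a, b):
    """Greatest lower bound in the divisibility order."""
    return gcd(a, b)


def join(a, b):
    """Least upper bound in the divisibility order."""
    return a * b // gcd(a, b)


print(len(divisors), len(arrows), meet(4, 6), join(4, 6))
```

The same data is thus a lattice, a partially ordered set, and a category, though one would study it quite differently under each heading.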
Many subjects are subdivided into further specialties, reflecting new
techniques, different motivations and sometimes just the pressure of more
and more specialists. Thus Mathematical logic was once neatly subdivided
into four principal branches as shown just below the four boxes in Figure
6. However, one of the four (Proof Theory) did not have equal status and
the other branches proceeded to subdivide again, as indicated in the case
[Figure 6. Diagram not recoverable from the extraction; surviving labels: Analysis, Rigorous Foundation, Geometry, Proof Theory, Independence.]
5. Problems
The development of Mathematics, as reflected in the multiplication of its
subfields, is driven by various forces: Ideas arising from human activities,
questions posed by science, and problems presented within Mathematics.
Many research mathematicians think that the main guide to Mathematical
research is the desire to solve specific Mathematical problems-not so
much the smaller puzzle-type problems, but the major and famous
Mathematical problems. This includes some of the notable problems
which have recently been solved, such as Hilbert's fifth problem (con-
tinuity assumptions on Lie groups imply analyticity), the negative solution
of the decision problem for Diophantine equations (Hilbert's tenth prob-
lem), and the positive solution for the Poincare conjecture in dimension 4
(a simply connected four-dimensional manifold with the homology of a
4-sphere is necessarily homeomorphic to a 4-sphere). More especially, one
must count here the many famous problems not yet solved: The Poincare
conjecture in dimension 3, the Riemann hypothesis for the zeros of the
zeta function, the Goldbach hypothesis (every even number is the sum of
two primes), the conjecture that every finite group is the Galois group of a
normal extension of the field Q, and many more. There are other, less
spectacular but equally difficult problems to be found in individual fields
of Mathematics. In all these cases, the continued attempt to solve such
problems can be a major source of new techniques, new ideas, and even
new branches of Mathematics.
Fermat's last theorem ($x^n + y^n = z^n$ has no solution in integers for
n > 2) is a striking example. For the corresponding equation with n = 2
one can find all integral solutions (all integral Pythagorean triangles) by
the factorization
$$y^2 = (z^2 - x^2) = (z + x)(z - x)$$
followed by a routine attention to the prime factors of the right hand side.
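The factorization for n = 2 translates directly into a search: for a fixed y, each way of writing $y^2$ as a product of two factors of the same parity gives a triangle. The following is a sketch under that reading, not the classical parametrization; the function name is hypothetical.

```python
def triples_with_leg(y):
    """All (x, y, z) with x > 0 and x^2 + y^2 = z^2, from y^2 = (z + x)(z - x)."""
    square = y * y
    found = []
    for q in range(1, y):                 # q plays the role of z - x
        if square % q == 0:
            p = square // q               # p plays the role of z + x
            if (p - q) % 2 == 0:          # p and q must have the same parity
                found.append(((p - q) // 2, y, (p + q) // 2))
    return found


print(triples_with_leg(12))
```

Restricting q to values below y guarantees that p exceeds q, so every triple found has a positive x.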
In the nineteenth century, Kummer was tempted to try the same for any
n, by the factorization
$$ax^2 + 2bxy + cy^2$$
ture (§V.9) led to the subsequent elaborate determination of all finite sim-
ple groups and to an extensive revival of the whole subject of finite group
theory. In this case (and in many others) the presence of a pertinent prob-
lem is a major dynamic in the advance of Mathematics. However, this is
not always so. Hilbert's fifth problem (about continuous groups) was
famous until it was solved-and then it dropped out of prominence,
though the general subject of Lie groups continued to be active, little
influenced by the solution of this problem.
There are also all manner of small problems, some natural, some con-
cocted. Problems are a vehicle of competition and display. In some
schools (often in those influenced by Hungarian traditions) the business of
mathematics seems to be just the formulation and the eventual solution of
hard problems, with perhaps more attention to the difficulty than to the
relevance of the problem. Problems in this sense are akin to chess
problems-challenging but not necessarily relevant. Planar graphs provide
an example. Here a graph means an (abstract) finite collection of vertices
and edges, with the prescription that each edge joins exactly two
(different) vertices. Some graphs cannot be drawn in the plane (or,
equivalently, on a sphere) without an unintended crossing of edges. This
is the case in a simple conundrum. Tell the gas, electric and piped-heat
companies how to run their mains (gas etc.) all at the same level, six feet
below ground, to three different houses:
(2)
(3)
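One way to see that the conundrum has no solution is an edge count. The argument sketched here is a standard one and is not given in the text: it assumes the known consequence of Euler's formula that a connected planar graph with v vertices (v at least 3) and no triangles has at most 2v - 4 edges. The variable names are hypothetical.

```python
houses = ["house 1", "house 2", "house 3"]
utilities = ["gas", "electric", "heat"]

# Every utility must reach every house, with no house-to-house
# or utility-to-utility edges, so the graph has no triangles.
edges = [(u, h) for u in utilities for h in houses]

v = len(houses) + len(utilities)
e = len(edges)
bound = 2 * v - 4          # assumed bound for triangle-free planar graphs

planar_possible = e <= bound
print(v, e, bound, planar_possible)
```

Nine mains are required but at most eight could be drawn without a crossing, so the companies cannot be satisfied.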
6. Understanding Mathematics
A major element in Mathematics is the need to fully understand why the
formulas work and why the theorems hold. In lectures, in courses, and in
conversations Mathematicians try repeatedly to better understand the
received results of each subject. Their methods are various; we now
examine some under the headings analogy, examples, analysis of proofs,
shift of attention, and the search for invariant form:
(a) Analogies. In many respects, 2-dimensional Euclidean space is much
like 3-dimensional space: Points and lines become points, lines, and
planes; triangles and circles become tetrahedrons and spheres. These and
other analogies strongly suggest a consideration of "spatial" configurations
in higher dimensions. Some initial attempts describe 4-dimensional space
by points, lines, planes, and hyperplanes; as this becomes cumbersome
and as the phenomena requiring still higher dimensions arise there is a
search for more effective methods, first by using coordinates and then by
vector spaces.
in the Jordan curve theorem and the resulting deeper study of the topol-
ogy of the plane.
To present a proof, one may often try to put it in a vivid or diagrammatic
form. When this is done for the Jordan-Hölder theorem (§V.7), it
naturally leads to the diagrammatic presentation of a partially ordered set
and thence to the notion of a lattice. Again, topologists found it suggestive
to denote continuous maps from one space X to another space Y by
arrows X ~ Y. This simple notation soon suggested fertile concepts of
exact sequences and of categories. In brief, proofs are the meat of
mathematics, so chewing them over can produce juicy results!
(d) Shift of attention. The understanding of a proof may progress when
our attention is shifted from the initially emphasized aspect to some other
focus. Thus Galois theory starts as the study of the solutions of a particu-
lar polynomial equation; it then becomes the study of the field of all the
rational combinations of the roots of that equation. This shift makes it
possible to see the Galois group of the equation not as a group of permu-
tations of its roots, but as a group of automorphisms of the corresponding
field. In group theory, the cosets of a normal subgroup N of a group G
become the elements of the factor group G/N; when, instead, the
emphasis is placed on the function sending each element to the coset
which it generates, there is revealed a homomorphism of the original
group G onto the factor group G/N-and one then sees that the properties
of this homomorphism, and not the particular properties of the cosets,
provide (§V.6) the more effective description of the factor group. Simi-
larly, shifting from matrices to the corresponding linear transformations
substantially changes the emphasis in linear algebra, from calculation to
visualization-and lays the groundwork for an effective approach to
infinite-dimensional vector spaces.
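The shift from cosets to the projection homomorphism can be tried on a small case. The sketch below uses the integers modulo 6 under addition with the subgroup {0, 3}; the group, the subgroup, and all names are hypothetical choices for illustration.

```python
n = 6
G = range(n)                 # the integers modulo 6 under addition
N = frozenset({0, 3})        # a subgroup, normal because G is abelian


def coset(g):
    """The function sending each element g to the coset g + N."""
    return frozenset((g + h) % n for h in N)


def add_cosets(A, B):
    """Add two cosets by adding representatives; any choice gives the same coset."""
    return coset(min(A) + min(B))


quotient = {coset(g) for g in G}

# The projection respects addition, so it is a homomorphism onto G/N.
is_hom = all(coset((a + b) % n) == add_cosets(coset(a), coset(b)) for a in G for b in G)

# Its kernel is exactly N.
kernel = {g for g in G if coset(g) == coset(0)}

print(len(quotient), is_hom, sorted(kernel))
```

Nothing in the check of `is_hom` depends on how the cosets look as sets; only the behavior of the projection matters, which is the point of the shift of attention.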
(e) Invariant formulation. The introduction of analytic geometry by
Descartes provided for algebraic proofs of geometric facts-but the alge-
braic proofs depended on a choice of coordinate systems, while the
geometric facts-for example, the fact that the medians of a triangle are
concurrent-are independent of the choice of coordinates which may be
used in the proof. For this reason, the development of analytic methods in
geometry has constantly seen parallel attempts to make the results (or
even the arguments) independent of such choices. In linear algebra, this
appears in formulations independent of the choice of bases. In tensor
analysis, a tensor, relative to bases, first appeared as a multiply indexed
family of scalars t, and the description of a tensor had to be coupled
with a complete description of the effect of a change of basis on these t's;
alternatively, tensors can be defined (§7.9) in a coordinate-free manner as
elements of tensor products of vector spaces. An algebraic variety in affine
n-dimensional space is the locus of a finite list of polynomial equations in
the coordinates-but it can be described more invariantly in terms of the
ideal of all polynomials which vanish on the locus; the given equations
are then just one set of generators (among the many possible sets) for that
ideal.
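The independence of the median theorem from the choice of coordinates can be tested numerically. The sketch below verifies, with exact rational arithmetic, that the three medians of a sample triangle pass through a common point and that an invertible affine change of coordinates carries that point to the corresponding point of the transformed triangle. The triangle, the particular change of coordinates, and the function names are all hypothetical.

```python
from fractions import Fraction as F


def midpoint(p, q):
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def centroid(a, b, c):
    return ((a[0] + b[0] + c[0]) / 3, (a[1] + b[1] + c[1]) / 3)


def collinear(p, q, r):
    """True when p, q, r lie on one line (zero cross product)."""
    return (q[0] - p[0]) * (r[1] - p[1]) == (q[1] - p[1]) * (r[0] - p[0])


def medians_concur(a, b, c):
    """Check that each median passes through the centroid."""
    g = centroid(a, b, c)
    return (collinear(a, midpoint(b, c), g)
            and collinear(b, midpoint(a, c), g)
            and collinear(c, midpoint(a, b), g))


def change_coordinates(p):
    """A hypothetical invertible affine change of coordinates (determinant -7)."""
    x, y = p
    return (2 * x + y + 5, x - 3 * y - 1)


triangle = [(F(0), F(0)), (F(4), F(0)), (F(1), F(3))]
image = [change_coordinates(p) for p in triangle]

print(medians_concur(*triangle), medians_concur(*image))
print(change_coordinates(centroid(*triangle)) == centroid(*image))
```

The computation succeeds in both coordinate systems, but only an invariant formulation explains why it must.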
The search for an "invariant" formulation is not limited to questions of
geometry. It can also arise in algebraic contexts. Thus a specific group
may originally be described in terms of specified generators and their relations;
for example, a dihedral group is generated by a rotation (by $360^\circ/n$)
extensions (those which are normal with an abelian Galois group (§V.7)
over some ground field, such as Q). Similarly the elaborate study of the
various forms of conics, cubic, and quartic curves in the plane finally
extracted some systematic facts about intersections; for example, a cubic
and a quartic "in general" intersect in $3 \cdot 4 = 12$ points. Generalization
then led to Bezout's theorem: Two plane curves of orders m and n inter-
sect "in general" in mn points. But the validity of this theorem does
require some substantial adjustments. Among the intersections we must
count possible intersections at "infinity" (in the projective plane) as well
as all the intersections at points with complex coordinates-while each
intersection must be counted with a suitable "multiplicity". Here special
cases, such as tangent lines and osculating circles suggest how to go about
defining what is meant by multiplicity; for example, suitable ideals may
be used. Similar notions of multiplicity thus develop for the intersections
of surfaces and manifolds of higher dimension: They play a major role in
algebraic geometry and subsequently in algebraic topology. There they
are transmuted into cup products in cohomology.
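As a worked illustration of these adjustments (the choice of curves is ours, not taken from the text), consider two circles. Each has order 2, so the count should be 2·2 = 4, although only two intersections are visible in the real affine plane.

```latex
\begin{align*}
C_1 &: x^2 + y^2 = 1, & C_2 &: (x-1)^2 + y^2 = 1 .
\end{align*}
Subtracting the equations gives $2x = 1$, hence the two affine points
\[
  (x, y) = \bigl(\tfrac12,\ \pm\tfrac{\sqrt3}{2}\bigr).
\]
Homogenizing with $x = X/Z$, $y = Y/Z$ gives
\begin{align*}
X^2 + Y^2 &= Z^2, & X^2 - 2XZ + Y^2 &= 0 .
\end{align*}
On the line at infinity ($Z = 0$) both equations reduce to $X^2 + Y^2 = 0$,
so both circles pass through the two complex points
\[
  [1 : i : 0], \qquad [1 : -i : 0].
\]
Total: $2 + 2 = 4 = 2 \cdot 2$, each intersection counted with multiplicity one.
```

The two missing intersections are thus found only after passing both to the projective plane and to complex coordinates.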
There are many other examples of generalizations from cases; for
instance, various cases in number theory led to the unique decomposition
of finite abelian groups into products of cyclic groups (V.8.4).
(b) Generalization by analogous steps refers to situations where a more
general theory runs in parallel to earlier cases without actually replacing
or subsuming those cases. An instance in point is the parallel between real
numbers, complex numbers, and quaternions, followed by the search for
higher dimensional such algebras over the reals, with the discovery that
there were no further finite dimensional division algebras without the loss
(say) of the associativity of multiplication. Another striking example is the
generalization from real analysis to complex analysis.
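A minimal sketch of the quaternion step in this parallel, assuming quaternions are stored as 4-tuples (a, b, c, d) standing for a + bi + cj + dk; the function name `qmul` is ours. It suggests what is already given up at this stage (commutativity) while associativity is still retained.

```python
def qmul(p, q):
    """Hamilton product of two quaternions stored as tuples (a, b, c, d)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # coefficient of i
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # coefficient of j
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # coefficient of k
    )

ONE = (1, 0, 0, 0)
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)

# Multiplication is no longer commutative: ij = k but ji = -k.
ij = qmul(I, J)
ji = qmul(J, I)

# Associativity still holds on the basis elements (and hence, by
# bilinearity, for all quaternions).
BASIS = (ONE, I, J, K)
associative_on_basis = all(
    qmul(qmul(x, y), z) == qmul(x, qmul(y, z))
    for x in BASIS for y in BASIS for z in BASIS
)
```

The same tuple representation with two components instead of four would give the complex numbers, which is the sense in which the cases run in parallel.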
(c) Generalization by modification refers to instances in which a desir-
able theorem cannot be generalized in a direct way, but in which suitable
concepts are used to modify the theorem so as to make a generalization
possible. Thus the unique prime decomposition valid for the ordinary
integers fails to hold for the ring of integers in most quadratic and
cyclotomic fields of algebraic numbers, but it can be replaced in this and
other rings by a unique decomposition of ideals into products of prime
ideals. Moreover, the original decomposition of an integer n into its prime
factors is reflected in the general theorem by way of the decomposition
into prime ideals of the principal ideal (n) of all multiples of n. Again, the
Jordan curve theorem, valid for simple closed curves in the plane, is not
true for all such simple closed curves on the torus, but the classification of
the ways in which this theorem fails is a major source of the study of
homologous curves and of the introduction of the homology and homo-
topy groups (which, in effect, measure variously the numbers of ways in
which the Jordan curve theorem can fail).
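The failure of unique prime decomposition can be checked by hand in one standard quadratic example, the ring of numbers a + b√-5 with a, b integers. The sketch below is an illustration under that assumption; the ring and the helper names `mul`, `norm`, and `norm_is_attained` are our choices, not the text's.

```python
# Elements a + b*sqrt(-5) are stored as integer pairs (a, b).

def mul(x, y):
    """Product (a + b*s)(c + d*s), where s*s = -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

def norm(x):
    """Multiplicative norm N(a + b*s) = a^2 + 5*b^2."""
    a, b = x
    return a * a + 5 * b * b

def norm_is_attained(n):
    """True if some element has norm n (small exhaustive search)."""
    bound = int(n ** 0.5) + 1
    return any(
        norm((a, b)) == n
        for a in range(-bound, bound + 1)
        for b in range(-bound, bound + 1)
    )

two, three = (2, 0), (3, 0)
p, q = (1, 1), (1, -1)      # 1 + sqrt(-5) and 1 - sqrt(-5)

first = mul(two, three)      # 2 * 3
second = mul(p, q)           # (1 + sqrt(-5)) * (1 - sqrt(-5))
norms = [norm(z) for z in (two, three, p, q)]
```

Both products equal 6, and the four factors have norms 4, 9, 6, 6. A proper factor of any of them would need norm 2 or 3, and no element has such a norm, so all four are irreducible and the two factorizations are genuinely different. At the level of ideals the ambiguity disappears, as the text describes: the principal ideal (6) has a single decomposition into prime ideals.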
Abstraction. Generalization and abstraction, though closely related, are
best distinguished. A "generalization" is intended to subsume all the prior
instances under some common view which includes the major properties
of all those instances. An "abstraction" is intended to pick out certain cen-
tral aspects of the prior instances, and to free them from aspects extrane-
ous to the purpose at hand. Thus abstraction is likely to lead to the
description and analysis of new and more austere or more "abstract"
mathematical concepts. We will describe some types of abstraction under
headings "abstraction by deletion," "abstraction by analogy," and
"abstraction by shift of attention".
(d) Abstraction by deletion is a straightforward process: One carefully
omits parts of the data describing the mathematical concept in question to
obtain the more "abstract" concept. This often leads to a reverse process,
in which it is shown that all (or some) of the abstract objects can have the
deleted data restored, perhaps in more than one way. Such a restoration is
then called a "representation theorem".
For example, if one starts with the notion of a transformation group,
one may delete the elements being transformed but retain the associative,
identity, and inverse laws for the composition of transformations. The
result is the notion of an "abstract" group. The corresponding representa-
tion theorem, due to Cayley, asserts that every abstract group is iso-
morphic to a group of transformations on some set (§V.6).
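Cayley's representation can be sketched concretely: each element g of an abstract group is sent to the transformation x -> g·x of the underlying set. The code below is a small illustration, assuming a finite group given by a list of elements and a binary operation; the example group (pairs of bits under addition mod 2) and all names are our own.

```python
from itertools import product

def left_translation_table(elements, op):
    """Send each g to the permutation x -> op(g, x), stored as a tuple
    listing the images of the elements in their given order."""
    return {g: tuple(op(g, x) for x in elements) for g in elements}

def compose(p, q, elements):
    """Composite permutation 'p after q' in the same tuple encoding."""
    index = {x: i for i, x in enumerate(elements)}
    return tuple(p[index[y]] for y in q)

# Example abstract group: pairs of bits under componentwise addition mod 2.
elements = list(product((0, 1), repeat=2))

def op(g, h):
    return ((g[0] + h[0]) % 2, (g[1] + h[1]) % 2)

rep = left_translation_table(elements, op)

# The representation turns the group operation into composition ...
is_homomorphism = all(
    rep[op(g, h)] == compose(rep[g], rep[h], elements)
    for g in elements for h in elements
)
# ... and distinct elements give distinct transformations.
is_injective = len(set(rep.values())) == len(elements)
```

Here the deleted data (the elements being transformed) is restored by letting the group act on itself, which is one of possibly many ways of restoring it.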
The algebra of sets (§ 1.9) concerns the algebraic operations of intersec-
tion, union, and complement on the subsets of some given set. If one
deletes all references to the elements of these subsets, but retains the three
operations and (a suitable list of) the basic identities which they satisfy,
one reaches the notion of a Boolean algebra. Conversely, the Stone
representation theorem asserts that every Boolean algebra can be
represented as the algebra of some of the subsets of a suitable set.
Sometimes such a deletion is suggested by the sorts of theorems being
considered. Thus, in functional analysis, certain theorems about the
existence of functions with given properties are like theorems about the
existence of vectors in linear spaces. This use of geometric language then
suggests an abstraction: The deletion of the numbers on which the func-
tions act, so that the functions (for the intended purpose) become just
points or vectors in some Banach or Hilbert space.
(e) Abstraction by analogy occurs when a visible and strong parallel
between two different theories raises the suggestion that there should be
one underlying, less specific theory sufficient to give the common results.
Thus systems of numbers or of functions closed under the usual rational
operations (addition, subtraction, multiplication, and division by non-zero
elements) follow the same algebraic rules and yield the same theorems for
the solutions of quadratic and higher equations (Galois theory). These
parallels then suggest the abstract notion of a field, which soon turns out
to include other instances such as the p-adic numbers and the field of
integers modulo p as well as other finite fields. In the same direction, the
close parallel between algebraic number fields and fields of algebraic
7. Generalization and Abstraction 437
$R \supseteq T$ implies $R \cap (S \cup T) = (R \cap S) \cup T$. (1)
One is thus led to formulate and prove the theorem asserting that any two
maximal descending chains in a finite modular lattice have the same
length. For subspaces of a vector space, the length is the dimension. How-
ever, this modular lattice theorem does not contain the original Jordan-
Holder theorem, because the latter concerned not chains of normal sub-
groups of the whole group but chains of subgroups, each normal in the
next larger. The modular lattice theorem shows only that any two chief
series (maximal chains of normal subgroups) in a group have the same
length. However, the modular lattice theorem does apply in other cases, as
for example to subspaces of a projective space, and it has infinite-
dimensional generalizations, to the so-called continuous geometries used
by von Neumann to analyse rings of operators on Hilbert spaces. (See
Birkhoff [1967].)
Incidentally, the modular law (1) also holds in a Boolean algebra which
satisfies the stronger distributive law (for all $R$ and $T$)
$R \cap (S \cup T) = (R \cap S) \cup (R \cap T)$.
This resembles (1), since $R \supseteq T$ there means that $T$ on the right of (1)
may be replaced by $R \cap T$.
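The difference between the modular law (1) and the distributive law can be checked mechanically on a small lattice. The sketch below assumes the lattice of subspaces of the plane over the two-element field, with intersection as meet and sum of subspaces as join; this choice of example and the names used are ours.

```python
from itertools import product

def add(u, v):
    """Vector addition over the field with two elements."""
    return tuple((a + b) % 2 for a, b in zip(u, v))

ZERO = (0, 0)
VECTORS = list(product((0, 1), repeat=2))

# All subspaces of the plane: the zero space, three lines, the whole plane.
SUBSPACES = (
    [frozenset({ZERO})]
    + [frozenset({ZERO, v}) for v in VECTORS if v != ZERO]
    + [frozenset(VECTORS)]
)

def meet(r, s):
    """Intersection of two subspaces."""
    return r & s

def join(r, s):
    """Sum of two subspaces: all vectors u + v with u in r and v in s."""
    return frozenset(add(u, v) for u in r for v in s)

# Modular law (1): whenever r contains t,
# meet(r, join(s, t)) equals join(meet(r, s), t).
modular_ok = all(
    meet(r, join(s, t)) == join(meet(r, s), t)
    for r, s, t in product(SUBSPACES, repeat=3)
    if t <= r
)

# The distributive law is tested with no containment hypothesis.
distributive_failures = [
    (r, s, t)
    for r, s, t in product(SUBSPACES, repeat=3)
    if meet(r, join(s, t)) != join(meet(r, s), meet(r, t))
]
```

The modular law holds throughout, while the distributive law fails, for instance when r, s, t are the three distinct lines: the left side is r and the right side is the zero subspace.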
(f) Abstraction by shift of attention. Some abstractions arise when the
study of a Mathematical situation gradually makes it clear that certain
features of the situation (perhaps features which were originally
obscure) are really the main carriers of the structure. These features, with
their properties, are then suitably abstracted. Thus the Galois theory
begins as a theory of the roots of a given polynomial, and the Galois
group arises first as a group of permutations of those roots: all permutations
leaving invariant all polynomial (and hence, all rational) relations
between the roots. Presently, it appears that this group of permutations
acts on all the rational expressions in these roots, and that the Galois
8. Novelty
We have seen that problems, generalizations, abstraction and just plain
curiosity are some of the forces driving the development of the
where $u(t,x)$ is the height of the wave above the standard water level in
the canal at time $t$ and distance $x$ down the canal. In this equation the
basic terms $u_t + u_x$ express standard conditions (§VI.11) for the propagation
of waves such as $u = f(x - t)$; they are modified by the effect of
dispersion represented by the third partial derivative $u_{xxx}$ and by a term
$u u_x$, non-linear in $u$. This equation does have explicit travelling wave
solutions, appropriate to the observed phenomenon. Moreover, it has surprising
applications elsewhere, for instance in magneto-hydrodynamics.
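The role of the two basic terms can be verified directly. The short computation below concerns only the linear part named above, not the full equation, whose dispersive and non-linear terms change the admissible wave profiles; the profile f is assumed to be differentiable.

```latex
\[
  u(t,x) = f(x - t)
  \quad\Longrightarrow\quad
  u_t = -f'(x - t), \qquad u_x = f'(x - t),
\]
\[
  \text{so that}\qquad u_t + u_x = 0 .
\]
```

Thus any profile f, translated to the right at unit speed, satisfies the condition given by the basic terms alone.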
Computer science brings up other new Mathematical ideas. For exam-
ple there are new algorithms which find the product of two matrices more
quickly than the straightforward application of row-by-column multiplica-
tion. There are many other new algorithms, plus questions of principle
about computational complexity: For a computation of specified size, how
does one estimate the minimum time really necessary; can it be done in
"polynomial time"?
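One well-known instance of such an algorithm is Strassen's scheme, which multiplies two 2 x 2 matrices with seven multiplications instead of the eight used by the row-by-column rule. The text does not name a particular algorithm, so this choice is ours; the sketch below shows only the 2 x 2 step, with matrices stored as nested tuples.

```python
def naive_2x2(A, B):
    """Row-by-column product of 2 x 2 matrices: 8 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    return (
        (a11 * b11 + a12 * b21, a11 * b12 + a12 * b22),
        (a21 * b11 + a22 * b21, a21 * b12 + a22 * b22),
    )

def strassen_2x2(A, B):
    """Strassen's product of 2 x 2 matrices: 7 multiplications."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    return (
        (m1 + m4 - m5 + m7, m3 + m5),
        (m2 + m4, m1 - m2 + m3 + m6),
    )

A = ((1, 2), (3, 4))
B = ((5, 6), (7, 8))
same_result = naive_2x2(A, B) == strassen_2x2(A, B)
```

Since the seven products never use commutativity of the entries, the entries may themselves be matrix blocks; applied recursively this gives a multiplication count growing roughly like n^2.81 rather than n^3.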
Not all outside influences are really fruitful. For example, one engineer
came up with the notion of a fuzzy set: a set $X$ where a statement $x \in X$
of membership may be neither true nor false but lies somewhere in
between, say between 0 and 1. It was hoped that this ingenious notion
would lead to all sorts of fruitful applications, to fuzzy automata, fuzzy
decision theory and elsewhere. However, as yet most of the intended
9. Is Mathematics True?
It is customary to ask of a piece of Mathematics: "Is it true?" For exam-
ple, one might ask: Is the Jordan curve theorem true of that simple closed
curve drawn out there in the middle of my complex plane? The theorem
says that the curve separates that plane into an inside and an outside so
that no path can connect an inside point to an outside point without
meeting the curve itself. That particular curve is quite convoluted (Figure
X.4.3), so it's a bit hard to really tell which of the points are inside and
which are outside; but is it true?
The whole thrust of our exhibition and analysis of Mathematics indi-
cates that this issue of truth is a mistaken question. There really isn't an
absolutely flat Euclidean plane out there, complex or otherwise; there is
only the (very slightly) bumpy surface of the blackboard, which further-
more is very far from extending out to "infinity" in any direction what-
ever. What appears on the blackboard is not really a simple closed curve,
but a wavy and somewhat thick line of chalk marks; perhaps the marks
even skip a little, so that a careful draughtsman might be able to sneak a
path from the "inside" out to some point clearly "outside". Even if the
curve were devoid of gaps, I can't really demonstrate that it is continuous;
for every $\varepsilon$ greater than zero I haven't actually chosen a $\delta$ such that ...
Moreover I am not really interested in the inside and outside of this par-
ticular Jordan curve; I am rather interested in knowing that my plane is
simply connected or that the holomorphic functions I intend to define on
it satisfy Cauchy's integral theorem or that the results of this theorem can
be used, via some conformal map, to design an airplane wing. For this
and all sorts of other reasons the theorems of Mathematics, by Jordan or
others, are not simply statements about the behavior of individual objects
in the physical world.
This point can be put more explicitly. Our survey has indicated that
Mathematics is an extensive network of formal rules, definitions and sys-
tems, tightly tied here and there to activities and to science. This descrip-
tion does not in any way provide a physical object for each Mathematical
term, or a physical law corresponding to each Mathematical theorem.
Instead there are many pieces of Mathematics and a variety of accepted
procedures for making practical use of some of these pieces. As a result, it
is simply meaningless to ask, given a relation between Mathematical
terms, whether it is true of the "corresponding" physical objects. The
correspondence is just not that direct. Instead, our description of
Mathematics indicates that the appropriate questions are different ones:
Is this piece of Mathematics correct? That is, do the calculations follow
the formal rules prescribed, and are the theorems deduced from the stated
axioms by rules of inference on which we have agreed?
Is this piece of Mathematics responsive? That is, does it settle some
problem which had arisen or does it carry further some development
which was incomplete?
Is this piece of Mathematics illuminating? That is, does it help under-
stand what had gone before, either by further analysis or by abstraction or
otherwise?
Is this piece of Mathematics promising? That is, though it is a novel
departure from precedent or fashion, is there a reasonable chance that it
will subsequently fit in the picture?
Is this piece of Mathematics relevant? That is, is it tied to something
which is tied to human activities or to science?
Many of these questions are questions of degree: Is it more or less
relevant? Even that is hard to judge because relevance can be transmitted
all along the network of Mathematics. There are many pieces of
Mathematics which fit well but are not immediately relevant to any
application, and which much later may turn out to be applicable. The
promise of a new idea is also difficult to judge; too often the new may be
dismissed as "off-beat". But before all comes the question: Is it correct?
Centuries ago, when Mathematics dealt largely with arithmetic and
elementary geometry, it was perhaps easier to think of the numbers or the
geometric figures as real objects about which one could make true state-
ments. These objects had been long familiar and they were eminently use-
ful, so it was comfortable to consider them as real. Now, looking carefully,
we see that this comfort was an illusion. Numbers are the means used in
calculating by rules, while figures are the images used to suggest formal
geometric proofs; their eminent utility simply suggests that the formal is
powerful in practise. By now there are so many and various Mathematical
objects that any imputation of their reality offers spectacular cutting
opportunities for Occam's razor.
The real world is understood in terms of many different Mathematical
forms. For example, the fours group $Z_2 \times Z_2$ is not a single object somewhere
in the world. It is rather a form, exemplified many times over in
the world, by the symmetry present in each rectangular shape. It is also
exemplified within Mathematics, say as the Galois group of the field
which are highly relevant to the world. This requires explanation on the
metaphysical and the epistemological level, and thus raises a variety of
questions to which I have no adequate answer.
Here is a sketch of some tentative answers to these questions.
(i) Various phenomena do have underlying similarities and regularities.
These regularities are somehow propagated from one situation to
another.
(ii) On the basis of millennia of experience, mankind has developed
"ideas" about the phenomena which in turn are used to extract from
the phenomena a conceptual description of some of these similarities
and regularities. Some of the concepts involved can be made strictly
formal, and hence Mathematical.
(iii) Different formal statements in the descriptions, because of the propa-
gation of the regularity of phenomena, can be closely connected with
each other. Because these connections help the practical understand-
ing of the regularities, they have been extensively examined and in
some cases reduced to formal proofs, by rules of inference, from
astutely chosen axioms.
(iv) In many cases, the results of these proofs fit the facts: not just the
particular immediate facts from which the concepts were extracted,
but other facts which, because of the propagation of regularity, also
fit these deductions. By the choice of the successful cases of concept
formation and deductions, various branches of Mathematics are
selected and developed. From time to time the development is sup-
ported by additional formal explanations of the regularity of new
phenomena.
A development of this analysis might account for the "unreasonable
effectiveness of Mathematics in providing methods for science". For
example the concept of a group, originally formulated to analyze certain
symmetries in Galois theory and in geometry, turns out to be relevant to
Mathematical physics, primarily because corresponding symmetry
phenomena arise there.
To summarize: The world has many underlying regularities which, once
extracted, can be analyzed and understood by Mathematical form.
Because it is formal, the same Mathematical notions can apply to widely
different phenomena.
Question IV. What is the boundary between Mathematics and (say)
Physical Science?
Mathematics has been described here as a formal development of ideas
suggested by phenomena. Some portions of this description apply to other
sciences, notably to Theoretical Physics. These, too, make considerable use
of models and forms, and hence of Mathematics. There, however, one
develops only the forms that seem to be useful in fitting a specific type of
phenomena. If it turns out that they don't fit, they are discarded. In
Mathematics, one pursues the forms wherever they may lead.
10. Platonism
Our view of Mathematics provides both for formal concepts and for guid-
ing ideas, and hence accounts for the role of intuition in Mathematics. It
is, however, in sharp contrast to all variants of Platonism. Let us consider
this contrast.
We are not concerned with the actual historical doctrines formulated by
Plato but with the current views going under his name. These views typi-
cally hold that Mathematical concepts are about externally given objects
which therefore impose their nature upon the results of Mathematics.
That Euclidean geometry can lead to Platonism is clear. This geometry is
not "about" the imperfect lines and figures which we may draw on an
uneven blackboard or on crinkly paper. It must be "about" something, so
it is about ideal straight lines, perfectly rounded circles and absolutely flat
planes which have an objective existence. Once formulated so, this doc-
trine extends naturally to encompass ideal numbers and their properties,
functions and their derivatives and finally the whole world of sets. Thus
Platonism in Mathematics views Mathematics as dealing with a domain of
abstract or ideal "objects" which have being independent of our thought
about them and whose being therefore determines what can be truly
thought about them.
There are various versions of this view; I will follow here the careful
distinctions used by Michael Resnik on page 162 of his book Frege and
the Philosophy of Mathematics. First a methodological Platonist is one
whose activities fully endorse all the standard, infinite, and non-
constructive methods of Mathematics, as if Mathematics were "dealing
with a mind-independent infinite domain of abstract entities". An ontolog-
ical Platonist is one who "recognizes the existence of numbers, sets, ... on
a par with ordinary objects". An epistemological Platonist holds that
"knowledge of Mathematics objects is . . . in part based upon a direct
acquaintance with them". Finally a realist "believes that the objects of
Mathematics ... exist independently of us and our mental lives".
These several views are considerably different. First observe that the
practise of Mathematical research does require deep concentration and
complex understanding of what can follow from known axioms and
theorems. This process often may depend on vivid intuitive imagination
that the concepts at issue concern objects which are really "there". This
description of Mathematical practise has been called "mythological Pla-
tonism"; it seems to be essentially the same as the methodological Platon-
ism defined above. It is our view that it is not a philosophical doctrine
about the nature of Mathematics, but a description of the process of
Mathematical research. As such, it is an appropriate start on such a
description. For example, if I wish to make (as I have done) extensive
computations of the cohomology groups of the so-called Eilenberg-Mac
Lane spaces $K(\pi, n)$, I know that such a space is defined to be one in
which there is just one homotopy group $\pi$ in dimension $n$, and I recall the
This listing does put special emphasis upon extracting and formulating
ideas and on understanding their import: this is in some contrast to the
traditional view that research consists primarily of finding new theorems
(item (f)). But all of these tasks are hard; the search for new ideas and
new formalizations is inevitably adventurous and uncertain. Here are
some examples.
(a) The idea of using a Fourier series to represent periodic functions
was extracted from the study of heat and eventually led to a better under-
standing of the notion of a function and to an extensive and still continu-
ing development of harmonic analysis (as under item (f)). More recently,
the notions of game theory were extracted from practical concerns: they
have had extensive applications in Mathematical economics, but some-
what less influence in Mathematics proper.
(b) The ideas of complex number, group, and set were first disentan-
gled and formulated in the 19th century. They have each proved to be
extraordinarily important; in particular the study of groups has led to all
sorts of interconnections with other parts of Mathematics, for example in
the harmonic analysis on a group.
The concept of a topological space was developed in its present general
form only after the wide exploration of all sorts of particular examples,
and it has served to establish many new connections with complex
analysis, differential geometry, algebraic geometry, and Galois theory, for
instance. Other formal concepts may be developed "before their time" to
find use only much later. For example, the notions of lattice theory were
found about 1900 by Dedekind and others, but did not at that time find
any noticeable resonance. The same notions, when rediscovered by Gar-
rett Birkhoff and Oystein Ore in the early 1930's, were immediately put to
use in projective geometries, continuous geometries and in the analysis of
subobjects of algebraic systems. It would seem that by 1930 there were at
hand more uses for such abstract notions. Subsequently, after a lively
decade of research, lattice theory became of less central interest to algebraic
developments; it may be because the principal uses were already
worked out and the remaining problems were artificial. There are many
other examples of failure, success, and partial or temporary success in the
introduction of new mathematical concepts. Exploring the unknown is
bound to be an adventurous and chancy business!
(c) Solving problems arising in science is a major activity of applied
Mathematics. There are many examples not covered in our text, but the
development of the calculus to handle the problems of celestial and ter-
restrial mechanics is an outstanding example.
11. Preferred Directions for Research 451
The influence of habit is clear: it is easiest to work in the field that one
learned first; moreover the field may have been chosen to fit the talent of
the individual. Some Mathematicians are natural analysts (good at
approximations), some are algebraically minded (manipulations of formu-
las), some are inspired by applications, and some have well developed
geometrical intuitions. However, current specializations are much more
specific than these varieties of talent.
Some specialities dry up, but not the mainstream of Mathematics. The
voice of authority recommends work in the "mainstream", a handy but
inexact label to cover the essential portions of our subject. Our chapters
have described some of the sources of these mainstreams: Number theory,
geometry, calculus, algebra, mechanics, and complex variables (with logic
on the side). Research in mainstream topics is likely to be relevant, but it
is also likely to be difficult: much of the streambed has already been well
raked over. The evidence suggests that it is also useful to keep an eye on
possible new sources!
In the nineteenth century, synthetic projective geometry was in fashion,
and the fashion persisted till at least 1935, when some graduate schools
still required courses in both synthetic and analytic projective geometry.
Today, no one is concerned about the difference in method, and there is
little attempt to find new theorems by either method. Today, graph theory
is in high fashion, perhaps because of uses for computers or purported
applications to social science.
Some fashions are deeper, and depend on new concepts which open
new opportunities. Thus the remarkable properties of holomorphic func-
tions of one complex variable, as revealed in the work of Cauchy and
Riemann, put the study of these functions in a central position in
Mathematics for a considerable period (say 1854-1930). Toward the end
of this period, most of the opportunities had been explored and the sub-
ject became the center of authority; a major (but narrow) objective
appeared to be that of better understanding the big Picard theorem by
finding more elementary proofs for it. By now, the emphasis has shifted,
so that there is relatively much more attention to functions of several
complex variables.
The calculus of variations was another central interest in analysis,
because the precise methods of Weierstrass made possible a careful study
of this calculus and the more general variants (such as the problem of
Bolza) of its standard problems. This emphasis died down about 1930, but
the field was rapidly reinvigorated by new ideas; first the use of topology
(Morse theory) and then the applications to optimal control.
Finite group theory is a striking example of a field developed by a new
opportunity. This field had been quite inactive, and seemed subordinate to
related fields such as algebraic groups. Then powerful new techniques
(Hall-Higman, Thompson and Feit-Thompson) appeared. These tech-
niques suddenly made it possible to imagine a determination of all finite
simple groups. This became the specific program of the whole field for the
period 1962-1982. There are many other examples of such programs of
special research, not always so successful.
This is also an example of insight: With these new ideas, such and such
should be possible. There are many smaller and more specific examples of
such insight by individual Mathematicians. There are also major examples:
Riemann's introduction of Riemann surfaces; Hamilton's use of
canonical coordinates; Galois' recognition of the use of groups; the
emphasis upon power series by Weierstrass. These are examples of the
important role of ideas.
Mathematics develops from ideas in a network of interlocking formal
systems. This accounts for the inevitable specializations of research limited
to one particular node of the network, but it also indicates that such specialization
is by itself not enough. One needs also awareness of the
relevance of other related parts of the network. Understanding Mathemat-
ics goes beyond specializations.
12. Summary
Now we return to the six questions raised in the introduction, pp. 1-4.
Origin. There are many origins for Mathematics: In the practices which
develop in various human activities, leading to procedures, to ideas, and
then to formal rules; in the questions raised by the scientific study of
phenomena old and new; finally, in the human capacity to form ideas, to
extract information, and to make generalizations and abstractions. These
sources continue to supply new material for Mathematical thought.
Formalization. How are the forms in Mathematics derived? Some are sug-
gested by the facts, but they become forms only when extracted from the
facts, next considered vaguely as ideas and then finally pinned down by
meticulous definitions or axioms. Other Mathematical forms are
abstracted from more elaborate pieces of Mathematics, in order the better
to understand those pieces. It may be for this reason that simple forms,
like the forms of group theory, were discovered late in the development of
Mathematics: They required first exemplification by more concrete forms
in algebra and geometry. Mathematical forms are both discovered (within
the examples) and invented (developed by thought and ideas).
It is the formal nature of Mathematics which makes it interpersonal,
objective, and exact.
Dynamics. Mathematics involves repeated new discoveries, but there is no
one simple description of the forces driving these discoveries: Curiosity
about scientific questions, famous problems, or the wish to generalize.
What is common is perhaps the desire to understand: To understand what
that question from science really involves, or to see why that old problem,
though it appears simple, is so subtle, or to understand that related
features apparent in several different situations have a common explana-
tion. In this sense the desire to understand is the most important dynamic
for the advance of Mathematics.
Foundations. Mathematics has access to absolute rigor, because it is about
form and not about fact. However, there is no single and absolute founda-
tion for Mathematics. Any such fixed foundation would preclude the
novelty which might result from the discovery of new form. A form is any
development which proceeds by rule rather than by appeal to fact as to
meaning. Among the many contemporary discussions of the philosophy
of Mathematics we cite a number of authors: Bernays [1935], Curry
[1951], Dummett [1977], Gödel [1947], Goodman [1979], Kitcher [1983],
Lehman [1979], Mac Lane [1981], Quine [1963], Resnik [1980], Robinson
[1965], Steiner [1975], Weyl [1949], Wilder [1981], and Wittgenstein [1964].
However, none of the usual systematic foundations or philosophies, as
we have listed them in the introduction, seem to us satisfactory. They may
be summarized (too briefly) as follows:
Kock, Anders [1981]. Synthetic differential geometry. London Math. Soc. Lecture
Notes Series 51. 311 pp. Cambridge and New York: Cambridge University
Press.
Landau, Edmund [1951]. Foundations of analysis. The arithmetic of whole,
rational, irrational, and complex numbers. Translated by F. Steinhardt. 134 pp.
New York: Chelsea Pub. Co.
Lang, Serge [1967]. Introduction to differentiable manifolds. 125 pp. New York:
Interscience (John Wiley & Sons).
Lehman, Hugh [1979]. Introduction to the philosophy of mathematics. 169 pp.
Totowa, New Jersey: Rowman and Littlefield.
Lightstone, A. H. and Abraham Robinson [1975]. Non-archimedean fields and
asymptotic expansions. 204 pp. North Holland Mathematical Library. Vol. 13.
Amsterdam-Oxford: North Holland Publishing Company.
Mackey, George W. [1978]. Unitary group representation in physics, probability and
number theory. 402 pp. Math lecture notes series # 55, Reading, Mass.
Mac Lane, Saunders [1963]. Homology. Die Grundlehren der Math. Wissenschaf-
ten, Vol. 114. 422 pp. Heidelberg: Springer-Verlag.
_ _ [1971]. Categories for the working mathematician, Graduate texts in
mathematics, Vol. 5. 262 pp. Heidelberg: Springer-Verlag.
_ _ [1981]. Mathematical models: a sketch for the philosophy of mathematics.
Am. Math. Monthly 88: 462-472.
Mac Lane, Saunders and Garrett Birkhoff [1979]. Algebra, 2nd ed. 586 pp. (1st ed.
1967) New York: Macmillan Publishing Co.
Massey, William S. [1967]. Algebraic topology: An introduction. 261 pp. New York:
Springer-Verlag.
Monna, A. F. [1975]. Dirichlet's principle. A mathematical comedy of errors and
its influence on the development of analysis. 138 pp. Utrecht, The Netherlands:
Oosthoek Scheltema & Holkema.
Myhill, John [1972]. What is a real number? Am. Math. Monthly 79: 748-754.
Narasimhan, R. [1985]. Complex analysis in one variable. 216 pp. Boston:
Birkhäuser.
O'Neill, Barrett [1966]. Elementary differential geometry. 411 pp. New York and
London: Academic Press.
Osgood, William Fogg [1937]. Mechanics. 495 pp. New York: The Macmillan Co.
Page, Leigh [1928]. Introduction to theoretical physics. 587 pp. New York: D. van
Nostrand.
Pars, L. A. [1965]. A treatise on analytical dynamics. 641 pp. New York: John
Wiley & Sons.
Pontryagin, L. S. [1939]. Topological groups. Translated from the Russian by
Emma Lehmer. Princeton math series vol. 2. 299 pp. Princeton, New Jersey:
Princeton University Press.
Quine, W. V. O. [1963]. From a logical point of view. Logico-philosophical essays,
2nd ed., rev. 184 pp. New York: Harper and Row.
Resnik, Michael D. [1980]. Frege and the philosophy of mathematics. 243 pp.
Ithaca, New York: Cornell University Press.
Robinson, Abraham [1965]. Formalism 64, pp. 228-246. In Proc. Internat.
Congress for Logic, Methodology, and Philosophy. Jerusalem 1964. Amsterdam:
North Holland Pub. Co.
Russell, Bertrand A. W. [1908]. Mathematical logic as based on the theory of
types. Amer. J. Math. 30: 222-262.
Shoenfield, J. R. [1975]. Martin's axiom. Am. Math. Monthly 82: 610-619.
Sondheimer, Ernst and Alan Rogerson [1981]. Numbers and infinity. A historical
account of mathematical concepts. 172 pp. London and New York: Cambridge
University Press.
Spivak, Michael [1965]. Calculus on manifolds. 144 pp. New York and Amster-
dam: W. A. Benjamin Inc.
Steiner, Mark [1975]. Mathematical knowledge. 164 pp. Ithaca, New York: Cornell
University Press.
Titchmarsh, E. C. [1932]. The theory of functions. 454 pp. Oxford, England: The
Clarendon Press.
Troelstra, A. S. [1972]. Choice sequences. A chapter of intuitionist mathematics.
Oxford logic guides. 170 pp. Oxford: Clarendon Press.
Weyl, Hermann [1949]. Philosophy of mathematics and natural science. 311 pp.
Rev. English ed. Trans. by O. Helmer. Princeton, New Jersey: Princeton
University Press.
_ _ [1923]. Raum, Zeit, Materie. Vorlesungen über allgemeine
Relativitätstheorie, 5th ed. 338 pp. Berlin: Springer-Verlag.
Whitehead, A. N. and Bertrand Russell [1910]. Principia mathematica. Vol. 1. 666
pp. 2nd ed. 1925. 674 pp. Cambridge, England: Cambridge University Press,
1925.
Wilder, Raymond L. [1981]. Mathematics as a cultural system. 182 pp. Oxford-
New York: Pergamon Press.
Wilson, Edwin B. [1912]. Advanced calculus. 566 pp. Boston: Ginn & Co.
Wittgenstein, Ludwig [1964]. Remarks on the foundations of mathematics, 2nd ed.
Edited by G. H. von Wright, R. Rhees, and G. E. M. Anscombe. Oxford, Eng-
land: Basil Blackwell.
Zermelo, Ernst [1908]. Untersuchungen über die Grundlagen der Mengenlehre I.
Mathematische Annalen 65: 261-281.
List of Symbols