FIT1058 Course Notes
June 2, 2025
Monash University
Faculty of Information Technology
Preface

Prerequisites
This unit assumes successful prior study in mathematics at least to the standard of
Mathematical Methods units 3 & 4 in the Victorian Certificate of Education (VCE).
That VCE subject in turn builds on mathematics studied earlier in high school.
The specific topics from school mathematics that we make most use of in this unit
are:
• calculus (not a lot, but we make some use of limits and integrals, and derivatives
can give useful insight even in situations where we don’t make specific use of them).
We also make pervasive use of standard high-school algebra when working with polyno-
mials and other functions.
We review parts of some of these topics in the pre-reading (see next section), but
the pace and level still assumes prior knowledge. So it is important that you work to
fill in any gaps or hazy areas in your knowledge of these topics.
Pre-reading
These pre-reading sections should be read and studied before the seminars on that topic.
Some of them review work you did in school, but you should still read them, for several
reasons: they establish the notation, terminology and other conventions we will use,
which sometimes differ from those used in schools; they give some important computer
science context to the concepts discussed, which you may not be aware of even if you
have studied the concepts themselves before; and our experience is that most students
benefit from some reminders and revision of this school material anyway. Other sections
marked 𝛼 may cover new material that is so fundamental to the topic to be discussed
that reading about it before the seminar will significantly increase your ability to learn
the material and master it.
The amount of pre-reading varies, depending on the topic. Some chapters spend a lot
of their early sections reviewing concepts and topics covered in school (e.g., the chapters
on sets and functions). These have more pages of pre-reading, but reading them should
not take as long as reading completely new material. Other chapters contain material
that is almost entirely new. These have fewer pages of pre-reading, but those pages will
need to be read more slowly and carefully.
Extra material
Sections whose numbers have superscript 𝜔 contain extra material that is beyond the
FIT1058 curriculum. This may include discussion of alternative approaches that we
don’t study, or more
advanced aspects of the topic, or some historical background. The specific content
introduced in these extra sections won’t be on tests or exams in this unit. But reading
them may still be of indirect benefit, by consolidating the material studied in the other
sections.
Acknowledgements
Thanks very much to those who have given feedback on earlier versions of these Course
Notes, including: David Albrecht, Mathew Baker, Annalisa Calvi, Nathan Companez,
Michael Gill, Thomas Hendrey, Alexey Ignatiev, Roger Lim, Rebecca Robinson, James
Sherwood, Alejandro Stuckey de la Banda, Tham Weng Kee, Nelly Tucker, Joel Wills,
and some anonymous student reviewers.
Thanks also to the FIT1058 students, including Tiancheng Cai, Shshank Jha, Ian
Ko, Cody Lincoln, Zijing Song, Timothy Tong, Jing Yap, and Michael Zeng, who have
pointed out some errors in earlier versions, enabling us to fix them. Further error-
spotting is very welcome. We look forward to acknowledging more students in future.
Contents

Preface

1 Sets
  1.1𝛼 Sets and elements
  1.2𝛼 Specifying sets
  1.3𝛼 Cardinality
  1.4 Sets of numbers
  1.5 Sets of strings
  1.6 Subsets and supersets
  1.7 Minimal, minimum, maximal, maximum
  1.8 Counting all subsets
  1.9 The power set of a set
  1.10 Counting subsets by size using binomial coefficients
  1.11 Complement and set difference
  1.12 Union and intersection
  1.13 Symmetric difference
  1.14 Cartesian product
  1.15 Partitions
  1.16 Exercises

2 Functions
  2.1𝛼 Definitions
    2.1.1𝛼 Domain
    2.1.2𝛼 Codomain & co.
    2.1.3𝛼 Rule
  2.2𝛼 Functions in computer science
    2.2.1𝛼 Functions from analysis
    2.2.2𝛼 Functions in mathematics and programming
  2.3𝛼 Notation
  2.4𝛼 Some special functions
  2.5𝛼 Functions with multiple arguments and values
  2.6 Restrictions
  2.7 Injections, surjections, bijections
  2.8 Inverse functions
  2.9 Composition
  2.10 Cryptosystems
  2.11𝜔 Loose composition

3 Proofs
  3.1𝛼 Theorems and proofs
  3.2 Logical deduction
  3.3 Proofs of existential and universal statements
  3.4 Finding proofs
  3.5 Types of proofs
  3.6 Proof by symbolic manipulation
  3.7 Proof by construction
  3.8 Proof by cases
  3.9 Proof by contradiction
  3.10 Proof by mathematical induction
  3.11 Induction: more examples
  3.12𝜔 Induction: extended example
  3.13 Mathematical induction and statistical induction
  3.14 Programs and proofs
  3.15 Exercises
1 Sets

The raw material of computation and communication is information. This takes many
different forms, due to the great variety of things we might want to represent and the
many different ways of representing them.
In this unit and in other units in your degree, you will learn about many different
structures that are used to represent information in order to store it, communicate it or
compute with it.
We will start with sets because these are among the simplest possible information
structures. Most other structures can be defined in terms of sets, so sets are a founda-
tional topic.
Sets are used extensively to define the types of objects that we compute with. When
we work with a whole number, we say it is of Integer type because it belongs to the set
of integers and satisfies the various laws and properties of integers. When we work with
a string of characters, we might say that it is of String type because it belongs to the
set of all strings over a suitable alphabet and satisfies the properties expected of strings.
Many programming languages take careful account of the types of the objects they work
with, and sets always underlie any notion of type.
1.1𝛼 Sets and elements

A set is just a collection of objects, without order or repetition. The objects in a set are
called its elements or members. To specify a set, we can just give a comma-separated
list of them between curly braces. So the following are all sets:
In the last example, the set is empty. This set, simple as it is, is so fundamental that it
has its own symbol, ∅, not to be confused with zero, 0 (which, in the early days of the
computer industry, was often written with a slash through it to distinguish it from the
letter O).
When we write a set by listing its elements in the above way, we will inevitably
list the elements in some specific order. But different orders of writing do not affect
the identity of the set. Our third set above contains three of the earliest computers
ever built, but they are not listed there in chronological order. If we wrote them in
chronological order, the set would be written
To state that an object belongs to a set, we use the symbol ∈, writing

object ∈ set.

To state that an object does not belong to a set, we use ∉.
1.2𝛼 Specifying sets

We will be working with many sets that are far larger than these examples, and many
will be infinite, so it is often not practical to write out all the elements. We therefore need
a succinct way of specifying precisely the elements of a set. One way is to give a condition
that, in general, is either true or false, with the members of the set being precisely those
objects for which the condition is true. For example,
{𝑥 ∶ 𝑥 is even}
is the set of all even numbers. The variable 𝑥 here is simply a name for elements of
this set, so that we can talk about them. The colon, “:”, separates the name 𝑥 from the
condition on 𝑥 that must be satisfied in order for it to be an element of this set. We read
this as “the set of all 𝑥 such that 𝑥 is even”. The choice of name, 𝑥, is not important;
we could equally well write the set as
{𝑛 ∶ 𝑛 is even}
In this definition, the reader will naturally infer that the variable (𝑥 or 𝑛) represents a
whole number, since the concept of a number being even or not only applies to whole
numbers; it makes no sense, in general, for rational numbers or real numbers. But it
is often preferable to spell out the kind of numbers we are talking about, so that the
reader does not have to fill in any gaps in our description. In this example, we might
also want to remove any doubt in the reader’s mind as to whether we are working with
integers in general or just natural numbers. So we might rewrite our definition as
{𝑥 ∶ 𝑥 ∈ ℤ and 𝑥 is even}.

In general, a set specified in this way has the form

{name ∶ condition},
where name is a convenient name for an arbitrary member of the set and condition is a
statement involving name which is true precisely when the object called name belongs
to the set and is false otherwise.
It is a common convention to include, in our statement of the name (before the
colon), a specification of a larger set that the object must belong to. For example, the
set of even integers could be written
{𝑥 ∈ ℤ ∶ 𝑥 is even}.
This can be read as “the set of 𝑥 in ℤ such that 𝑥 is even” or “the set of integers 𝑥 such
that 𝑥 is even”. In general we can write

{name ∈ larger set ∶ condition on name}.
It is necessary that the condition be precise and clear. To ensure this, it will often be
specified in a formal symbolic way. It is ok to use English text in the condition provided
it is used clearly and precisely. It is also important for the text to be succinct, subject
to ensuring precision and clarity.
Another way to specify a set is to give a rule by which each member is constructed.
For example, the set of even integers could be written
{2𝑛 ∶ 𝑛 ∈ ℤ}
We read this as “the set of 2𝑛 such that 𝑛 belongs to ℤ” or “the set of 2𝑛 such that 𝑛 is
an integer”. The rule is a formula for converting a named object into a member of the
set, and after the colon we give a condition that the named object must satisfy in order
for the formula to be used. Taking all objects that satisfy this condition, and applying
the formula to each one of them, must give all members of the set. In general, we can
write
{rule expressed in terms of name ∶ condition on name}.
Since the curly braces are read as “the set of”, it’s ok to write, for example, {even
integers} for the set of even integers or {people on Earth} for the set of all people on
Earth. This way of defining sets — using just English text between the braces — is
fine when the English is completely precise and not too long. But it should be used
with care, because of the risk of imprecision, and only works well for sets that can be
described very simply.
People sometimes describe large sets by listing a few of their elements and expecting
readers to spot the pattern and infer what the entire set is. For example, the set of even
integers might sometimes be written as

{… , −6, −4, −2, 0, 2, 4, 6, …}.
While this sort of description might help communicate ideas in an informal conversation,
it is not a definition of the set, since it does not precisely specify which elements are
in the set, but rather turns that task over to the reader by the use of “…”. Informal
descriptions have their place, and we will use them sometimes, but they are not formal
definitions.
1.3𝛼 Cardinality
The size or cardinality of a set is just the number of elements it has. If 𝐴 is a set, then
its size is denoted by |𝐴| or sometimes #𝐴. When a set is specified by just listing its
elements, we can determine its size by just counting those elements, which can be done
manually if the set is small enough.

1.4 Sets of numbers

Some sets are so commonly used that they have special names. We have already met ∅
which denotes the empty set. There are names for some fundamental sets of numbers:
ℕ the set of positive integers
ℕ0 the set of nonnegative integers
ℤ the set of all integers
ℚ the set of rational numbers
ℝ the set of real numbers
Usually, when we work with these fundamental number sets, we are not only interested
in them as plain sets: we may also be interested in the natural order they have (with
≤), and in some operations we can do with their elements (like +, −, × and more). So,
the symbol ℤ stands for the set of integers (as above), but it is also used to represent
that same set together with some selection of operations that we are interested in at
the time. We will not dwell on this point further; it would be too fussy to start using
different names for a number set depending on what operations on it were being used
at the time.
To restrict any of these sets to only its positive or negative members, we can use
superscript + or −. So ℤ⁺ is another way of denoting ℕ, and ℝ⁻ is the set of negative
real numbers. To denote the set of nonnegative members of one of these sets of num-
bers, we combine superscript + with subscript 0, as in $\mathbb{R}^+_0$ for the set of nonnegative
real numbers (since the nonnegative real numbers are just the positive real numbers
together with zero). Similarly, $\mathbb{Q}^-_0$ is the set of nonpositive rational numbers.
For intervals of real numbers, there is some standard notation to indicate which, if
any, of the two endpoints of the interval are included:
notation   definition                 terminology
[𝑎, 𝑏]     {𝑥 ∈ ℝ ∶ 𝑎 ≤ 𝑥 ≤ 𝑏}        closed interval
[𝑎, 𝑏)     {𝑥 ∈ ℝ ∶ 𝑎 ≤ 𝑥 < 𝑏}        half-open (half-closed) interval
(𝑎, 𝑏]     {𝑥 ∈ ℝ ∶ 𝑎 < 𝑥 ≤ 𝑏}        half-open (half-closed) interval
(𝑎, 𝑏)     {𝑥 ∈ ℝ ∶ 𝑎 < 𝑥 < 𝑏}        open interval
Sometimes we want to restrict the contents of the interval to one of our other special
sets of numbers. We will indicate this using a subscript on the interval notation. For
example, if we only want integers within the interval [𝑎, 𝑏], we write [𝑎, 𝑏]ℤ, which is an
abbreviation for [𝑎, 𝑏] ∩ ℤ.
1.5 Sets of strings

For any alphabet 𝐴, the set 𝐴⁰ is the set containing just the empty string: 𝐴⁰ = {𝜀}.
This is not to be confused with the empty set!
How many strings of length 𝑘 over the alphabet 𝐴 are there? In other words, what
is |𝐴ᵏ|? If 𝑘 = 1, then we are just counting strings of length 1, which is the same as the
number of letters in the alphabet, so |𝐴¹| = |𝐴|. If 𝑘 = 2, then we are counting strings
of length 2. For each possible first letter, we have |𝐴| choices for the second letter, since
there is no restriction on the second letter. Since each choice of first letter gives the
same number of choices for the second letter, and since there are |𝐴| choices for the
first letter, we find that the number of strings of length 2 is |𝐴| × |𝐴| = |𝐴|². If 𝑘 = 3,
then we have |𝐴|² choices for the first two letters (as we just saw), with each such choice
followed by |𝐴| choices for the third letter, with this number being independent of the
choice we made for the first two letters. So the total number of strings of length 3 is
|𝐴|² × |𝐴| = |𝐴|³. This reasoning extends to any value of 𝑘. So the number of strings
over 𝐴 of length 𝑘 is given by

|𝐴ᵏ| = |𝐴|ᵏ.
We also write 𝐴* for the set of all finite strings (of all possible lengths) over the
alphabet 𝐴. This is always an infinite set (provided 𝐴 ≠ ∅). For 𝐴 = {0, 1}, we give a
few of its smallest members:

𝐴* = {𝜀, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, …}.
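Listings like this can be generated mechanically, since strings over {0, 1} are just numbers
written in binary. The following C sketch (an illustration of ours, not part of the unit’s
materials) prints the members of 𝐴* of length at most 3, in exactly the order shown above:

#include <stdio.h>

/* Print all strings over {0,1} of lengths 0, 1, 2, 3, in order:
   for each length, count v from 0 to 2^len - 1 and print v as a
   len-bit string (most significant bit first). */
int main(void) {
    for (int len = 0; len <= 3; len++) {
        for (int v = 0; v < (1 << len); v++) {
            if (len == 0) {
                printf("(empty string)\n");   /* epsilon */
            } else {
                for (int i = len - 1; i >= 0; i--)
                    putchar(((v >> i) & 1) ? '1' : '0');
                putchar('\n');
            }
        }
    }
    return 0;
}

The inner loops run 2ᵏ times for length 𝑘 (so 1 + 2 + 4 + 8 = 15 strings are printed in
total), matching the counting argument |𝐴ᵏ| = |𝐴|ᵏ above.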
1.6 Subsets and supersets

A subset 𝐴 of a set 𝐵 is a set 𝐴 with the property that every element of 𝐴 is also an
element of 𝐵. We write 𝐴 ⊆ 𝐵. In other words, membership of 𝐴 implies membership of 𝐵:

𝑥 ∈ 𝐴 ⇒ 𝑥 ∈ 𝐵.
Figure 1.1: 𝐴 ⊆ 𝐵.
If every element of 𝐵 is also an element of 𝐴, we say 𝐴 is a superset of 𝐵 and write
𝐴 ⊇ 𝐵. This means that membership of 𝐴 is implied by membership of 𝐵:

membership of 𝐴 ⇐ membership of 𝐵
If we have both 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵, then the two sets must actually be identical:
𝐴 = 𝐵. The converse certainly holds too: if 𝐴 = 𝐵 then 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵. This
suggests a way of proving that two sets 𝐴 and 𝐵 are equal: prove that each is a subset
of the other. So the task of proving set equality is broken down into two subtasks, each
requiring proof of a subset relationship, which is usually easier to prove than equality.
In fact, this is a very common strategy for proving set equality.
We can think of 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵 as giving logical implication in both directions:
membership of 𝐴 implies membership of 𝐵 (because 𝐴 ⊆ 𝐵), and is implied by mem-
bership of 𝐵 (because 𝐴 ⊇ 𝐵). More succinctly: membership of 𝐴 is equivalent to
membership of 𝐵, just as we would expect because 𝐴 = 𝐵 here. Because we have impli-
cation in both directions, ⇒ and ⇐, it is convenient to put them together in a single
symbol, ⇔, which means that implication goes in both directions:
membership of 𝐴 ⇔ membership of 𝐵.
The symbol ⇔ is read as “if and only if”. The statement “𝑥 ∈ 𝐴 if and only if 𝑥 ∈ 𝐵”
combines two parts:

• the “if” part, saying that “𝑥 ∈ 𝐴 if 𝑥 ∈ 𝐵”, which means (writing the membership
statements the other way round) that “if 𝑥 ∈ 𝐵 then 𝑥 ∈ 𝐴”, or equivalently, “𝑥 ∈
𝐵 ⇒ 𝑥 ∈ 𝐴”, or equivalently, “𝑥 ∈ 𝐴 ⇐ 𝑥 ∈ 𝐵”;

• the “only if” part, saying that “𝑥 ∈ 𝐴 only if 𝑥 ∈ 𝐵”, which means that “if 𝑥 ∈ 𝐴
then 𝑥 ∈ 𝐵”, or equivalently, “𝑥 ∈ 𝐴 ⇒ 𝑥 ∈ 𝐵”.
1.7 Minimal, minimum, maximal, maximum

Suppose we have a set and we are interested in those subsets of it that have some specific
property. For example, let 𝐵 be a set of people. A clique in 𝐵 is a set of people who
all know each other. In other words, it’s a subset 𝐴 ⊆ 𝐵 such that, for each 𝑥, 𝑦 ∈ 𝐴,
person 𝑥 and person 𝑦 know each other. Given a set of people and their social links, we
may wonder how “cliquey” they can be. To help us describe “peak cliques”, we make a
precise distinction between the adjectives “maximum” and “maximal”.
• A maximum clique is a clique of the largest possible size among all cliques in 𝐵.

• A maximal clique is a clique that is not a proper subset of any other clique.
Observe that these are different concepts, although they are related. Consider carefully
the second of these, the concept of a maximal clique. Such a clique is not necessarily
as large as the largest possible clique in 𝐵 (although it might be). If 𝐴 is a “maximal
clique”, then it’s a clique with the extra property that, if we add any other person in
𝐵 to the set, it’s no longer a clique: that new person will be a stranger to at least one
person already in 𝐴. So 𝐴 cannot be enlarged while preserving the clique property. But
that does not mean it is as large as any clique in 𝐵 can be. There may be other quite
different cliques that are even larger than 𝐴. So a maximal clique may be smaller in size
than a maximum clique.
On the other hand, a maximum clique is also a maximal clique. A clique that is
largest, in size, among all possible cliques in 𝐵 cannot possibly be enlarged; it cannot
possibly be a proper subset of another clique, because then the latter clique would be
larger in size than the former one.
So,
maximum ⟹ maximal.
The reverse implication does not hold in general. (Typically, there are maximal cliques
that are not maximum cliques. See if you can construct an example social network
where this happens. But there do exist unusual situations where every maximal clique
is maximum; can you construct one?)
We make this distinction between the meanings of “maximum” and “maximal” when-
ever we are talking about subsets with some property.
• A maximum subset with the property has largest size among all subsets with
the property.
• A maximal subset is a subset with the property that is not a proper subset of
any other subset with the property. In other words, it cannot be enlarged while
still maintaining the property.
• A minimum subset with some property has smallest size among all subsets with
the property.
• A minimal subset with some property is a subset with the property that is not
a proper superset of any other subset with the property. So, no proper subset
has the property. In other words, if we remove anything from it, the property no
longer holds.
In many situations in life, and especially if we are just talking about real numbers
(rather than sets), this distinction between “maximum” and “maximal” is unnecessary
(and likewise for “minimum” and “minimal”), and the terms are often treated as synonyms.
What is the maximum numerical score you have ever made in your favourite game? You
could replace “maximum” by “maximal” in this sentence, with no ambiguity (though it
would be less common wording in practice).1 This is because real numbers are totally
ordered; for every pair 𝑥, 𝑦 ∈ ℝ, if 𝑥 ≠ 𝑦 then either 𝑥 < 𝑦 or 𝑦 < 𝑥. So, if a number
has some property and cannot be increased while maintaining that property (i.e., it’s
maximal), then it’s also the largest number with that property (i.e., it’s maximum).
But the subset relation is different to the kind of order relation we are used to for
real numbers. The subset relation does not give a total ordering; you can have two
different sets 𝐴 and 𝐵 that are incomparable in the sense that 𝐴 ⊈ 𝐵 and 𝐵 ⊈ 𝐴, i.e.,
neither is a subset of the other. Such incomparability cannot occur among real numbers.
But now that we are working with subsets, the terms “maximal” and “minimal” must
be used with care, both in reading and writing. Unfortunately, they are often confused,
even in technical publications in situations where the distinction matters.
From now on, we will mostly drop the underlining when using “maximum”, “maximal”,
“minimum” and “minimal”. But be observant about which suffix, -um or -al, is being
used, and what the usage implies.
1.8 Counting all subsets

If 𝐵 is a finite set, with |𝐵| = 𝑛 say, how many subsets does it have? A subset of 𝐵 is
determined by a choice, for each element of 𝐵, of whether or not to include it in the
subset. Now, 𝐵 has 𝑛 elements, and for each of these we have two choices. These choices
are independent, in the sense that making a choice for one element puts no restrictions
whatsoever on the choices we may make for other elements. So the total number of
choices we make is
$$\underbrace{2 \times 2 \times 2 \times \cdots \times 2 \times 2}_{\text{for each element of } B, \text{ choose between two options}}$$

which is just $2^{|B|} = 2^n$. This tells us that the number of subsets of a set grows very
quickly — in fact, grows exponentially — as the size of the set increases.
1.9 The power set of a set

The power set of a set 𝐵 is the set of all subsets of 𝐵. We denote it by 𝒫(𝐵). The
observations of the previous paragraph tell us that, with |𝐵| = 𝑛,

$$|\mathcal{P}(B)| = 2^n. \tag{1.1}$$
1 So, although a maximal clique is not necessarily a maximum clique, a maximal size clique is in-
deed just a maximum size clique. This is because, in “maximal/maximum size clique”, the adjective
“maximal/maximum” is applied to the size, which is a number (and therefore part of a total order), rather
than to the set itself. Nonetheless, we will avoid applying the term “maximal” to sizes and other numbers,
since there we can use “maximum” which is more common.
This is true even if 𝐵 is empty, when 𝑛 = 0 and 2ⁿ = 2⁰ = 1, in keeping with the fact that
∅ has one subset, namely itself. This expression for |𝒫(𝐵)| explains the term “power
set”.
In algorithm design, we often need to find the “best” among all subsets of a set.
Consider, for example, some social network analysis tasks, where we have a set of people
and a set of pairs that know each other. Questions we might ask include: What is
the largest clique, i.e., the largest set of people who all know each other? What is the
largest set of mutual strangers? What is the smallest set of people who collectively
know everyone? We could, in principle, solve these problems by examining all subsets
of the set of people, or in other words, all members of its power set, provided we can
easily determine, for each subset, whether or not it has the property we are interested
in (being a clique, etc.). However, for reasonably large 𝑛, the number of sets to examine
is prohibitive and the search would take too long. So we need to find smarter methods
where we use the properties of networks and of the structures we are interested in to
solve the problem without examining every single subset.
The power set of 𝐵 is also often denoted by $2^B$.
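To make the brute-force search described above concrete: each subset of an 𝑛-element set
can be encoded as an 𝑛-bit mask, so examining the whole power set is just counting through
all 2ⁿ mask values. The C sketch below is our illustration of this; has_property is a
hypothetical stub standing in for whatever test (being a clique, etc.) we actually care about.

#include <stdio.h>

/* Encode a subset of {e_0, ..., e_{n-1}} as an n-bit mask:
   bit i is 1 exactly when e_i is in the subset. */
static int has_property(unsigned mask) {
    (void)mask;
    return 1;  /* stub: accepts every subset */
}

int main(void) {
    int n = 4;  /* |B| = n, so there are 2^n subsets to examine */
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        if (has_property(mask)) {
            printf("{");
            for (int i = 0; i < n; i++)
                if (mask & (1u << i)) printf(" e%d", i);
            printf(" }\n");
        }
    }
    return 0;
}

The loop body runs 2ⁿ times, so even at a billion subsets per second, 𝑛 = 60 would already
take more than 30 years. This is why the smarter methods mentioned above are needed.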
1.10 Counting subsets by size using binomial coefficients

Sometimes we are focused on subsets of a specific size 𝑘. How many subsets of size
𝑘 does a set 𝐵 of size 𝑛 have? This quantity is denoted by a binomial coefficient,
written

$$\binom{n}{k}$$
and read as “𝑛 choose 𝑘” because we are interested in choosing 𝑘 elements from 𝑛
available elements. Between them, the binomial coefficients (taken over the full range of
subset sizes, 𝑘 = 0, 1, 2, … , 𝑛) count every subset of 𝐵 exactly once, so we already have
$$\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n-1} + \binom{n}{n} = 2^n.$$
This is an important and useful fact, but it does not yet give us a method for working
out $\binom{n}{k}$. We now consider how to work this out.
We start with some simple cases. If 𝑘 = 0, then we are choosing no elements at all,
and this can be done in just one way, by doing nothing. (In this context, there’s only
one way to do nothing!) So, for all 𝑛,
$$\binom{n}{0} = 1.$$
At the other extreme, if 𝑘 = 𝑛, then we choose all elements. Again, this can be done in
only one way, because for each element of our set, we have no choice but to take it. So
$$\binom{n}{n} = 1.$$
Now suppose 𝑘 = 1. We choose just one element from 𝑛 elements, so we have 𝑛 options:
$$\binom{n}{1} = n.$$
What about 𝑘 = 𝑛 − 1? This time, we are choosing one element not to include in our
subset; once that choice is made, everything else is determined. So, again, we have 𝑛
options:
$$\binom{n}{n-1} = n.$$
The symmetry we have seen here — firstly between 𝑘 = 0 and 𝑘 = 𝑛, and then between
𝑘 = 1 and 𝑘 = 𝑛−1 — is more general. To see this, observe that deciding which elements
are included also determines which elements are excluded, and vice versa. The number
of ways of choosing 𝑘 elements to include in our subset is the same as the number of
ways of choosing 𝑘 elements to exclude from our subset, which in turn is just the number
of ways of choosing 𝑛 − 𝑘 elements to include. Therefore we have
$$\binom{n}{k} = \binom{n}{n-k}. \tag{1.2}$$
Now consider choosing 𝑘 elements one at a time, in order: there are 𝑛 choices for the
first element, then 𝑛 − 1 choices for the second, and so on, so that

$$\#\text{ ways to choose } k \text{ elements in order} = n(n-1)(n-2)\cdots(n-k+1). \tag{1.3}$$

When 𝑘 = 𝑛 we are just asking for the number of ways in which 𝑛 elements can all be
chosen in order, and that is just the factorial of 𝑛, written 𝑛! and defined by

$$n! = n \cdot (n-1) \cdot (n-2) \cdot \cdots \cdot 3 \cdot 2 \cdot 1.$$
In terms of factorials,

$$\#\text{ ways to choose } k \text{ elements in order} = \frac{n!}{(n-k)!}. \tag{1.4}$$
Compare the number of arithmetic operations in each of these expressions, (1.3) and
(1.4). It will be evident that the first expression, (1.3), is more efficient. The second is
still important in understanding and using these counting problems, though.
We now return to our main aim of counting the unordered choices of 𝑘 elements
from 𝑛 elements. Our ordered counting above will count each subset of size 𝑘 some
number of times. In fact, our sequence of choices was designed to count every possible
ordering of the 𝑘 elements exactly once. How many orderings are there? Since we drew
these elements from a set (namely 𝐵), and each element of 𝐵 is chosen at most once, all
these chosen elements must be distinct. So there is no possibility of any of them looking
identical to each other. So there are 𝑘! ways to order the 𝑘 elements, and therefore
each subset of 𝑘 elements gets counted 𝑘! times by this process. Since this overcounting
factor 𝑘! is the same for all subsets of size 𝑘, we have

$$\binom{n}{k} \times k! = \#\text{ ways to choose } k \text{ elements in order} = \frac{n!}{(n-k)!}.$$

It follows that

$$\binom{n}{k} = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!}, \tag{1.5}$$

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}. \tag{1.6}$$
Again, compare the number of arithmetic operations in the expressions (1.5) and
(1.6), and consider which would be more efficient for computation. It is also worth
thinking about the order in which the various multiplications and divisions are done.
It makes no difference mathematically, but on a computer the order of operations can
affect the accuracy of the result, because of limitations on the sizes and precision of
numbers stored in the computer. In particular, the calculation works better, in general,
if intermediate numbers used during the computation are not too large or small in
magnitude. So, how can the computation be organised to best keep the sizes of those
intermediate numbers under control?
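One common way to organise the computation, sketched below in C (our illustration, not
code supplied by the unit), is to alternate multiplication and division: the running value
after each step is itself a smaller binomial coefficient, so it never exceeds the final
answer, and every division is exact.

#include <stdio.h>

/* Compute C(n,k) without ever forming n! in full. After i steps the
   running value equals C(n-k+i, i), an integer, so the division by i
   is always exact and the running value never exceeds the result. */
unsigned long long binom(unsigned n, unsigned k) {
    if (k > n) return 0;
    if (k > n - k) k = n - k;              /* use the symmetry (1.2) */
    unsigned long long result = 1;
    for (unsigned i = 1; i <= k; i++)
        result = result * (n - k + i) / i;
    return result;
}

int main(void) {
    printf("C(5,3)  = %llu\n", binom(5, 3));    /* 10 */
    printf("C(52,5) = %llu\n", binom(52, 5));   /* 2598960 */
    return 0;
}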
A couple of special cases deserve special treatment because of their ubiquity in the
analysis of algorithms and data structures.
$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{n}{3} = \frac{n(n-1)(n-2)}{6}.$$
Counting subsets of a given size can also be done recursively. A recursive method
for doing a task is one based on breaking the task down into simpler tasks of the same
type. In this case, our task is to count the subsets, of a given size, in a given set. How
can we reduce this to simpler subset-counting tasks?
Consider again our set 𝐵 of size 𝑛 and suppose we want to determine the number
$\binom{n}{k}$ of 𝑘-element subsets of 𝐵. Let 𝑏 ∈ 𝐵. We divide the 𝑘-element subsets of 𝐵 into
those that include 𝑏 and those that do not. How many of each kind do we have?
Let’s work through an example. Suppose 𝐵 = {1, 2, 3, 4, 5}, so 𝑛 = 5, and 𝑘 = 3. So
we want the number $\binom{5}{3}$ of 3-element subsets of 𝐵. (This example is small enough that
you can just list these by hand, so please do so! It will be a handy check on what we
are about to do.) Pick 𝑏 ∈ 𝐵, say 𝑏 = 1. Some 3-element subsets of 𝐵 include 1, others
do not. The point is that
• Observe that choosing a 3-element subset that includes 1 is really just choosing
the rest of the subset that isn’t 1, and we need exactly two of those non-1 elements
to make up three elements altogether. So, counting 3-element subsets that include
1 is the same as counting 2-element subsets of the four-element set {2, 3, 4, 5}. So
$$\#\text{ 3-element subsets that include } 1 = \binom{4}{2}.$$
• Observe that choosing a 3-element subset that does not include 1 is really just
choosing three elements from among the non-1 elements. So, counting 3-element
subsets that don’t include 1 is the same as counting 3-element subsets of the four-
element set {2, 3, 4, 5}. So
$$\#\text{ 3-element subsets that don't include } 1 = \binom{4}{3}.$$
So

$$\binom{5}{3} = \text{total } \#\text{ 3-element subsets of } B$$
$$= \#\text{ 3-element subsets that include } 1 + \#\text{ 3-element subsets that do not include } 1$$
$$= \binom{4}{2} + \binom{4}{3}.$$
Returning to the general setting, we apply the same reasoning.

• For those 𝑘-element subsets that include 𝑏, we choose the remaining 𝑘 − 1 elements
for our subset from among the 𝑛 − 1 elements of 𝐵 other than 𝑏. This can be done
in $\binom{n-1}{k-1}$ ways.

• For those 𝑘-element subsets that do not include 𝑏, we choose all 𝑘 elements for our
subset from among all elements of 𝐵 other than 𝑏. So we now choose 𝑘 elements
from 𝑛 − 1 available elements. This can be done in $\binom{n-1}{k}$ ways.
The total number of 𝑘-element subsets is obtained by adding these two quantities to-
gether. So we have
$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}. \tag{1.7}$$
So we can compute $\binom{n}{k}$ by doing two simpler computations of the same type (each with
𝑛−1 instead of 𝑛) and adding the results. Those two simpler computations can, in turn,
be done in terms of other even simpler computations (with 𝑛−2), and so on. Eventually,
the numbers get so small that we can use the simple cases 𝑘 = 0 and 𝑘 = 𝑛, which are
so simple that they can be solved without reducing them any further. We call these
the base cases: they sit at the “base” of the whole reduction process, ensuring that the
process does stop eventually, instead of just “descending forever”.
It is worth comparing this method of computing $\binom{n}{k}$ with direct computation using
(1.5) or (1.6).
This recursive method is especially useful when you want to compute $\binom{n}{k}$ for all 𝑛
and 𝑘 up to some limits. We start with base cases $\binom{n}{0} = \binom{n}{n} = 1$. The simplest case not
covered by these is $\binom{2}{1}$, and applying (1.7) gives $\binom{2}{1} = \binom{1}{0} + \binom{1}{1} = 1 + 1 = 2$. The next
simplest cases are $\binom{3}{1}$ and $\binom{3}{2}$. For the first of these, (1.7) gives $\binom{3}{1} = \binom{2}{0} + \binom{2}{1} = 1 + 2 = 3$;
the second can be computed similarly, or even better, we can use symmetry: $\binom{3}{2} = \binom{3}{3-2}$,
by (1.2), which we just calculated to be 3. Similar calculations for 𝑛 = 4, using the
values we have just worked out, give $\binom{4}{1} = 4$, $\binom{4}{2} = 6$, and $\binom{4}{3} = 4$. And so on.
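This row-by-row calculation is exactly what a short program can do, storing the values in
a table so that each coefficient is computed only once. A possible C sketch (ours, for
illustration):

#include <stdio.h>
#define MAX 10

/* Fill a table of binomial coefficients using the recurrence (1.7),
   C(n,k) = C(n-1,k-1) + C(n-1,k), with base cases C(n,0) = C(n,n) = 1.
   Row n of the table is row n of Pascal's triangle. */
int main(void) {
    unsigned long long c[MAX + 1][MAX + 1] = {{0}};
    for (int n = 0; n <= MAX; n++) {
        c[n][0] = c[n][n] = 1;                         /* base cases */
        for (int k = 1; k < n; k++)
            c[n][k] = c[n - 1][k - 1] + c[n - 1][k];   /* (1.7) */
    }
    for (int n = 0; n <= 5; n++) {       /* print the first few rows */
        for (int k = 0; k <= n; k++)
            printf("%llu ", c[n][k]);
        printf("\n");
    }
    return 0;
}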
We can visualise the relation (1.7) using Pascal’s triangle, shown symbolically in
Figure 1.2a and with some actual values in Figure 1.2b. The binomial coefficients $\binom{n}{k}$
are arranged so that each one is the sum of the two immediately above it. In general,
$\binom{n}{k}$ has $\binom{n-1}{k-1}$ and $\binom{n-1}{k}$ just above it, in that order, with $\binom{n-1}{k-1}$ to its upper left and
$\binom{n-1}{k}$ to its upper right, and we saw in (1.7) that adding these two gives $\binom{n}{k}$. To take a
specific example, consider $\binom{5}{2}$ in Figure 1.2a. We know from (1.7) that $\binom{5}{2} = \binom{4}{1} + \binom{4}{2}$,
and we see in the triangular array in Figure 1.2a that $\binom{4}{1}$ and $\binom{4}{2}$ sit just above $\binom{5}{2}$. The
actual values $\binom{5}{2} = 10$, $\binom{4}{1} = 4$ and $\binom{4}{2} = 6$ are shown in the corresponding positions in
the triangular array in Figure 1.2b. The equation $\binom{5}{2} = \binom{4}{1} + \binom{4}{2}$ becomes 10 = 4 + 6.

[Figure 1.2: Pascal’s triangle, (a) written symbolically with binomial coefficients and (b) with their actual values.]
1.11 Complement and set difference

Often, the sets we are discussing may all be subsets of some universal set, also called
the universe of discourse or simply the universe.
For example, if we are working with various sets of integers (such as the even integers,
or the odd integers, or the negative integers, or the primes), then the set ℤ of all integers
can be used as the universal set. If we are working with sets of strings over the English
alphabet 𝐴 (such as the set of nouns, or the set of three-letter strings, or the set of
names in the FIT1058 class list), then the set 𝐴 ∗ of all strings over that alphabet may
be a suitable universal set.
Suppose 𝐴 is any set and 𝑈 is some universal set, so that 𝐴 ⊆ 𝑈. Then the
complement of 𝐴, denoted by 𝐴, is the set of all elements of 𝑈 that are not in 𝐴.
See Figure 1.3.
The notation 𝐴 has the shortcoming that it does not include the universal set 𝑈,
even though the definition depends on 𝑈. This is ok if the universal set has been clearly
stated earlier or is clear from the context. But there is alternative notation that makes
the dependence on 𝑈 clear. We write 𝑈 ∖ 𝐴 for everything (in the universal set) that is
not in 𝐴. So

$$\overline{A} = U \setminus A. \tag{1.8}$$
Taking the complement twice gets us back to where we started:

$$\overline{\overline{A}} = A, \qquad U \setminus (U \setminus A) = A.$$
The operation ∖ is called set difference and can be used between any two sets. So,
if 𝐴 and 𝐵 are any sets, then 𝐵 ∖ 𝐴 is the set of elements of 𝐵 that are not in 𝐴:
𝐵 ∖ 𝐴 = {𝑥 ∈ 𝐵 ∶ 𝑥 ∉ 𝐴}.
In the special case when 𝐵 is the universal set 𝑈, this is just (1.8).

If 𝐴 ⊆ 𝐵, then the size of the set difference is simply

$$|B \setminus A| = |B| - |A|. \tag{1.9}$$

The size of the set difference does not satisfy (1.9) unless 𝐴 ⊆ 𝐵. Why is this?
How would you modify (1.9) so that it covers any set difference 𝐵 ∖ 𝐴? What extra
information about the sets would you need, in order to determine |𝐵 ∖ 𝐴|?
Complementation reverses the subset relation:

$$A \subseteq B \iff \overline{A} \supseteq \overline{B}. \tag{1.10}$$
This gives us another approach to proving that 𝐴 ⊆ 𝐵 (as well as the approach described
on p. 8 in § 1.6). Instead of taking a general member 𝑥 of 𝐴 and proving that it also
belongs to 𝐵, we could take a general nonmember of 𝐵 and prove that it also does not
belong to 𝐴. In other words, we show that, every time the condition for membership of
𝐵 is violated, then the condition for membership of 𝐴 must be violated too.
1.12 Union and intersection

The union 𝐴 ∪ 𝐵 of two sets 𝐴 and 𝐵 is the set of all elements that belong to at least
one of the two sets:
𝐴 ∪ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵}. (1.11)
The “or” here is inclusive in the sense that it includes the possibility that 𝑥 ∈ 𝐴 and
𝑥 ∈ 𝐵 are both true. This is how we will use the word “or” in set definitions and logical
statements, unless stated otherwise at the time.
The union is illustrated in Figure 1.5.
The intersection 𝐴 ∩ 𝐵 of two sets 𝐴 and 𝐵 is the set of all elements that belong
to both of the two sets:
𝐴 ∩ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵}. (1.12)
See Figure 1.6.
Set difference can be expressed in terms of intersection and complement:

$$B \setminus A = \overline{A} \cap B.$$
When we count all the elements of 𝐴 and all the elements of 𝐵, we are counting
everything in either set except that everything in both sets is counted twice. Therefore

$$|A \cup B| = |A| + |B| - |A \cap B|.$$

This means that, if we know |𝐴| and |𝐵|, then knowing either one of |𝐴 ∪ 𝐵| and |𝐴 ∩ 𝐵|
will enable us to determine the other.
The disjoint union of 𝐴 and 𝐵 is their union, but it is only defined when the two
sets have no element in common:

$$A \sqcup B = \begin{cases} A \cup B, & \text{if } A \cap B = \emptyset; \\ \text{undefined}, & \text{otherwise.} \end{cases}$$
See Figure 1.7. There are some alternative symbols for disjoint union, the most common
being obtained from the ordinary union symbol by placing a dot over it or + inside it: ∪̇
and ⊎.
When the disjoint union is defined, its size is just the sum of the sizes of the sets:

$$|A \sqcup B| = |A| + |B|.$$

We will use disjoint union occasionally, but mostly will focus on the normal, and more
general, union.
The complement of the union of two sets is the intersection of their complements.
Figure 1.7: The disjoint union 𝐴 ⊔ 𝐵, shaded. It is only defined when 𝐴 and 𝐵 are disjoint.
Theorem 1.

$$\overline{A \cup B} = \overline{A} \cap \overline{B}.$$
Proof.

$$x \in \overline{A \cup B} \iff x \notin A \cup B$$
$$\iff x \notin A \text{ and } x \notin B$$
$$\iff x \in \overline{A} \text{ and } x \in \overline{B}$$
$$\iff x \in \overline{A} \cap \overline{B}$$
Similarly, the complement of the intersection of two sets is the union of their com-
plements. We could prove this in a similar way, but we can prove it even more easily
using Theorem 1.
Corollary 2.

$$\overline{A \cap B} = \overline{A} \cup \overline{B}.$$
Proof.

$$\overline{A \cap B} = \overline{\overline{\overline{A}} \cap \overline{\overline{B}}}$$
$$= \overline{\overline{\overline{A} \cup \overline{B}}} \quad \text{(by Theorem 1)}$$
$$= \overline{A} \cup \overline{B}.$$
Theorem 1 and Corollary 2 are known as De Morgan’s Laws for Sets. They de-
scribe a duality between union and intersection. We will meet a similar duality later,
when studying logic.
Figure 1.8: 𝐴 ∩ (𝐵 ∪ 𝐶) (top); compare with 𝐴 ∩ 𝐵 (left) and 𝐴 ∩ 𝐶 (right), and observe that
𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶).
How do union and intersection interact with each other? If we take the union of
two sets, and then the intersection with a third, what happens? What about taking an
intersection first, then a union?
Consider 𝐴 ∩ (𝐵 ∪ 𝐶), shown in Figure 1.8. It is evident from the Venn diagrams
that this is the same as (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶).
Now consider 𝐴 ∪ (𝐵 ∩ 𝐶). It is a good exercise to draw Venn diagrams to show how
this relates to (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶).
In summary, we have the following.
Theorem 3. For any sets 𝐴, 𝐵 and 𝐶,
𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶), (1.15)
𝐴 ∪ (𝐵 ∩ 𝐶) = (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). (1.16)
□
Equations (1.15) and (1.16) are known as the Distributive Laws for sets. The
first law, (1.15), is sometimes described as saying that “intersection distributes over
union”. This means that, when taking the intersection of 𝐴 with a union of several
other sets, we can “distribute” the intersection among those other sets, taking all the
intersections separately, and then take the union. Similarly, the second law, (1.16), is
sometimes described as saying that “union distributes over intersection”. We will meet
very similar Distributive Laws later, in logic. As for De Morgan’s Laws, the algebra of
sets will be seen to mirror the algebra of logic.
Although this may be a new Distributive Law for you, the notion of a Distributive
Law should be familiar. You already know a Distributive Law for numbers. For any
real numbers 𝑎, 𝑏, 𝑐, we have
𝑎 × (𝑏 + 𝑐) = (𝑎 × 𝑏) + (𝑎 × 𝑐).
So multiplication distributes over addition. But, for numbers, addition does not dis-
tribute over multiplication: in general,
𝑎 + (𝑏 × 𝑐) ≠ (𝑎 + 𝑏) × (𝑎 + 𝑐).
(There are some cases where equality just happens to hold here, but they are atypical
and very rare.) So it is refreshing to work with sets, where the two operations are
distributive in all possible ways!
1.13 Symmetric difference

The symmetric difference 𝐴△𝐵 of 𝐴 and 𝐵 is the set of elements that are in exactly
one of 𝐴 and 𝐵.
𝐴△𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵 but not both}.
The “or” here is now exclusive in the sense that the possibility of belonging to both sets
is excluded. See Figure 1.9.
There are other ways of writing the symmetric difference in terms of our other
operations.
$$A \triangle B = (A \setminus B) \cup (B \setminus A) = (A \cap \overline{B}) \cup (\overline{A} \cap B), \tag{1.17}$$

$$A \triangle B = (A \cup B) \setminus (A \cap B). \tag{1.18}$$
For any set 𝐴, we have 𝐴△𝐴 = ∅.
This is the only situation where the symmetric difference of two sets is empty. So the
symmetric difference enables a neat characterisation of when two sets are identical.
Theorem 4. For any two sets 𝐴 and 𝐵, they are identical if and only if their symmetric
difference is empty.
Proof.
𝐴=𝐵 ⟺ 𝐴 ⊆ 𝐵 and 𝐵 ⊆ 𝐴
⟺ 𝐴 ∖ 𝐵 = ∅ and 𝐵 ∖ 𝐴 = ∅
⟺ (𝐴 ∖ 𝐵) ∪ (𝐵 ∖ 𝐴) = ∅
⟺ 𝐴△𝐵 = ∅
The symmetric difference of two sets is the same as the symmetric difference of their
complements.

Theorem 5.

$$\overline{A} \triangle \overline{B} = A \triangle B.$$
Proof.

$$\overline{A} \triangle \overline{B} = (\overline{A} \cap \overline{\overline{B}}) \cup (\overline{\overline{A}} \cap \overline{B})$$
$$= (\overline{A} \cap B) \cup (A \cap \overline{B})$$
$$= A \triangle B$$
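As a computational footnote to this chapter’s set operations: if subsets of a small universe
are stored as bit masks (compare the characteristic strings of Exercise 5 below), then ∪, ∩,
∖, complement and △ each become a single bitwise machine operation. The following C sketch
is our illustration, with arbitrarily chosen example sets; it also checks Theorem 1 and
Theorem 4 on that example.

#include <assert.h>
#include <stdio.h>

int main(void) {
    unsigned char U = 0xFF;           /* universal set: all 8 bits set */
    unsigned char A = 0x3C, B = 0x0F; /* two arbitrary subsets of U    */

    unsigned char uni   = A | B;      /* union                         */
    unsigned char inter = A & B;      /* intersection                  */
    unsigned char diff  = A & ~B;     /* set difference A \ B          */
    unsigned char symm  = A ^ B;      /* symmetric difference (xor)    */
    unsigned char compA = U & ~A;     /* complement, relative to U     */

    /* Theorem 1 (De Morgan) on this example: */
    assert((unsigned char)(U & ~(A | B)) == ((U & ~A) & (U & ~B)));
    /* Theorem 4: symmetric difference is empty iff the sets are equal: */
    assert(((A ^ B) == 0) == (A == B));

    printf("union=%02X inter=%02X diff=%02X symm=%02X complement(A)=%02X\n",
           uni, inter, diff, symm, compA);
    return 0;
}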
1.14 Cartesian product
If we have two objects 𝑎 and 𝑏, then the ordered pair (𝑎, 𝑏) consists of both of them
together, in that order. You have used ordered pairs many times, for example as co-
ordinates of points in the 𝑥, 𝑦-plane, or as rows in a table with two columns.
The Cartesian product 𝐴 × 𝐵 of two sets 𝐴 and 𝐵 is the set of all ordered pairs
consisting of an element of 𝐴 followed by an element of 𝐵:
𝐴 × 𝐵 = {(𝑎, 𝑏) ∶ 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵}.
So, if 𝐴 is the set of all possible values of 𝑥, and 𝐵 is the set of all possible values of
𝑦, then 𝐴 × 𝐵 is the set of all possible ordered pairs (𝑥, 𝑦) of these values.
For example, if 𝐴 = {King, Queen, Jack} and 𝐵 = {♣, ♡}, then
𝐴 × 𝐵 = { (King, ♣), (King, ♡), (Queen, ♣), (Queen, ♡), (Jack, ♣), (Jack, ♡) }.
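Two nested loops enumerate a Cartesian product directly, one loop per factor. The C sketch
below (an illustration of ours, with the suit symbols spelt out as words) prints the six
pairs above, making one pass over 𝐵 for each element of 𝐴:

#include <stdio.h>

int main(void) {
    const char *A[] = {"King", "Queen", "Jack"};
    const char *B[] = {"Clubs", "Hearts"};
    int count = 0;
    for (int i = 0; i < 3; i++)          /* element of A: first entry  */
        for (int j = 0; j < 2; j++) {    /* element of B: second entry */
            printf("(%s, %s)\n", A[i], B[j]);
            count++;
        }
    printf("|A x B| = %d\n", count);     /* prints 6 = 3 x 2 */
    return 0;
}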
The Cartesian product ℝ × ℝ is the set of all coordinates of points in the plane. If, for a
given community of people, 𝑃 is the set of all first (or personal) names and 𝐹 is the set
of all family names, then 𝑃 × 𝐹 is the set of all pairs (first name, family name). This
would cover all pairings of names actually used by people in that community, but would
typically include many unused pairings of names too.
If 𝐴 and 𝐵 are both finite sets, then the size of the Cartesian product is just the
product of the sizes of the two sets:

$$|A \times B| = |A| \times |B|.$$

This is because we have |𝐴| possibilities for the first member of a pair, and |𝐵| possibili-
ties for the second member, and these choices are made independently of each other. In
more detail, each possibility for the first member gives |𝐵| possibilities for the second
member, so the total number of pairs is |𝐴| × |𝐵|.

More generally, the Cartesian product of 𝑛 sets 𝐴₁, 𝐴₂, … , 𝐴ₙ is the set of all
𝑛-tuples whose 𝑖-th entries come from 𝐴ᵢ:

$$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \ldots, a_n) : a_1 \in A_1, a_2 \in A_2, \ldots, a_n \in A_n\}.$$
Again, if all the sets are finite then the size of the Cartesian product is the product of
the sizes of all the sets:

$$|A_1 \times A_2 \times \cdots \times A_n| = |A_1| \times |A_2| \times \cdots \times |A_n|.$$
If the sets 𝐴1 , … , 𝐴𝑛 are all the same, then we can use an exponent to indicate how
many of them are in the product:
$$A^n = \underbrace{A \times A \times \cdots \times A}_{n \text{ factors}} = \{(a_1, a_2, \ldots, a_n) : a_i \in A \text{ for all } i \in \{1, 2, \ldots, n\}\}.$$
For example,
{0, 1}³ = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.
In the special case when 𝐴 is an alphabet (i.e., a finite set of characters), we often write
𝑛-tuples in 𝐴ⁿ as strings of length 𝑛. So, for the binary alphabet {0,1}, we can write

{0, 1}³ = {000, 001, 010, 011, 100, 101, 110, 111},
as we did in § 1.5.
For another example, the sets of coordinates of points in two- and three-dimensional
space are ℝ² = ℝ × ℝ and ℝ³ = ℝ × ℝ × ℝ, respectively. These are used extensively to
model physical spaces, since the space around us is three-dimensional and we often deal
with surfaces (terrain, paper, screens) that are two-dimensional. Higher-dimensional
spaces are also useful. The set of coordinates in 𝑛-dimensional space is
$$\mathbb{R}^n = \underbrace{\mathbb{R} \times \mathbb{R} \times \cdots \times \mathbb{R}}_{n \text{ axes}}.$$
Spaces of more than three dimensions are hard to visualise, since the physical space we
live in is only three-dimensional. But they are very useful and powerful. Models devel-
oped by machine learning programs can require millions or even billions of dimensions.
For the two smallest exponents, we have, for any set 𝐴,

$$A^0 = \{()\}, \qquad A^1 = A.$$

Here () denotes the empty tuple, so 𝐴⁰ has exactly one element; like 𝐴⁰ = {𝜀} for
strings, it is not to be confused with the empty set.
1.15 Partitions

A partition of a set is a collection of nonempty subsets of it, called its parts, such
that the parts are pairwise disjoint and their union is the whole set. For example, one
partition of the set 𝐴 = {𝑎, 𝑏, 𝑐} is

{ {𝑎, 𝑐}, {𝑏} }.

This partition has two parts: {𝑎, 𝑐} and {𝑏}. These parts are each nonempty, and
disjoint, and their union is 𝐴, so the definition is satisfied. Another partition of 𝐴 is
{ {𝑎}, {𝑏}, {𝑐} }.

This partition has three parts, namely the three sets {𝑎}, {𝑏}, {𝑐}. At the other extreme,
we have a partition of 𝐴 with just one part:
{ {𝑎, 𝑏, 𝑐} }.
Our set 𝐴 is small enough that we can list all five of its partitions:
partition # parts
{ {a, b, c} } 1
{ {a,b}, {c} } 2
{ {a,c}, {b} } 2
{ {b,c}, {a} } 2
{ {a}, {b}, {c} } 3
There are several ways in which a collection of subsets of a set can fail to be a
partition. For our set 𝐴 = {𝑎, 𝑏, 𝑐}, the collection { {𝑎, 𝑏}, {𝑐}, ∅ } fails because one of
its members is empty. The collection { {𝑎, 𝑏}, {𝑏, 𝑐} } fails because its members are not
all disjoint, in particular {𝑎, 𝑏} ∩ {𝑏, 𝑐} = {𝑏} ≠ ∅, so 𝑏 belongs to two members of the
collection instead of just one. The collection { {𝑎}, {𝑏} } fails because the union of the
collection’s members is not the entire set 𝐴, i.e., {𝑎} ∪ {𝑏} = {𝑎, 𝑏} ≠ 𝐴; in particular, 𝑐
does not belong to any members of this collection.
Partitions have many applications. Consider, for example, classification. Suppose
that 𝐴 is a collection of plant specimens. We would like to classify the specimens
according to their species: specimens from the same species are grouped together, while
those from different species are kept in separate groups. (Some groups may have just
one specimen, if there is no other specimen of the same species.) These groups form the
parts of a partition of 𝐴, with each part corresponding to one of the species represented
in the collection. Finding such classifications, from data obtained from specimens, is a
major topic in machine learning.
The number of partitions of a finite set grows rapidly as the size of the set increases.
set size, 𝑛 1 2 3 4 5 6 7 8 9 10
# partitions 1 2 5 15 52 203 877 4140 21147 115975
We can also talk about partitions of infinite sets. For example, { {even numbers}, {odd
numbers} } is a partition of the set of nonnegative integers; this partition has two parts.
Consider also the following partition of the set 𝐴* of all strings over a finite alphabet 𝐴:

{𝐴ⁿ ∶ 𝑛 ∈ ℕ0}.
This partition has infinitely many parts, one for each 𝑛 ∈ ℕ0 . The parts are the sets of
all strings of a given length.
Every set 𝐴 has two partitions that might be thought of as “extreme”, but in opposite
directions.
• The coarsest partition of 𝐴 is the partition { 𝐴 } which has just one part, namely
the entire set 𝐴 itself. In effect, everything in 𝐴 is “lumped together”.
• The finest partition of 𝐴 is the partition { {𝑎} ∶ 𝑎 ∈ 𝐴 } which has one part for
each element of 𝐴, and each part contains only that element. If 𝐴 is finite, then
this partition has |𝐴| parts; if 𝐴 is infinite, then this partition has infinitely many
parts. Every part of this partition is as small as a part of a partition can be. In
effect, all the elements of 𝐴 are “kept apart” from each other.
If 𝐴 has just one element, then the coarsest and finest partitions are the same, but if
𝐴 is larger then they are different. If 𝐴 has just two elements, then these are the only
partitions of 𝐴, but if 𝐴 is larger, then it has many other partitions too, with all the
others being in a sense intermediate between these two extreme partitions.
1.16 Exercises
1. Why does the set difference only satisfy (1.9) when 𝐴 ⊆ 𝐵? How would you modify
(1.9) so that it covers any set difference 𝐵 ∖ 𝐴? What extra information about the sets
would you need, in order to determine |𝐵 ∖ 𝐴|?
2. We mentioned at the start of this chapter that sets are used to define types of
objects in many programming languages. For example, in C, the statement
int monthNumber;
declares that the variable monthNumber has type int. The declaration also assigns a
piece of memory to the variable, to contain the values that the variable has during the
computation. Similarly, the statement
char monthName[10];
is C’s way of declaring that the variable monthName is a string of at most 9 characters;
again, a piece of memory is allocated to it as well.
Let Int be the set of possible values for a variable of type int. Similarly, let String
be the set of possible values for a variable that is declared to be a string of at most 9
letters.
(a) The following statement creates a new type, called aNewType, for representing objects
consisting of any int followed by any string; it also sets aside consecutive pieces of
memory, so that the int is followed by the string in memory. It also declares the variable
monthBothWays to be of this type.
struct aNewType {
int year;
char monthName[10];
} monthBothWays;
Using the sets Int and String, together with a standard set operation, what set is repre-
sented by the type aNewType?
(b) The following statement creates another new type, called anotherNewType, for rep-
resenting objects that can be either an int or a string. It sets aside a piece of memory
that is large enough to contain either an int or a string; at any one time, it will contain
just one of these. It also declares the variable monthEitherWay to be of this type.
union anotherNewType {
int year;
char monthName[10];
} monthEitherWay;
Using the sets Int and String, together with a standard set operation, what set is repre-
sented by the type anotherNewType?
3. (a) If 𝐴 ⊆ 𝐵, what is 𝐴 ∪ 𝐵?
4. Consider the following diagrams. The one on the left shows the set {𝑎} and its
sole subset, ∅. The one on the right shows {𝑎, 𝑏} and all its subsets.
[Diagrams: on the left, ∅ with an arrow up to {𝑎}; on the right, ∅ with arrows up to {𝑎} and {𝑏}, each of which has an arrow up to {𝑎, 𝑏}.]
• Sets of the same size are shown on the same horizontal level.
• The arrows indicate when a lower set is a subset of another set that has just one
extra element. If 𝑋 ⊂ 𝑌 and |𝑌| = |𝑋 | + 1 then there is an arrow from 𝑋 to 𝑌.
(a) Draw the corresponding diagram for the three-element set {𝑎, 𝑏, 𝑐}.

(b) For every pair of sets 𝑋 , 𝑌 such that 𝑋 ⊂ 𝑌 and |𝑌| = |𝑋 | + 1, label the correspond-
ing arrow in your diagram by the sole member of 𝑌 ∖ 𝑋 .
(c) Suppose you are now liberated from the requirement to draw your diagram on a
medium of only two dimensions such as paper or a computer screen. How could you
draw this diagram in three dimensions in a natural way?
(d) For a set of 𝑛 elements, how many sets and how many arrows does a diagram of this
type have?
(e) For each element of the 𝑛-element set considered in (d), how many arrows are la-
belled by that element (if we label them as in (b))?
(f) In such a diagram, suppose we have two sets 𝑋 , 𝑌 that satisfy 𝑋 ⊆ 𝑌. How many
directed paths are there from 𝑋 to 𝑌? Give an expression for this.
• A directed path is a path along arrows in which all arrows are directed forwards;
you can’t go backwards along an arrow. Paths are counted as different as long as
they are not identical; they are allowed to have some overlap.
(g) With 𝑋 , 𝑌 as in (f), what is the maximum number of mutually internally disjoint
paths from 𝑋 to 𝑌? Describe such a collection of paths.
• An internal set on a path is a set on the path that is not the start or end of
the path, i.e., it’s not 𝑋 or 𝑌. Two paths are internally disjoint if no inter-
nal set on either path appears anywhere on the other path. If we have a col-
lection of some number of paths (possibly more than two), then the paths are
mutually internally disjoint if every pair of paths in the collection are inter-
nally disjoint.
(h) Explain how to use the diagram to find, for any two sets in it, their union and
intersection.
5. Let 𝐵 = {𝑒₁, 𝑒₂, … , 𝑒ₙ}. The characteristic string of a subset 𝐴 ⊆ 𝐵 is the string
𝑏₁𝑏₂ ⋯ 𝑏ₙ of bits given by

$$b_i = \begin{cases} 1, & \text{if } e_i \in A; \\ 0, & \text{if } e_i \notin A. \end{cases}$$
Write down the characteristic string of each of the subsets of a set of three elements.
List them, one above the other, so that each differs from the one above it in just one
bit.
See if you can extend this to subsets of a set of 𝑛 elements.
This has algorithmic applications. Suppose we want to search through all subsets
of a set, by looking at each of their characteristic strings in turn. If each characteristic
string differs from its predecessor in only one bit, then moving from one characteristic
string to the next requires fewer changes than may be required otherwise, which saves
time.
6. In a Venn diagram, the closed curves representing the sets together divide the
plane into regions. A single set divides the plane — or the portion of the plane within
the rectangular box representing the universal set, if that is shown in the diagram —
into two regions, its interior and exterior. See Figure 1.3, where the regions correspond
to 𝐴 and 𝐴 (with the latter shaded in that particular diagram, as it was being used to
explain the complement, but that does not have to be done in general). Two intersecting
sets, represented by two closed curves, divide the plane into four basic regions. (See
Figure 1.4, Figure 1.5, and Figure 1.6.) If the sets are 𝐴 and 𝐵, then the basic regions
correspond to 𝐴 ∪ 𝐵, 𝐴 ∩ 𝐵, 𝐴 ∖ 𝐵 and 𝐵 ∖ 𝐴. Three sets can be drawn to divide the
plane into eight regions.
(a) Label each basic region of a Venn diagram for three sets 𝐴, 𝐵, 𝐶 with appropriate
intersections of sets.
(b) What is the maximum number of sets for which a general Venn diagram can be
drawn in which all the sets are circles of the same size?
(c) Draw a general Venn diagram in the plane for four sets. You can use closed curves
other than circles.
(d) How could you draw a three-dimensional general Venn diagram using four sets,
each represented as a sphere?
8. Consider the sequence of binomial coefficients $\binom{n}{r}$ with 𝑛 fixed and 𝑟 going from
0 to 𝑛:

$$\binom{n}{0}, \binom{n}{1}, \binom{n}{2}, \ldots, \binom{n}{n-1}, \binom{n}{n}.$$
(a) Using one of the formulas for $\binom{n}{r}$, prove that these binomial coefficients increase as
𝑟 goes from 0 to ⌊𝑛/2⌋ and then decrease as 𝑟 goes from ⌈𝑛/2⌉ to 𝑛.
• Here, ⌊𝑛/2⌋ is the “floor” of 𝑛/2, which is the greatest integer ≤ 𝑛/2. If 𝑛 is
even, this is just 𝑛/2 itself, but if 𝑛 is odd (which means 𝑛/2 is not an integer),
its floor is the integer (𝑛 − 1)/2.
Similarly, ⌈𝑛/2⌉ is the “ceiling” of 𝑛/2, which is the least integer ≥ 𝑛/2. If 𝑛 is
even, this is just 𝑛/2 again, but if 𝑛 is odd, its ceiling is the integer (𝑛 + 1)/2.
(b) Now prove that, for every positive integer 𝑟 in the range 1 ≤ 𝑟 ≤ 𝑛 − 1,

$$\binom{n}{r}^2 > \binom{n}{r-1}\binom{n}{r+1}.$$
This is an important property and means that the sequence is said to be strictly
log-concave. (If the inequality is just ≥ instead of >, then it’s log-concave.)
(c) Suppose you and a friend are considering a fixed set of size 𝑛. Suppose you get to
choose an ordered pair of subsets of size 𝑟 of the set of size 𝑛, with no restriction
at all (so your two sets are allowed to overlap, or be disjoint, or be identical, or
whatever; the only rule is that they both have to have size 𝑟). Suppose also that
your friend gets to choose one subset of size 𝑟 − 1 and another of size 𝑟 + 1, with
again no restriction on the sets apart from these size requirements. Who has more
options, you or your friend? Does the answer depend on 𝑛 and 𝑟 in any way? If so,
how? If not, why not? Is there any situation where the numbers of choices that you
and your friend have are the same?
11. Express |𝐴 ∪ 𝐵 ∪ 𝐶| in terms of |𝐴|, |𝐵|, |𝐶|, |𝐴 ∩ 𝐵|, |𝐴 ∩ 𝐶|, |𝐵 ∩ 𝐶|, |𝐴 ∩ 𝐵 ∩ 𝐶|.
12. Express |𝐴 ∩ 𝐵 ∩ 𝐶| in terms of |𝐴|, |𝐵|, |𝐶|, |𝐴 ∪ 𝐵|, |𝐴 ∪ 𝐶|, |𝐵 ∪ 𝐶|, |𝐴 ∪ 𝐵 ∪ 𝐶|.
13. Suppose 𝐴₁, 𝐴₂, … , 𝐴ₙ are finite sets, and for each 𝑘 let 𝑖ₖ denote the sum of the
sizes of the intersections of the sets taken 𝑘 at a time.
So 𝑖₁ gives just the sum of the sizes of the 𝑛 sets, and 𝑖₂ gives the sum of the sizes of
each intersection of two of the 𝑛 sets, and so on.
15. Express |𝐴△𝐵| in terms of the sizes of some other sets in three different ways.
16. Draw a Venn diagram for general sets 𝐴, 𝐵, 𝐶 and shade the region(s) that form
𝐴△𝐵△𝐶.
17. Prove that (𝐴 ∪ 𝐵)△(𝐴 ∩ 𝐵) = 𝐴△𝐵.
18. Suppose 𝐴1 , 𝐴2 , … , 𝐴𝑛 are sets. How would you characterise 𝐴1 △𝐴2 △ ⋯ △𝐴𝑛 ?
More specifically, the members of 𝐴1 △𝐴2 △ ⋯ △𝐴𝑛 are those that satisfy some specific
condition on how many of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 they belong to. What is that condition?
For this exercise, you don’t need to construct a formal proof that your condition is
correct.
19. If 𝐴 and 𝐵 are sets, does 𝐴 × 𝐵 equal 𝐴 × 𝐵? If so, prove it. If not: give an
example when it does not hold; characterise those cases when it does hold; and determine
what can be said about the relationship between them.
21. Suppose 𝐴, 𝐵, 𝐶 ⊆ 𝑈 are nonempty sets such that all basic regions in their general
Venn diagram are nonempty (see Exercise 6). Then the three sets 𝐴, 𝐵, 𝐶 define a partition of
𝑈 into eight parts. What are those parts? Express each part in terms of one or more of
𝐴, 𝐵, 𝐶 using set operations.
2 FUNCTiONS
Computation helps get things done. By “things”, we mean tasks where we take some
information and transform it somehow. To specify such a task, we must specify:
• the kind of information we start with,
• the kind of information we must end up with, and
• how the information we end up with must be related to the information we started with.
For example, suppose we want to sort a list of names into alphabetical order. We
first specify exactly what kinds of lists of names we are dealing with (which alphabet?
any requirements or restrictions on the names to be considered? how long can the names
be? how many names are we allowed to have in the list? etc.). Then we specify that
we’ll end up with another list. Then we specify what we want to do to our first list,
which is to sort it. So the list we end up with has all the same names as the list we
started with, but now they are in alphabetical order.
Or suppose you want to determine the most goals kicked by any team in any season of
your favourite football competition, from records of games. We first specify exactly what
kinds of records we are working with: the information in them and how it is arranged.
Then we specify that we’ll end up with a number (to be precise, a non-negative integer).
Then we specify that this number must be the maximum, over all seasons and all teams,
of the number of goals kicked by that team in that season.
To model such tasks precisely, we use functions.
2.1𝛼 DEFiNiTiONS
A function 𝑓 consists of
• a set called the domain of 𝑓, whose members are the allowed arguments of the function,
• a set called the codomain of 𝑓, which contains all possible values of the function, and
• a specification, for each element 𝑥 of the domain, of a unique member 𝑓(𝑥) of the
codomain. This member, 𝑓(𝑥), is the value of the function for the argument 𝑥. This
specification is called the rule of the function.
Informally, the argument 𝑥 “goes in” and the value 𝑓(𝑥) “comes out”. These terms
have some common synonyms: the argument is often called the input, and the value the output.
But some care is needed with the use of “input” and “output” in this context, since
in programming “input” is often used for extra information that a program reads from
another source (such as a file), while “output” is often used for information a program
writes to a file or screen, and these may be quite different to the argument and value.
Functions are ubiquitous in computer science, as well as in science, engineering,
economics, finance, and any other field where symbolic objects of some type need to be
transformed into something else in a precisely specified way.
We need to make some key points about the three parts of the definition of a func-
tion 𝑓.
2.1.1𝛼 Domain
The domain of a function 𝑓 can be any set, finite or infinite, provided that:
• Every member of the domain is considered to be a valid argument for the function.
So, for every member 𝑥 of the domain, its corresponding value 𝑓(𝑥) must be
properly defined and “make sense”.
• For everything that does not belong to the domain, the function is considered to
be undefined and 𝑓(𝑥) has no meaning.
For our Sorting example, the domain is the set of all possible lists of names of the
required type.
For the Maximum Goals example, it is the set of all possible records of all the games
in a single season. Note that the domain is not just the set of all past seasons’ records,
since we want our function to be able to determine the required information for any
possible future season as well.
Suppose now that we have a function called SumOfFourIntCubes which takes any
four integers and finds the sum of their cubes, according to the rule
SumOfFourIntCubes(𝑤, 𝑥, 𝑦, 𝑧) = 𝑤 3 + 𝑥3 + 𝑦 3 + 𝑧 3 .
Then its domain is the set of all quadruples of integers, which we can write as ℤ×ℤ×ℤ×ℤ.
We could also envisage a function SumOfFourRealCubes that takes any four real numbers
and finds the sum of their cubes. The rule looks the same:
SumOfFourRealCubes(𝑤, 𝑥, 𝑦, 𝑧) = 𝑤 3 + 𝑥3 + 𝑦 3 + 𝑧 3 .
But its domain is now ℝ × ℝ × ℝ × ℝ. Since this function has a different domain, it is
considered to be a different function, even though the rule looks the same.
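To make this concrete, here is a minimal Python sketch of the two functions (ours, not part of the formal definitions). Python does not enforce domains for us, so the integer version checks membership of the domain explicitly.

def sum_of_four_int_cubes(w: int, x: int, y: int, z: int) -> int:
    # Domain: Z x Z x Z x Z. Reject anything outside the domain.
    for v in (w, x, y, z):
        if not isinstance(v, int):
            raise TypeError("argument outside the domain Z x Z x Z x Z")
    return w**3 + x**3 + y**3 + z**3

def sum_of_four_real_cubes(w: float, x: float, y: float, z: float) -> float:
    # Domain: R x R x R x R. The rule looks the same, but the domain differs.
    return w**3 + x**3 + y**3 + z**3

print(sum_of_four_int_cubes(1, 2, 3, 4))            # 100
print(sum_of_four_real_cubes(0.5, 0.0, 0.0, 0.0))   # 0.125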
Why are functions with the same rule considered to be different if their domains are
different? Why would we even need the function SumOfFourIntCubes when we can use
the function SumOfFourRealCubes to do everything it does and more?
There are several reasons for this.
• Firstly, rules may look the same without actually being the same, because of details
of how the operations in the rule depend on the types of objects being used. A
multiplication symbol might be used to multiply numbers and also to multiply
matrices, but they are very different operations.
• Thirdly, the purpose of the domain is not merely to help explain the rule; it is also
a promise to the function’s user that the function will work for every member of
the domain. Sometimes, you may want a larger domain so that the function works
for as many cases as possible. But bigger promises require more work to keep!
Sometimes, it is better to use a more modest domain, provided it still captures
everything that your function is supposed to work with; a more modest promise
is easier to keep!
If 𝑓 is a function, then we write dom(𝑓) for the domain of 𝑓. So
dom(SumOfFourIntCubes) = ℤ × ℤ × ℤ × ℤ,
dom(SumOfFourRealCubes) = ℝ × ℝ × ℝ × ℝ.
2.1.2𝛼 Codomain
The codomain of a function 𝑓 must include every possible value of 𝑓(𝑥) for every
member 𝑥 of the domain. But the codomain is allowed to be “loose”, in the sense that
it is allowed to include other stuff too. We do not need to ensure that the only things
in the codomain are things we can get by applying our function 𝑓 to some member of
its domain.
So, for the codomain, instead of specifying the possible function values exactly, we
specify some superset of the set of possible function values.
The exact set of possible values of a function is called the image of the function.
This is a subset of the codomain, but not necessarily a proper subset.
You may wonder at this point, why do we allow such “looseness” in the codomain
when we were so insistent that the domain be the exact set of allowed arguments of the
function? Why shouldn’t we use the image, instead of the codomain, when defining a
function?
The reason for this is practical. It is often harder to know the image than it is to
specify a natural codomain. Sometimes it’s impossible.
In our Maximum Goals example above, the codomain is the set ℕ ∪ {0}. This does
not mean that every nonnegative integer must arise as a possible maximum number of
goals kicked by any team in a season. Indeed, some numbers (e.g., 10100 ) could never
arise in this way in practice. But it may be hard or impossible to know exactly which
numbers are feasible values and which are not. So it is more practical to give a codomain,
in the form of a simple, easily-described set which we know includes all possible function
values (even if it has other things too). The set ℕ ∪ {0} works well.
In some cases, the difficulty may be computational. For example, consider the
notorious Pompous Python function which takes any positive integer 𝑛 and gives,
as its value, the length (in characters) of the longest output that can be printed by
any Python program which reads no input, eventually stops, and whose source code file
has at most 𝑛 characters.1 On the face of it, this seems painful to compute, because
there are so many programs that could be considered (once 𝑛 is large enough to allow
interesting programs of at most 𝑛 characters to be written). In fact, it’s worse than that;
it can be shown that this function is impossible to compute perfectly. (This fact is not
obvious, but is a consequence of some famous results on uncomputability from the 1930s.
Uncomputability is covered in detail in the unit FIT2014 Theory of Computation.) So it
is impossible, in a precise sense, to know exactly which numbers can be possible values
of this function. Therefore, in specifying the function, we would prefer not to have
to specify, in advance, which numbers to allow for its possible values! So, instead, we
specify a suitable codomain, and in this case ℕ ∪ {0} will do fine, even though only very
few numbers can actually be values of the Pompous Python function.
In other cases, the difficulty may be the limits of our current knowledge. Consider
again SumOfFourIntCubes. It is not yet known whether or not every integer can be
written as the sum of four cubes, so we do not know if the values taken by this function
are all integers or only some subset of them. But we can specify the codomain to be
ℤ and know that this is a superset (but not necessarily a proper superset, as far as we
currently know) of the actual set of possible values.
The codomain, then, represents a promise to users of the function that all the
values they get from it will be in that set, but it gives no guarantee that every one of
its members is an actual function value.
1 Here we envisage an ideal computing environment where arbitrarily long programs can be run, arbitrarily
long outputs can be printed, and programs can take an arbitrarily long time to run before stopping. If
a program crashes or prints no output, then we define the length of its output to be 0. If a program
runs forever, in an infinite loop, then we exclude it from consideration, regardless of how much output it
produces.
It is good to know the image too, when that’s possible. But we do not want our
definition of the function concept to be hampered by the difficulties that are often
associated with specifying the image. So we do not include a specification of the image
in our specification of the function. Rather, we use the codomain as the “next best
thing”.
Of course, if we do know the image, there is nothing to stop us from stating it as
our codomain. But, even then, it is often neater to specify a simple codomain than
it is to specify the image. For example, a function with domain ℕ whose value is the
𝑛-th Fibonacci number has, as its image, the set of all Fibonacci numbers, and it’s easy
enough to write that down.2 But it’s even easier to specify ℕ as a codomain.
A function in which the codomain is the same as the image of the function is said
to map its domain onto its codomain, and may be said to be onto. Such a function is
also said to be surjective and is called a surjection.
Finally, a word of warning! We have studiously avoided using the word “range”. This
is because the term is, confusingly, used in two different ways: sometimes, it means the
image, while at other times, it means the codomain. We will dodge this issue by not
using “range” at all.
2.1.3𝛼 Rule
The rule of a function must specify the relationship between each member of its do-
main and the corresponding value of the function. But it does not, in general, give an
algorithm for computing the function.
In other words, the rule specifies what must be done, but it does not need to specify
how it is to be done.
In our Sorting example, the rule is that the function’s value is the sorted list of
names. This rule does not specify how the sorting is to be done. As computer science
students, you will meet many different sorting algorithms, including Bucket Sort, Merge
Sort, Insertion Sort, and Quick Sort. They all have their strengths and all could be
used to compute our sorting function. But the function itself does not include a choice
of which algorithm we will use to compute it; that choice is a separate issue to the
specification of the function.
Sometimes, a function’s rule does give some information on how to compute it. For
example, consider a function that squares integers. It has domain ℤ, and we’ll use the
(loose) codomain ℤ too. Its rule is just that any integer argument is squared. This may
be thought of as a small algorithm: it tells you what the value is in such a way that
you also know how to work it out. Or do you? There is more than one algorithm for
squaring an integer! To specify the rule, we don’t need to specify which algorithm is to
2 The Fibonacci numbers are the numbers you get by starting with two consecutive 1s and then repeatedly
adding the two most recent numbers together to get the next number. So the Fibonacci sequence is:
1,1,2,3,5,8,13,21,34,55,89,144,233,….
be used; we only need to give enough information so that the reader can know, for each
argument, what its corresponding value is.
A function’s rule associates, to each argument in its domain, a unique value in its
codomain. One way to specify this information is to give the set of all possible ordered
pairs (𝑥, 𝑓(𝑥)). For example, consider a function Employer, defined as follows. Its domain
is the set
{Annie Jump Cannon, Henrietta Swan Leavitt, Muriel Heagney, Winsome Bellamy},
which consists of four human computers who worked at various astronomical observato-
ries. For its codomain, we use the set of all astronomical observatories over the last two
centuries. The rule gives values to arguments as follows:
Employer(Annie Jump Cannon) = Harvard College Observatory,
Employer(Henrietta Swan Leavitt) = Harvard College Observatory,
Employer(Muriel Heagney) = Melbourne Observatory,
Employer(Winsome Bellamy) = Sydney Observatory.
We can specify this rule equivalently by giving the set of ordered pairs (computer, Employer(computer)):
{(Annie Jump Cannon, Harvard College Observatory), (Henrietta Swan Leavitt, Harvard College Observatory), (Muriel Heagney, Melbourne Observatory), (Winsome Bellamy, Sydney Observatory)}.
Note that this function is not a surjection, since its image contains only the three
observatories appearing in these pairs, a tiny subset of its codomain of all observatories.
The set of all ordered pairs (𝑥, 𝑓(𝑥)) of a function 𝑓 is called the graph of the
function. This term reminds us that we often illustrate a function by drawing a plot of
all points (𝑥, 𝑓(𝑥)), with the horizontal axis containing the domain and the vertical axis
containing the codomain. We are used to referring to such a plot as a “graph” of the
function, but the term graph as just defined is more abstract: it just refers to the set of
pairs, without regard to how they might be displayed to a reader. For example, the
graph of the integer-squaring function is {(𝑥, 𝑥²) ∶ 𝑥 ∈ ℤ}.
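As a small illustration (a sketch of ours, using the pairs listed above), the graph of a function with a finite domain can be stored directly as a Python dict, and applying the function is then a lookup:

# The graph of the Employer function as a dict of (argument, value) pairs.
employer = {
    "Annie Jump Cannon": "Harvard College Observatory",
    "Henrietta Swan Leavitt": "Harvard College Observatory",
    "Muriel Heagney": "Melbourne Observatory",
    "Winsome Bellamy": "Sydney Observatory",
}
print(employer["Annie Jump Cannon"])   # applying the function is a lookup
print(set(employer.values()))          # the image: the values actually taken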
[Figure 2.1: the Employer function drawn as an arrow diagram, with the four computers
in the domain and observatories (Greenwich Observatory, Melbourne Observatory,
Sydney Observatory, Jantar Mantar, …) in the codomain.]
Plots of functions, using horizontal and vertical axes, are convenient visual ways
to display information about the function, but they also have their limitations. Many
domains and codomains do not have an inherently one-dimensional structure, and in-
creasing the number of dimensions — say, by using a 3D plot — does not always help.
Some domains and codomains are not geometric in character at all. For example, the
domain of Employer is a set of four people, and the domain of our Sorting function is the
set of all possible lists of names, neither of which are defined in numerical or geometric
terms.
There are other ways to depict functions. For example, we could start with a Venn
diagram of the domain and codomain, draw points within the domain representing its
members, and then, for every pair (𝑥, 𝑓(𝑥)), draw an arrow from 𝑥 to 𝑓(𝑥) to indicate
that the function sends 𝑥 to 𝑓(𝑥). Because 𝑓 is a function, every point in the domain
has exactly one arrow going out of it. Our Employer function is depicted in this way in
Figure 2.1.
In software development, the first task is to work out what must be done. This process
is traditionally called analysis and usually involves extensive communication with the
owner of a problem (e.g., a client) in order to come up with a precise description of the
task at hand. One possible outcome of this analysis process is a function.
At this stage, we have not yet worked out how to solve the problem at hand. But
at least we have a precise statement of the task to be done (the “what”). With this, we
can then try to design a method for doing this task (the “how”). If our task is specified
by a function, then we will design an algorithm for the function.
Once we have an algorithm, we proceed to implementation: programming the
algorithm using a programming language such as Python.
Of course, this is a very simplified and incomplete view of software development.
The process is seldom purely linear; each step often involves going back and re-doing
parts of a previous stage. The design process might highlight problems, or gaps, in
the specification, which may require further communication with the problem owners
to sort out, leading to changes to the specification. Or the clients may simply change
their minds about some aspect of the task! The implementation process may bring to
light some problems with the design which must then be fixed. There are later stages
we haven’t mentioned, notably maintenance. And not all problem analysis leads to
a function specification. For example, some may lead to a specification of how the
various components of some system must interact; some may lead to a specification of
a database. But functions remain a very important product of analysis, partly because
more complicated systems often contain functions as components.
Our view of functions, as specifications of what rather than how, originated in mathe-
matics, although it is now widespread in computer science and other disciplines. But, as
you study programming, you will find that the term is used in another way too.
In many programming languages, and even in many pseudocode conventions for
writing algorithms, a definition of a “function” has — in addition to the parts discussed
here — some code in the programming language, or an algorithm, that specifies how
the function is to be computed. In fact, the very word “function” is a reserved word in
some programming languages and has this meaning, possibly with additional technical
details.
By default, we use the term “function” in the mathematical sense, where there is no
code or algorithm given. Occasionally we may use the term “mathematical function” to
emphasise this, but even without that adjective, the term “function” will be used in this
way.
If we wish to use the programming sense of the term, we will specify that explicitly,
as in “Python function” or “programming function” or “algorithmic function”.
In functional programming languages, algorithmic functions are the most funda-
mental objects used, and all computation is done by manipulating them. They can be
represented by variables and treated as both arguments and values of other functions.
We do not consider the functional programming paradigm in this unit or in FIT1045.
You can learn more about functional programming in FIT2102 Programming Paradigms.
2.3𝛼 N O TAT i O N
A function is typically defined by stating its name, its domain and its codomain, and
then giving its rule. For example, a function 𝑓 that gives squares of real numbers can be defined by
𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥2 .
Here, the domain is ℝ, and the codomain is ℝ too. Since we stated that our function
would give the squares of real numbers, we really have no alternative but to specify ℝ as
the domain. But we have some more flexibility with the codomain. We could have used
ℝ⁺₀, since that is the image of this function: the squares of real numbers are precisely
the nonnegative real numbers. We could, instead, have used ℝ⁺ ∪ {0, −√2, −42} as the
codomain, which is perfectly valid mathematically, although this codomain has some
extra detail that is irrelevant, useless, distracting, and shows poor user interface design!
The first line of these function definitions, such as 𝑓 ∶ ℝ ⟶ ℝ, is like a declaration in
a program. It announces the name of the function and specifies the types of objects that
it can take as its arguments and give as its values. The second line, such as 𝑓(𝑥) = 𝑥2 ,
completes the definition by specifying the rule. A common form of wording is to say
something like, “The function 𝑓 ∶ ℝ → ℝ is defined by 𝑓(𝑥) = 𝑥2 .”
There is another common convention for specifying the rule of a function, where we
just write
𝑥 ↦ 𝑓(𝑥).
Note how the “mapping arrow” ↦ in the rule differs from the ordinary arrow → going
from domain to codomain. It is important not to mix the two arrow types up.
If we use this second convention, then our squaring function would be defined by
𝑓 ∶ ℝ ⟶ ℝ,
𝑥 ↦ 𝑥2 .
The most trivial, vacuous, degenerate function of all is the empty function, denoted
∅. Its domain and codomain are each empty, and it has no rule because there is nothing
in the domain for any rule to apply to. It can be defined simply as ∅ ∶ ∅ → ∅. It’s pretty
useless; we may hope never to see it again! Let’s move on.
For any set 𝐴, the identity function 𝑖𝐴 on 𝐴 is defined by
𝑖𝐴 ∶ 𝐴 ⟶ 𝐴,
𝑖𝐴 (𝑥) = 𝑥.
For any subset 𝐴 of a set 𝐷, the indicator function 𝜒𝐴 of 𝐴 is defined by
𝜒𝐴 ∶ 𝐷 ⟶ {0, 1},
𝜒𝐴(𝑥) = 1, if 𝑥 ∈ 𝐴;
𝜒𝐴(𝑥) = 0, if 𝑥 ∉ 𝐴.
Although indicator function notation 𝜒𝐴 only mentions 𝐴, it must be kept in mind that
a function’s definition always includes a specification of its domain, which in this case
is 𝐷. Different domains give rise to different indicator functions.
We can express the indicator functions of sets obtained from set operations on 𝐴
and 𝐵 using the indicator functions of 𝐴 and 𝐵. For example, for all 𝑥 we have
𝜒𝐴∩𝐵(𝑥) = 𝜒𝐴(𝑥) 𝜒𝐵(𝑥).
See Exercise 2.
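Here is a quick Python sketch (ours) checking this identity over a small domain; the helper name chi is our own:

D = set(range(10))            # the domain
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def chi(S):
    # Indicator function of S with respect to the domain D.
    return lambda x: 1 if x in S else 0

chi_A, chi_B, chi_AB = chi(A), chi(B), chi(A & B)
assert all(chi_AB(x) == chi_A(x) * chi_B(x) for x in D)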
For any domain 𝐷 and any object 𝑎, the constant function 𝑐𝑎 just maps everything
to 𝑎:
𝑐𝑎 ∶ 𝐷 ⟶ {𝑎},
𝑐𝑎 (𝑥) = 𝑎.
2.5𝛼 F U N C T i O N S W i T H M U LT i P L E A R G U M E N T S A N D VA L U E S
Many functions you will meet have multiple arguments. If 𝑓 is a function of two argu-
ments 𝑥 and 𝑦, then we write its value as 𝑓(𝑥, 𝑦). Suppose 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌, and that
the value of the function belongs to a codomain 𝐶. Then the function definition would
start by stating 𝑓 ∶ 𝑋 × 𝑌 → 𝐶.
3 So, in a sense, it does nothing. But at least it does nothing to something, whereas the empty function
does nothing to nothing!
It may seem that we are extending our definition of functions here, since we seem
to have two domains, 𝑋 and 𝑌, for the first and second argument respectively. And no
real harm can come from this view. But functions of two arguments may also be viewed
as functions of a single argument where that one argument happens to be a pair (𝑥, 𝑦) and
its domain is the Cartesian product 𝑋 ×𝑌. So, when we start a function definition with
𝑓 ∶ 𝑋 × 𝑌 → 𝐶, we are still just using our usual way of defining functions. When we
write 𝑓(𝑥, 𝑦), indicating two arguments, we are using a shorthand for 𝑓((𝑥, 𝑦)), where
the argument that we put inside 𝑓(⋯) is the ordered pair (𝑥, 𝑦). In accordance with
usual practice, we will drop the second pair of parentheses from 𝑓((𝑥, 𝑦)), writing 𝑓(𝑥, 𝑦)
instead, and we will happily speak of its first argument 𝑥, its second argument 𝑦, and
so on. But keep in the back of your mind that we can also view this as a function of
just one argument whose sole argument happens to be the ordered pair (𝑥, 𝑦).
When we write 𝑓(𝑥, 𝑦) for applying function 𝑓 to arguments 𝑥 and 𝑦, we are using
prefix notation, because we put the name of the function before the arguments. This
is common practice and is the one we use when defining new functions. But there are
also many well-known functions that use infix notation, where the function name is put
between its arguments. Familiar examples include ordinary arithmetic functions +, −,
×, / and many built-in operations in many programming languages. Far less common
is postfix notation, where the name is placed after the arguments.
All these remarks extend readily to functions of three or more arguments. For
example, a function of three arguments can also be regarded as a function of a single
argument which happens to be a triple.
We also sometimes want the value of a function to be a tuple. For example, suppose
the value of a function is to be a pair (𝑦, 𝑧) where 𝑦 ∈ 𝑌 and 𝑧 ∈ 𝑍. If the domain of
the function is 𝑋 , then we may write 𝑓 ∶ 𝑋 → 𝑌 × 𝑍. This function may be regarded as
giving two values, which we find it convenient to put in an ordered pair. We also view
it as giving a single value which happens to be the ordered pair (𝑦, 𝑧) ∈ 𝑌 × 𝑍.
Again, these comments extend readily to functions that return tuples of three or
more values.
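The following short Python sketch (ours) illustrates both directions: a function whose single argument is an ordered pair, and a built-in function whose single value is an ordered pair.

def add(pair):           # one argument, which happens to be a pair
    x, y = pair
    return x + y

print(add((3, 4)))       # 7

q, r = divmod(17, 5)     # divmod returns its two values as a pair
print(q, r)              # 3 2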
2.6 RESTRiCTiONS
Sometimes we want to restrict a function to some subset of its domain, and to treat
this restricted version of the function as a new function in its own right. For example,
consider a function that assigns ID numbers to Monash students. Its domain is the set
of all Monash students. If we want to use a function which only considers FIT1058
students, and assigns ID numbers to them, then this function is a restriction of the
previous function just to the set of all FIT1058 students.
There are several reasons why we might want to focus just on the restriction of some
function.
• If the restricted domain is significantly smaller than the original domain, then storing the
restricted function as a list of ordered pairs takes up less space.
• The restriction may be simpler than the original function, or may have stronger
properties, as the example below illustrates.
In general, if 𝑓 ∶ 𝐴 → 𝐵 is a function and 𝑋 ⊆ 𝐴, then the restriction of 𝑓 to 𝑋 is the function 𝑓|𝑋 defined by
𝑓|𝑋 ∶ 𝑋 ⟶ 𝐵,
𝑓|𝑋(𝑥) = 𝑓(𝑥) for all 𝑥 ∈ 𝑋.
For example, consider again the squaring function
𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥².
Its restriction to the nonnegative reals is
𝑓|ℝ⁺₀ ∶ ℝ⁺₀ ⟶ ℝ,
𝑓|ℝ⁺₀(𝑥) = 𝑥².
This has some properties that the original function 𝑓 does not have. For example, 𝑓|ℝ⁺₀ is
continually increasing as 𝑥 increases across its domain, whereas 𝑓(𝑥) is decreasing along
some of its domain (specifically, along ℝ⁻₀) and increases elsewhere (along ℝ⁺₀), so its
behaviour is a bit more complicated. Each member 𝑦 in the image of 𝑓|ℝ⁺₀ comes from a
unique 𝑥 in the domain ℝ⁺₀, namely the positive square root √𝑦 of 𝑦 (or 0, in the case
𝑦 = 0). By contrast, each nonzero member 𝑦 of the image of 𝑓 comes from two different
values of 𝑥, namely the two square roots ±√𝑦 of 𝑦. This illustrates the point mentioned
above that a restriction can be simpler and have stronger properties, which can make it
more useful in some situations. (Later, in § 2.8, we discuss inverse functions. Then, we
can say that 𝑓|ℝ⁺₀ is invertible but 𝑓 is not.)
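As a small sketch (ours), restriction is easy to express for a function stored as a dict of (argument, value) pairs: we simply keep the pairs whose arguments lie in the chosen subset.

def restrict(f: dict, X: set) -> dict:
    # The restriction f|X: keep only the arguments lying in X.
    return {x: y for x, y in f.items() if x in X}

square = {x: x * x for x in range(-3, 4)}   # squaring on {-3, ..., 3}
print(restrict(square, {0, 1, 2, 3}))       # {0: 0, 1: 1, 2: 4, 3: 9}
# The restriction is an injection, while the original function is not.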
2.7 iNjECTiONS, SURjECTiONS, BijECTiONS
As we have seen from some of our examples, it is perfectly ok for different function
arguments to give the same value. So it is ok for both Annie Jump Cannon and Henrietta
Swan Leavitt to have the value Harvard College Observatory under the Employer function
(p. 42 in § 2.1.3𝛼 ). There is no requirement for there to be a unique argument for
each value. This contrasts with the requirement that there be a unique value for each
argument, which is an essential property of any function. This specific Employer function
only assigns one employer to each human computer.
Although it’s ok in general for different arguments to be mapped to the same value,
there are situations where we do not want that to happen. For example, a function that
assigns an ID number to each student must ensure that different students get different
ID numbers. A function that encrypts files must ensure that different files are encrypted
differently, else the contents of a file cannot be recovered from its encrypted form.
A function with this property, that different arguments are always mapped to dif-
ferent values, is said to be injective and is called an injection. Mathematically, this
property of a function 𝑓 ∶ 𝐴 ⟶ 𝐵 can be expressed as follows: for any two distinct
𝑥1 , 𝑥2 ∈ 𝐴, we have 𝑓(𝑥1 ) ≠ 𝑓(𝑥2 ). Such a function gives a one-to-one correspondence
between the domain and the image, but not between the domain and the codomain in
general.
Injections have the virtue of preserving information: for every member 𝑦 in the
image of an injection 𝑓, there is a unique 𝑥 in its domain such that 𝑓(𝑥) = 𝑦. In every
case, knowing 𝑦 is logically sufficient for determining 𝑥 (although we are not saying
anything here about how much work it might be to recover 𝑥; that depends on the
details of the function). If a function is not an injection, then there must be at least one
member 𝑦 of its image such that there are two or more members 𝑥1 , 𝑥2 of its domain
which map to that value: 𝑓(𝑥1 ) = 𝑓(𝑥2 ) = 𝑦. So, in that case, knowing 𝑦 still leaves you
in doubt as to how it could have been produced by 𝑓.
Functions that aren’t injections lose information. We might call them lossy. In
fact, this term is used for data compression functions that are not injections. By contrast,
an injective data compression function is called lossless.
Similarly, we can express mathematically the onto property of a function, which we
defined in § 2.1.2𝛼 . A function 𝑓 ∶ 𝐴 ⟶ 𝐵 is surjective, and is said to be a surjection,
if for every value 𝑦 ∈ 𝐵 there is some argument 𝑥 ∈ 𝐴 such that 𝑦 = 𝑓(𝑥).
A function that is both an injection and a surjection is said to be bijective and
is called a bijection. Such a function is a one-to-one correspondence between the
domain and the codomain (which is also the image in this case).
A bijection whose domain and codomain are the same set is also called a permutation,
though the latter term is usually used only for bijections on finite sets.
Bijections preserve information, since they are injections. Furthermore, since they
are also surjections, each member of the codomain may be thought of as encoding a
unique member of the domain. So a bijection establishes that the domain and codomain
contain the same information, although it may be represented in different ways.
If a function has finite domain and codomain of the same size, and is an injection,
then it must also be a surjection, and hence also a bijection. This is because there is no
room in the codomain for the injection to avoid mapping to all its members. Similarly, a
surjection with finite domain and codomain of the same size must also be an injection,
and hence a bijection, since the need to map to all members of the codomain prevents any
repetition of codomain elements. Both these assertions fail if the domain and codomain
are infinite. Can you find examples to illustrate the failure in each case?
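For finite sets, these properties are easy to test by brute force. Here is a Python sketch (ours), with the function given as a dict from its domain to its codomain:

def is_injection(f: dict) -> bool:
    return len(set(f.values())) == len(f)      # no value is repeated

def is_surjection(f: dict, codomain: set) -> bool:
    return set(f.values()) == codomain         # the image fills the codomain

def is_bijection(f: dict, codomain: set) -> bool:
    return is_injection(f) and is_surjection(f, codomain)

f = {1: 1, 2: 2}                               # f : {1, 2} -> {1, 2, 3}
print(is_injection(f), is_surjection(f, {1, 2, 3}))   # True False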
2.8 iNVERSE FUNCTiONS
We think of a function as going from any member of its domain to its corresponding
value in the codomain. But there are times when we may want to go backwards: given a
value in the codomain, what argument in the domain does it come from? For example,
which computer worked at Harvard College Observatory? Which number, when squared,
gives 4? Which file corresponds to a particular encrypted file?
For functions in general, the answer may not be unique. We have just mentioned
some cases of this: our Employer function mapped two different computers to Harvard
College Observatory, and 22 = (−2)2 = 4.
This failure of uniqueness can happen in either of two different ways. Let 𝑓 ∶ 𝐴 → 𝐵
be a function. For a given value 𝑦 in the codomain:
• There may be more than one argument that gives the value 𝑦 under the function.
So we may have 𝑥1 , 𝑥2 ∈ 𝐴 such that 𝑥1 ≠ 𝑥2 but 𝑓(𝑥1 ) = 𝑓(𝑥2 ) = 𝑦.
• There may be no argument that gives the value 𝑦. This happens when the image is
a proper subset of the codomain and 𝑦 lies in the codomain but not in the image.
But if neither of these occurs, then every 𝑦 ∈ 𝐵 has a unique 𝑥 ∈ 𝐴 such that 𝑓(𝑥) = 𝑦.
This means that, in giving each 𝑦 ∈ 𝐵 a corresponding 𝑥 ∈ 𝐴, we are actually defining a
function from 𝐵 to 𝐴, with domain 𝐵 and codomain 𝐴. So the roles played by 𝐴 and
𝐵 are reversed, in keeping with the reversed “direction” of this new function. We call
this new function the inverse function of 𝑓 and denote it by 𝑓 −1 . We can write its
definition as follows.
𝑓 −1 ∶ 𝐵 ⟶ 𝐴,
𝑓 −1 (𝑦) = the unique 𝑥 such that 𝑓(𝑥) = 𝑦.
If we want to write the rule of 𝑓 −1 as a set of ordered pairs, then we just take all the
ordered pairs in 𝑓 and reverse them:
{ (𝑓(𝑥), 𝑥) ∶ 𝑥 ∈ 𝐴 }.
For a function to have an inverse function, it must be an injection (so that no value has
two corresponding arguments) and it must also be a surjection (so that every value in
the codomain is also in the image, i.e., has a corresponding argument). So, in fact, a
function has an inverse function if and only if it is a bijection.
Our Employer function is not a bijection and therefore does not have an inverse
function. The squaring function also does not have an inverse function, for the same
reason. But we would want an encryption function to have an inverse function, so that
an intended user of an encrypted file has no doubt about its contents. (In that context,
there is the separate issue of how easy or hard it should be to actually compute the
inverse. We would like that to be easy for intended users and hard for others. Achieving
these competing aims is the fundamental challenge of cryptography.)
The role of the inverse function is to “undo” the function and get back what you
started with. So, if you have 𝑥 ∈ 𝐴 and apply 𝑓 to get 𝑦 = 𝑓(𝑥), then it does not matter
much if you “lose” 𝑥, because you can recover it from 𝑦:
𝑥 = 𝑓 −1 (𝑦) = 𝑓 −1 (𝑓(𝑥)).
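For a bijection with a finite domain, stored as a dict, the inverse is obtained by reversing every pair, exactly as described above. A minimal sketch (ours):

def inverse(f: dict) -> dict:
    g = {y: x for x, y in f.items()}           # reverse every (x, f(x)) pair
    if len(g) != len(f):
        raise ValueError("f is not an injection, so it has no inverse")
    return g

f = {1: "a", 2: "b", 3: "c"}
f_inv = inverse(f)
assert all(f_inv[f[x]] == x for x in f)        # f_inv undoes f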
2.9 COMPOSiTiON
It is common to use values obtained from one function as arguments to another function.
For example, consider the functions Father and Mother which each have, as their domain
and codomain, the set ℙ of all people who have ever lived.
Father ∶ ℙ ⟶ ℙ,
Father(𝑝) = the father of person 𝑝.
Mother ∶ ℙ ⟶ ℙ,
Mother(𝑝) = the mother of person 𝑝.
Starting with Alan Turing, the function Mother gives Alan’s mother, Sara Turing. Ap-
plying the function Father to her gives Sara’s father — Alan’s maternal grandfather —
Edward Stoney.
This “chaining” of the two functions is called composition, and is denoted by stating the
second function, followed by the composition symbol ∘, followed by the first function.
(Note the order there.) In this case, the function is Father ∘ Mother and it is defined as
follows.
Father ∘ Mother ∶ ℙ ⟶ ℙ,
Father ∘ Mother(𝑝) = Father(Mother(𝑝)).
Note now that the order in which the functions are written in Father ∘ Mother is the same as
the order in which they are written when we write one function as an argument of the
other, i.e., in Father(Mother(𝑝)). But our usual order of reading and writing (left to
right) is the reverse of the order of application (right to left): we apply the function
Mother first, and then we apply the function Father.
In this example, the codomain of the first function applied (Mother) equals the
domain of the second function applied (Father); both are ℙ. This ensures that our
first function application, using Mother, always produces something (or someone) that
our second function, Father, can deal with. This is a general requirement for function
composition.
In general, the composition 𝑔 ∘𝑓 of two functions 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 is defined
by
𝑔 ∘ 𝑓 ∶ 𝐴 ⟶ 𝐶,
𝑔 ∘ 𝑓(𝑥) = 𝑔(𝑓(𝑥)).
Note that, in this composition, 𝑓 is applied first, and then the result is given to 𝑔. The
order of doing things does matter; in general, function composition is not commutative,
meaning that 𝑔 ∘ 𝑓 and 𝑓 ∘ 𝑔 are not the same. It is, however, associative: if 𝑓 ∶ 𝐴 → 𝐵,
𝑔 ∶ 𝐵 → 𝐶 and ℎ ∶ 𝐶 → 𝐷, then
ℎ ∘ (𝑔 ∘ 𝑓) = (ℎ ∘ 𝑔) ∘ 𝑓,
so we may write ℎ ∘ 𝑔 ∘ 𝑓 without ambiguity.
Function composition is not always defined, though. For example, suppose 𝕊 is the set
of all students, 𝕌 is the set of all Monash units, and the function FavouriteUnit is defined
as follows.
FavouriteUnit ∶ 𝕊 ⟶ 𝕌,
FavouriteUnit(𝑝) = the favourite Monash unit of person 𝑝.
If you want to know your mother’s favourite unit at Monash, you might be tempted to
use the function FavouriteUnit ∘ Mother. But not all mothers are students, and not all
students are mothers. Formally, the codomain of Mother is ℙ, which does not equal the
domain of FavouriteUnit, namely 𝕊. So this function composition is undefined.
The requirement that the codomain of the first function 𝑓 equals the domain of the
second function 𝑔 amounts to insisting that our guarantee about what 𝑓 can produce
(expressed in the form of its codomain) is the same as our guarantee about what 𝑔 can
handle (expressed in the form of its domain). So the two functions are compatible, in a
precise sense.
You have seen composition before for mathematical functions. For example, the
expression (𝑥 − 1)2 may be regarded as the rule for the composition 𝑔 ∘ 𝑓 of the two
functions
𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥 − 1.
𝑔 ∶ ℝ ⟶ ℝ⁺₀,
𝑔(𝑥) = 𝑥2 .
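Composition itself is easy to express in Python as a higher-order function; the following sketch (ours) uses this example:

def compose(g, f):
    # Tight composition g ∘ f: apply f first, then g.
    return lambda x: g(f(x))

f = lambda x: x - 1          # f : R -> R
g = lambda x: x * x          # g : R -> R+0
h = compose(g, f)            # rule: h(x) = (x - 1)**2
print(h(4))                  # 9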
If 𝑓 ∶ 𝐴 → 𝐵 is a bijection, then composing it with its inverse, in either order, gives an
identity function:
𝑓 ∘ 𝑓⁻¹ = 𝑖𝐵 ,
𝑓⁻¹ ∘ 𝑓 = 𝑖𝐴 .
More generally, identity functions leave other functions unchanged under composition:
for any function 𝑔 ∶ 𝐶 → 𝐷,
𝑔 ∘ 𝑖𝐶 = 𝑔,
𝑖𝐷 ∘ 𝑔 = 𝑔.
We can compose a function with itself provided its domain and codomain are the
same. If 𝑓 ∶ 𝐴 → 𝐴 then the definition of composition tells us that 𝑓 ∘𝑓 ∶ 𝐴 → 𝐴 is defined
for all 𝑥 ∈ 𝐴 by 𝑓 ∘ 𝑓(𝑥) = 𝑓(𝑓(𝑥)). We can then do iterated composition of 𝑓 with itself,
if we wish. We write 𝑓 (𝑛) for the composition of 𝑛 copies of 𝑓:
𝑓 (𝑛) = 𝑓 ∘ 𝑓 ∘ ⋯ ∘ 𝑓 (𝑛 copies of 𝑓).
• Consider the function LCG ∶ [0, 2³¹ − 1]ℤ → [0, 2³¹ − 1]ℤ, defined for all 31-bit nonneg-
ative binary integers 𝑥 by a rule of the form LCG(𝑥) = (𝑎𝑥 + 𝑐) mod 2³¹, where 𝑎
and 𝑐 are fixed integer constants.
We always keep only the last 31 bits, to ensure that the numbers generated stay
within our fixed interval. This function has been used to generate sequences of
numbers that are pseudorandom in the sense that, superficially, they look random
if you don’t look too closely. Starting with some initial “seed” number 𝑠, the
function LCG is applied repeatedly, and the successive numbers LCG(𝑛) (𝑠) should
behave in a way that looks statistically random in some sense. (We have used this
example as it is one of the simpler pseudorandom number generators that have
been used in practice, but its randomness properties are imperfect and it should
not be used by itself in this naive way. It is usually used in conjunction with other
methods in order to increase the randomness.) The name LCG comes from the
term Linear Congruential Generator, which is a type of pseudorandom number
generator of which this is one example.
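Here is a Python sketch of a generator of this kind; the particular constants A and C below are illustrative assumptions of ours, not values prescribed by the notes:

A, C, M = 1103515245, 12345, 2**31    # mod 2**31 keeps the last 31 bits

def lcg(x: int) -> int:
    return (A * x + C) % M

x = 42                                 # the seed s
for _ in range(5):
    x = lcg(x)                         # successive values LCG^(n)(s)
    print(x)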
• Consider the function 𝐾 ∶ [0, 9999]ℤ → [0, 9999]ℤ defined for any nonnegative integer
𝑥 with at most four (decimal) digits as follows. First, form a four-digit number
by writing the four digits of 𝑥 (using leading zeros if necessary) from smallest
to largest. Then reverse that number, so that the digits now go from largest to
smallest. Then 𝐾(𝑥) is defined to be the difference between these two numbers.
For example,
𝐾(1729) = 9721 − 1279 = 8442.
This function is not an injection (why?). It is clear that, if all the digits in 𝑥 are the
same, then 𝐾(𝑥) = 0, and therefore 𝐾 (𝑛) (𝑥) = 0 for all 𝑛 ≥ 1. More surprisingly, in
all other cases (i.e., when 𝑥 has at least two different digits), iterated composition
of this function with itself eventually reaches 6174, and does so after at most seven
iterations. For example, starting with 1729 as above, we have
1729 ↦ 8442 ↦ 5994 ↦ 5355 ↦ 1998 ↦ 8082 ↦ 8532 ↦ 6174 ↦ 6174,
where each arrow denotes an application of 𝐾.
So 𝐾 (7) (𝑥) is 0 if all digits are the same and 6174 otherwise. This was discovered
by the Indian mathematician D. R. Kaprekar in 1946 and published in 1955.4
4 D. R. Kaprekar, An interesting property of the number 6174, Scripta Mathematica 21 (1955) 304. Martin
Gardner, Mathematical Games, Scientific American (March 1975).
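Kaprekar’s function is short to express in Python; a sketch (ours):

def K(x: int) -> int:
    digits = sorted(f"{x:04d}")              # four digits, smallest first
    small = int("".join(digits))
    large = int("".join(reversed(digits)))
    return large - small

x = 1729
for _ in range(7):
    x = K(x)                                 # iterated composition K^(7)
print(x)                                     # 6174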
• Consider the function Collatz ∶ ℕ ⟶ ℕ defined by
Collatz(𝑥) = 3𝑥 + 1, if 𝑥 is odd;
Collatz(𝑥) = 𝑥/2, if 𝑥 is even.
For example, starting at 7 and applying Collatz repeatedly gives:
7 ↦ 22 ↦ 11 ↦ 34 ↦ 17 ↦ 52 ↦ 26 ↦ 13 ↦ 40 ↦ 20 ↦ 10 ↦ 5 ↦ 16 ↦ 8 ↦ 4 ↦ 2 ↦ 1 ↦ 4 ↦ 2 ↦ 1 ↦ ⋯
Note how it eventually gets stuck in a loop, going from 4 to 2 to 1, then back to
4, and so on.
Iterated composition of this function is mysterious. It is conjectured that, for
every 𝑥, there exists 𝑛 such that Collatz(𝑛) (𝑥) = 1. This has become known as
Collatz’s Conjecture or the 3𝑥 + 1 problem. Currently it is unsolved. It is a
remarkable illustration that even simple questions about very simple algorithms
can be very deep and hard to answer.
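A sketch (ours) of iterating the Collatz function from a given starting point:

def collatz(x: int) -> int:
    return 3 * x + 1 if x % 2 == 1 else x // 2

x, steps = 7, 0
while x != 1:
    x = collatz(x)
    steps += 1
print(steps)      # 16 applications take 7 down to 1, as in the chain above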
We now consider how the injective, surjective and bijective properties are affected
by composition.
We start with injection.
Theorem 6. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are injections then 𝑔 ∘ 𝑓 is also an injection.
Proof. We prove the contrapositive: if 𝑔 ∘ 𝑓 is not an injection, then 𝑓 and 𝑔 cannot
both be injections. So suppose 𝑔 ∘ 𝑓 is not an injection. Then there are distinct 𝑎, 𝑏 ∈ 𝐴
such that 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏). Now consider 𝑓(𝑎) and 𝑓(𝑏).
• If 𝑓(𝑎) = 𝑓(𝑏), then 𝑓 maps the two distinct arguments 𝑎 and 𝑏 to the same value,
so 𝑓 is not an injection.
• If 𝑓(𝑎) ≠ 𝑓(𝑏), then we have 𝑓(𝑎) ≠ 𝑓(𝑏) and 𝑔(𝑓(𝑎)) = 𝑔(𝑓(𝑏)), so 𝑔 is not an
injection (since we have two distinct members of its domain, namely 𝑓(𝑎) and
𝑓(𝑏), that are mapped by 𝑔 to the same value).
So we see that, whatever happens with 𝑓(𝑎) and 𝑓(𝑏), at least one of 𝑓 and 𝑔 is not an
injection.
The converse of this theorem does not hold: 𝑔 ∘ 𝑓 being an injection does not imply
that both 𝑓 and 𝑔 are injections. For example, define 𝑓 ∶ {1, 2} → {1, 2, 3} by
𝑓(1) = 1,
𝑓(2) = 2,
and 𝑔 ∶ {1, 2, 3} → {1, 2} by
𝑔(1) = 1,
𝑔(2) = 2,
𝑔(3) = 2.
So 𝑔 ∘ 𝑓 is an injection, but it is not the case that both 𝑓 and 𝑔 are injections. In fact,
𝑓 is an injection, but 𝑔 is not.
Now let’s consider surjections.
Theorem 7. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are surjections then 𝑔 ∘ 𝑓 is also a surjection.
For Theorem 7, too, the converse does not hold. In fact, the same 𝑓 and 𝑔 we gave
above, after the proof of Theorem 6 and before stating Theorem 7, show this here too.
In that example, 𝑔 ∘𝑓 is a surjection, but 𝑓 is not a surjection (although 𝑔 is a surjection).
Theorem 6 and Theorem 7 together give a similar statement for bijections.
Theorem 8. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are bijections then 𝑔 ∘ 𝑓 is also a bijection.
Once again, the converse does not hold in general, and once again our little functions
𝑓 and 𝑔 show this, since 𝑔 ∘ 𝑓 is a bijection but neither 𝑓 nor 𝑔 is a bijection.
There is one important situation where the converse does hold as well.
Theorem 9. Suppose 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶, where 𝐴, 𝐵 and 𝐶 are finite sets of
the same size. Then 𝑔 ∘ 𝑓 is a bijection if and only if 𝑓 and 𝑔 are both bijections.
Proof. The domains and codomains of 𝑓, 𝑔 and 𝑔 ∘ 𝑓 are finite and of the same size, as
stated. So each of them is a bijection if and only if it is an injection, by our remarks at
the end of § 2.7. So it is enough to prove that
𝑔 ∘ 𝑓 is an injection if and only if both 𝑓 and 𝑔 are injections.
We have already seen that, if 𝑓 and 𝑔 are injections, then so is 𝑔 ∘ 𝑓 (Theorem 6). So it
remains to prove that
if at least one of 𝑓 and 𝑔 is not an injection, then 𝑔 ∘ 𝑓 is not an injection.
Our starting assumption here, that at least one of 𝑓 and 𝑔 is not an injection, divides
naturally into two cases: (i) 𝑓 is not an injection, and (ii) 𝑔 is not an injection. These
two cases overlap, which is ok.
Case (i):
If 𝑓 is not an injection, then by definition there exist distinct 𝑎, 𝑏 ∈ 𝐴 such that
𝑓(𝑎) = 𝑓(𝑏). Then 𝑔(𝑓(𝑎)) = 𝑔(𝑓(𝑏)). So in fact our distinct 𝑎, 𝑏 also give 𝑔∘𝑓(𝑎) = 𝑔∘𝑓(𝑏),
so 𝑔 ∘ 𝑓 is not an injection.
Case (ii):
It remains to consider the possibility that 𝑔 is not an injection. Within this case, we
can restrict to cases where 𝑓 is an injection, since we have just dealt with the possibility
that 𝑓 is not an injection. (Effectively, we are ignoring the overlap between the two
cases, since that overlap is covered by Case (i).)
Suppose then that 𝑓 is an injection. Since its domain and codomain are finite and
have the same size, this means it is also a bijection, and therefore has an inverse.
If 𝑔 is not an injection, then by definition there exist distinct 𝑐, 𝑑 ∈ 𝐵 such that
𝑔(𝑐) = 𝑔(𝑑). Now because 𝑓 is a bijection, its inverse 𝑓 −1 is defined, has the same
domain 𝐵, and is also a bijection. So 𝑓 −1 (𝑐) and 𝑓 −1 (𝑑) are both defined and must
be distinct since 𝑐 ≠ 𝑑. Furthermore, 𝑐 = 𝑓(𝑓 −1 (𝑐)) and 𝑑 = 𝑓(𝑓 −1 (𝑑)). So 𝑔(𝑐) = 𝑔(𝑑)
implies 𝑔(𝑓(𝑓⁻¹(𝑐))) = 𝑔(𝑓(𝑓⁻¹(𝑑))), which may be rewritten as
𝑔 ∘ 𝑓(𝑓⁻¹(𝑐)) = 𝑔 ∘ 𝑓(𝑓⁻¹(𝑑)).
But 𝑓 −1 (𝑐) ≠ 𝑓 −1 (𝑑), so we have two distinct members of 𝐴 which are mapped to the
same thing by 𝑔 ∘ 𝑓. So 𝑔 ∘ 𝑓 is not an injection.
Now that we can compose two functions, it is natural to ask about the inverse of the
composition. This turns out to be the reverse composition of their inverses. This aligns
with everyday experience of doing and undoing sequences of tasks: if we wrap up a gift
in multiple layers of wrapping paper, then the recipient unwraps the layers in reverse
order.
Theorem 10. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are bijections, then (𝑔 ∘ 𝑓)⁻¹ = 𝑓⁻¹ ∘ 𝑔⁻¹.
2.10 C RY P T O S Y S T E M S
A cryptosystem consists of:
• a message space, which is a finite set 𝑀 of possible messages (in the form of
strings over some alphabet),
• a cypher space, which is a finite set 𝐶 of strings, which we call cyphertexts,
and
• a keyspace, which is a finite set 𝐾 whose members we call keys,
together with an encryption function 𝑒 ∶ 𝑀 × 𝐾 → 𝐶 and a decryption function
𝑑 ∶ 𝐶 × 𝐾 → 𝑀. For each key 𝑘 ∈ 𝐾, define 𝑒𝑘 ∶ 𝑀 → 𝐶 by 𝑒𝑘 (𝑚) = 𝑒(𝑚, 𝑘), and
𝑑𝑘 ∶ 𝐶 → 𝑀 by 𝑑𝑘 (𝑐) = 𝑑(𝑐, 𝑘). We require that, for every key 𝑘 ∈ 𝐾:
(i) 𝑒𝑘 and 𝑑𝑘 are bijections;
(ii) 𝑑𝑘 = 𝑒𝑘⁻¹.
In practice we will also want some conditions on how easy or hard it is to compute these
functions or even to obtain partial information from them.
For convenience, we restrict ourselves to cryptosystems where the cypherspace and
message space are the same, i.e., 𝑀 = 𝐶. (Most real cryptosystems either have this
property or can easily be modified so that they do.)
Suppose we have two cryptosystems with the same message spaces but with keyspaces
and encryption/decryption maps that may be different. Call them 𝒞 = (𝑀 , 𝑀 , 𝐾, 𝑒, 𝑑)
and 𝒞 ′ = (𝑀 , 𝑀 , 𝐾 ′ , 𝑒′ , 𝑑 ′ ). We would like to compose them to make a more complex
cryptosystem. For encryption, we want to first encrypt with 𝑒 and then encrypt further
with 𝑒′ . This is shown in Figure 2.2. For decryption, we want to do the reverse: decrypt
using 𝑑 ′ , then decrypt further with 𝑑.
But our definition of function composition (§ 2.9) only applies to functions of one
argument. So we need to extend this definition for our encryption and decryption
functions.
The keyed composition of encryption functions 𝑒 ∶ 𝑀 ×𝐾 → 𝑀 and 𝑒′ ∶ 𝑀 ×𝐾 ′ → 𝑀
is the function 𝑒′ • 𝑒 ∶ 𝑀 × (𝐾 × 𝐾 ′ ) → 𝑀 defined for all 𝑚 ∈ 𝑀 and (𝑘, 𝑘 ′ ) ∈ 𝐾 × 𝐾 ′ by
(𝑒′ • 𝑒)(𝑚, (𝑘, 𝑘′ )) = 𝑒′ (𝑒(𝑚, 𝑘), 𝑘′ ). (2.2)
[Figure 2.2: composing encryption functions: a message and key 𝑘 go into 𝑒; the
cyphertext from 𝑒 becomes the message for 𝑒′ , which, with key 𝑘′ , produces the final
cyphertext.]
For a fixed key pair (𝑘, 𝑘′ ), the composed system encrypts a message 𝑚 as 𝑒′𝑘′ (𝑒𝑘 (𝑚)).
But, by (2.2), this is just 𝑒′ (𝑒(𝑚, 𝑘), 𝑘′ ). And we can express this in terms of composition
of our single-argument encryption functions:
𝑒(𝑘,𝑘′ ) = 𝑒′𝑘′ ∘ 𝑒𝑘 ,
and
𝑑(𝑘,𝑘′ ) = 𝑑𝑘 ∘ 𝑑′𝑘′ .
Theorem 11. The composition of two cryptosystems is also a cryptosystem.
Proof. We need to prove that, for each key pair (𝑘, 𝑘 ′ ), the encryption function 𝑒(𝑘,𝑘′ )
and the decryption function 𝑑(𝑘,𝑘′ ) are both bijections, and that the latter is the inverse
of the former.
The fact that they are bijections follows from the fact that 𝑒𝑘 , 𝑒𝑘′ ′ , 𝑑𝑘 , 𝑑𝑘′ ′ are all bijec-
tions for all 𝑘 and 𝑘′ (because 𝒞 and 𝒞 ′ are both cryptosystems) and Theorem 8.
The fact that 𝑑(𝑘,𝑘′ ) = (𝑒(𝑘,𝑘′ ) )−1 follows from Theorem 10.
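To make the construction concrete, here is a toy Python sketch of composing two cryptosystems. Both systems here are simple alphabet shifts, which is an illustrative assumption of ours, not a scheme from the notes (and certainly not secure!).

import string
M = string.ascii_lowercase                     # message space: lowercase words

def e(m: str, k: int) -> str:                  # encryption e(m, k)
    return "".join(M[(M.index(ch) + k) % 26] for ch in m)

def d(c: str, k: int) -> str:                  # decryption: d_k = e_k^{-1}
    return e(c, -k)

def e2(m: str, kk) -> str:                     # keyed composition e' • e
    k, k2 = kk
    return e(e(m, k), k2)

def d2(c: str, kk) -> str:                     # d_(k,k') = d_k ∘ d'_k'
    k, k2 = kk
    return d(d(c, k2), k)

c = e2("attack", (3, 11))
assert d2(c, (3, 11)) == "attack"              # decryption undoes encryption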
2.11 LOOSE COMPOSiTiON
The composition defined in § 2.9, which we call tight composition, requires the codomain
of the first function to equal the domain of the second. An alternative, loose composition,
simply applies the second function wherever that is possible.
The domain of a loose composition is the subset of the domain of the first function
(i.e., the one that is applied first) containing everything for which the successive function
applications are possible. In other words, it’s everything that the first function maps into
the domain of the second function. For FavouriteUnit ∘ Mother, this means every person
whose mother is a student, since for any such person, the function Mother produces a
member of 𝕊.
In general, the loose composition 𝑔 ∘ 𝑓 of two functions 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷
is defined by
𝑔 ∘ 𝑓 ∶ {𝑥 ∈ 𝐴 ∶ 𝑓(𝑥) ∈ 𝐶} ⟶ 𝐷,
𝑔 ∘ 𝑓(𝑥) = 𝑔(𝑓(𝑥)).
Note that, as usual for our composition notation, 𝑓 is applied first, and then the result
is given to 𝑔.
The loose composition of two functions is always defined. It might sometimes be
useless: if 𝐵 ∩ 𝐶 = ∅, then nothing that 𝑓 produces is in the domain of 𝑔, so the
composition 𝑔 ∘ 𝑓 is just the empty function.
It will be seen from the definition of loose composition that it is harder to work out
the domain of loose composition than it is to work out the domain of tight composition.
This makes it a bit harder to use in practice. Tight composition has the advantage that
the question of whether the composition is defined can be answered solely by looking at
the appropriate codomain and domain; you do not need to study the rule at all. This
makes it much easier to work with, and from a computing perspective, much easier to
use as a specification of a task based on combining two tasks.
In this unit, we will use tight composition rather than loose composition.
2.12 COUNTiNG FUNCTiONS
We often want to count functions of various types. You might want to determine the
amount of time an algorithm takes, if the algorithm has to search through all functions
of some type. You might want to determine the amount of space that a collection of
data requires, if the data items correspond to functions. You might want to determine
the probability that a random function has some particular property.
Suppose 𝑓 ∶ 𝐴 → 𝐵, where the sets have sizes |𝐴| = 𝑚 and |𝐵| = 𝑛. How many
functions of this type are there? The domain has 𝑚 elements, and each of them is
mapped to one, and only one, member of the codomain 𝐵. So there are 𝑛 possibilities
for 𝑓(𝑥) for each element 𝑥 ∈ 𝐴. These choices are independent; there is no requirement
for the various values of 𝑥 to differ from each other or to be related in any other way. So
we have 𝑚 independent choices, each being among 𝑛 possibilities. This means we have
𝑛𝑚 functions.
Suppose now we require 𝑓 to be an injection. We still have 𝑚 elements in the
domain, and for each of these, we must still choose exactly one member of the codomain.
But now these choices are no longer independent, since as soon as one member of the
codomain is chosen, that member is no longer available for any other member of the
domain. Suppose we make these choices in order, and to help describe this, we suppose
that the elements of 𝐴 are enumerated as 𝑎1 , 𝑎2 , … , 𝑎𝑚 . Now, 𝑎1 can be mapped to any
of the 𝑛 elements of 𝐵. Then, 𝑎2 can be mapped to any element of 𝐵 except 𝑓(𝑎1 ), so it
has 𝑛 − 1 choices. Then, 𝑎3 can be mapped to any element of 𝐵 except 𝑓(𝑎1 ) and 𝑓(𝑎2 ),
so it has 𝑛 − 2 choices. And so on. Finally, 𝑎𝑚 can be mapped to any element of 𝐵
except 𝑓(𝑎1 ), 𝑓(𝑎2 ), … , 𝑓(𝑎𝑚−1 ), so it has 𝑛 − (𝑚 − 1) choices, which is 𝑛 − 𝑚 + 1 choices.
So the total number of injections is
𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑚 + 1).
This formula also copes nicely with the possibility that 𝑚 > 𝑛. In that case, we know
that the codomain is too small to allow any injections from the domain at all, so the
answer should be 0, and that is indeed what the formula gives, since one of the factors
will be 0.
Requiring 𝑓 to be a surjection needs some more care and will be considered later.
If we require 𝑓 to be a bijection, then that’s the same as requiring 𝑓 to be an injection
whose codomain and domain are the same size, since both sets are finite. (See the end
of § 2.7 on p. 50.) So this is the same as the injective case when 𝑚 = 𝑛. So the number
of bijections is just 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑛 + 1), which is 𝑛!.
Recall that, when a bijection maps a finite set to itself, it is called a permutation of
that set (§ 2.7). So the number of permutations of an 𝑛-element set is 𝑛!.
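These counts are easy to verify by brute force for small sets. A Python sketch (ours):

from itertools import product
from math import factorial, perm

A, B = range(3), range(4)                  # m = 3, n = 4
funcs = list(product(B, repeat=len(A)))    # one tuple per function A -> B
injections = [f for f in funcs if len(set(f)) == len(A)]
print(len(funcs), 4**3)                    # 64 64: n**m functions
print(len(injections), perm(4, 3))         # 24 24: n(n-1)...(n-m+1)
print(factorial(4))                        # 24: bijections when m = n = 4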
2.13 B i N A RY R E L AT i O N S
We often want to know when objects of one type are related in some specific way to
objects of another type. For example, in considering sets of people, we might like to
know who is a parent of whom. In mobile communications, systems that manage calls
would use information on which smartphones contacted which cellphone towers. Some
ecologists monitor predator-prey relationships among species in a geographic area. In
timetabling Monash classes in a given semester, we want to know which pairs of units
have at least one student in common, so that we can try to schedule classes in those
units at different times.
These situations, and very many others, can be modelled by binary relations.
A binary relation consists of two sets 𝐴 and 𝐵 and a set of ordered pairs (𝑎, 𝑏)
where 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵. The ordered pairs are used to state which members of 𝐴 and 𝐵
are related to each other in the required way.
Recall that the Cartesian product 𝐴 × 𝐵 is the set of all ordered pairs in which the
first and second members of the pair belong to 𝐴 and 𝐵 respectively. This gives us a
very succinct way to restate our definition. A binary relation consists of two sets 𝐴
and 𝐵 and a subset of 𝐴 × 𝐵. We sometimes say that the binary relation is from 𝐴 to
𝐵, and we may still call 𝐴 the domain and 𝐵 the codomain.
The two sets might be the same. A binary relation on a set 𝐴 is a binary relation
from 𝐴 to itself. Each of the two sets is 𝐴, so that the relation is a subset of 𝐴 × 𝐴.
A binary relation is also called a binary predicate or a predicate with two arguments.
If 𝑅 is the name of a binary relation, then we write 𝑥𝑅𝑦 or 𝑅(𝑥, 𝑦) or (𝑥, 𝑦) ∈ 𝑅 to
mean that (𝑥, 𝑦) is one of the ordered pairs in 𝑅. The notation 𝑥𝑅𝑦 is an example of
infix notation, where the name of the operation/function/relation is placed between the
two things it links. The notation 𝑅(𝑥, 𝑦) is a further example of prefix notation. (Recall
the discussion of prefix, infix and postfix notation on p. 47.)
For example, the Parent relation is a relation on the set ℙ of all people (so the two
sets are the same in this case), and the pair of people (𝑝, 𝑞) belongs to this relation if 𝑞
is a parent of 𝑝. Members of the Parent relation include the pair (Alan Turing, Sara Turing).
For the mobile communications example, the two sets are different, namely a set of
smartphones and a set of cellphone towers, and the relation consists of pairs (phone, tower)
in which that smartphone has contacted that tower.
It may be tempting to use binary relation names in the same way we use function
names. Recall the functions Mother and Father, which enable us to write statements like
Mother(Alan Turing) = Sara Turing.
But we will not write “Parent(Alan Turing)”; such notation would treat Parent as a
function of people, when in fact “the parent” of a person is not, in general, uniquely
defined.
We saw in § 2.1.3𝛼 that one way to specify the rule of a function 𝑓 ∶ 𝐴 → 𝐵 is to give its
graph, i.e., to give all its ordered pairs (𝑥, 𝑓(𝑥)), which all belong to 𝐴 ×𝐵. So a function
is a type of binary relation. To be precise, a function with domain 𝐴 and codomain 𝐵
is a binary relation on 𝐴 and 𝐵 in which, for every 𝑎 ∈ 𝐴, there is a unique 𝑏 ∈ 𝐵 such
that (𝑎, 𝑏) belongs to the relation. To put it the other way round (and less formally),
a binary relation is a “function” where we drop the requirement that each member of
the domain gives exactly one member of the codomain. With this latter viewpoint, a
binary relation is sometimes called a “many-valued function” because a single member of
the domain can yield many different values in the codomain (whereas normal functions
are just single-valued). But we will not refer to “many-valued functions” because it is a
contradiction in terms: a function, by definition, cannot be many-valued in that sense.
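A binary relation on finite sets is naturally stored as a Python set of ordered pairs; we can then test whether it happens to be (the graph of) a function. A sketch (ours):

def is_function(R: set, A: set) -> bool:
    # Each a in A must appear as the first member of exactly one pair.
    firsts = [x for (x, y) in R]
    return set(firsts) == A and len(firsts) == len(set(firsts))

R = {(1, "x"), (2, "x"), (3, "y")}
S = {(1, "x"), (1, "y")}                   # 1 is related to two values
print(is_function(R, {1, 2, 3}))           # True
print(is_function(S, {1}))                 # False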
Examples of binary relations on sets of numbers include =, ≤, ≥. Examples of
binary relations on sets of sets include =, ⊆, ⊇.
One common source of binary relations is network structures. Networks consist of
nodes with links between some pairs of nodes. For example:
• In a social network, the nodes are people and the links represent which people
know which other people.
• In Monash timetabling, nodes represent the units running in a given semester, and
links represent pairs of units that have at least one student in common for that
semester. A small fragment of this network is shown in Figure 2.3. In this example,
MTH1030 and MTH1035 have no students in common (because it is prohibited to
enrol in both of them), so their classes may be at the same time (or at different
times), and we represent this lack of restriction by the absence of a link between
the corresponding nodes. But every other pair of these units has some students
that do both of them, so every other pair of nodes has a link between them.
[Figure 2.3: a fragment of the timetabling network on the units FIT1058, FIT1045,
MTH1030 and MTH1035; every pair of nodes is linked except MTH1030 and MTH1035.]
In each of these cases, we have a set of nodes, and the set of links between them can be
represented by a binary relation.
Another common source of binary relations is tables of data. Whenever you write a
table with two columns, you are specifying a binary relation whose ordered pairs (𝑥, 𝑦)
correspond to the rows of the table, with 𝑥 and 𝑦 being the entries in the first and second
column respectively. If you have a table with more than two columns, then often taking
some (or perhaps any) pairs of columns will give you a binary relation in the same way.
Similar remarks apply to data stored in other ways that may be thought of as tables,
such as in spreadsheets and databases.
Important operations which you might want to do, when using a binary relation 𝑅
on sets 𝐴 and 𝐵, include:
• Determine the set of all members of 𝐴 that actually appear as the first member of
a pair in 𝑅. This is {𝑥 ∈ 𝐴 ∶ (𝑥, 𝑦) ∈ 𝑅 for some 𝑦 ∈ 𝐵}.
• Determine the set of all members of 𝐵 that actually appear as the second member
of a pair in 𝑅. This is {𝑦 ∈ 𝐵 ∶ (𝑥, 𝑦) ∈ 𝑅 for some 𝑥 ∈ 𝐴}.
Let 𝑅 be any binary relation from 𝐴 to 𝐵. Its inverse 𝑅 −1 is the binary relation
from 𝐵 to 𝐴 defined by
𝑦𝑅 −1 𝑥 ⟺ 𝑥𝑅𝑦.
So the inverse is constructed by just swapping the roles of the domain and codomain
and reversing all the pairs. In the special case when 𝑅 is actually a function, this is just
the definition of the inverse of a function (see § 2.8). We can now talk of the inverse of
any function, not just of bijections, but we have to remember that, if 𝑓 is not a bijection,
then 𝑓 −1 is not a function (although it is a valid binary relation).
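These operations, and the inverse, are one-liners when the relation is stored as a set of pairs. A sketch (ours):

R = {(1, "a"), (2, "a"), (2, "b")}
firsts = {x for (x, y) in R}        # members of A appearing first in a pair
seconds = {y for (x, y) in R}       # members of B appearing second in a pair
R_inv = {(y, x) for (x, y) in R}    # the inverse: swap every pair
print(firsts, seconds, R_inv)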
2.14 P R O P E RT i E S O F B i N A RY R E L AT i O N S
2.15 C O M B i N i N G B i N A RY R E L AT i O N S
Viewing a binary relation as a set of ordered pairs enables us to apply ordinary set
operations on them. Suppose 𝑅 and 𝑆 are both binary relations from 𝐴 to 𝐵. Then
their union 𝑅 ∪ 𝑆 is the binary relation from 𝐴 to 𝐵 containing every pair that belongs
to at least one of 𝑅 and 𝑆.
For example,
Parent = Mother ∪ Father.
The intersection 𝑅 ∩ 𝑆 is the binary relation from 𝐴 to 𝐵 containing every pair that
belongs to both 𝑅 and 𝑆.
For example,
Employer ∩ Parent
is the set of pairs (𝑥, 𝑦) such that 𝑦 is both an employer and a parent of 𝑥.
Binary relations can also be composed. If 𝑅 is a binary relation from 𝐴 to 𝐵 and 𝑆 is a binary relation from 𝐵 to 𝐶, then their composition 𝑆 ∘ 𝑅 is the binary relation from 𝐴 to 𝐶 containing every pair (𝑥, 𝑧) for which there is some 𝑦 ∈ 𝐵 with 𝑥 𝑅 𝑦 and 𝑦 𝑆 𝑧. For example, the composition Employer ∘ Parent
contains all pairs (𝑥, 𝑧) such that 𝑧 employs a parent of 𝑥. If you are 𝑥, then an employer
of one of your parents is 𝑧, and the pair of you are related by Employer ∘ Parent. But the
order matters: this relation is not the same as the one that relates people to their own
employers’ parents.
In the special case when 𝑅 and 𝑆 are both functions, their composition is just their
composition as functions, using the definition of function composition given in § 2.9.
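With relations stored as sets of pairs in Python, composition is again a short comprehension (a sketch; the helper name compose is ours, not standard):

def compose(S, R):
    # The composition S ∘ R: first apply R, then S.
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}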
Let's investigate the composition of any relation 𝑅 from 𝐴 to 𝐵 with its inverse relation 𝑅⁻¹. The composition 𝑅⁻¹ ∘ 𝑅, which goes from 𝐴 to itself, satisfies

𝑥 (𝑅⁻¹ ∘ 𝑅) 𝑥′ ⟺ there is some 𝑦 ∈ 𝐵 such that 𝑥 𝑅 𝑦 and 𝑥′ 𝑅 𝑦.
Binary relations, like functions, can be composed with themselves. Consider again
the binary relation knows on the set of all people. In the composition knows ∘ knows,
two people are related if they have a mutual acquaintance, i.e., they each know someone
who knows the other. We can extend this. In the composition knows∘knows∘knows, one
person is related to another if they know someone who knows someone who knows the
other. The five-fold composition

knows ∘ knows ∘ knows ∘ knows ∘ knows ∘ knows

is said to relate every pair of people on Earth. This is the principle known as “six degrees
of separation”; it uses the knows relation six times, with five compositions.
As for function composition, we can use exponents in parentheses to denote composition of relations with themselves: 𝑅⁽ⁿ⁾ is the composition of 𝑛 copies of 𝑅. So the composition we wrote above, for six degrees of separation, could be written knows⁽⁶⁾.
Sometimes, for a binary relation 𝑅 on a set 𝐴, we may want to go further and identify
every pair of elements of 𝐴 that are linked by any chain, no matter how long, of pairs
in 𝑅.
The transitive closure of a binary relation 𝑅 on a set 𝐴 is the unique binary
relation 𝑅⁺ on 𝐴 such that, for all 𝑥, 𝑦, 𝑧 ∈ 𝐴:

(i) if 𝑥 𝑅 𝑦, then 𝑥 𝑅⁺ 𝑦;
(ii) if 𝑥 𝑅⁺ 𝑦 and 𝑦 𝑅 𝑧, then 𝑥 𝑅⁺ 𝑧;
(iii) if 𝑥 𝑅⁺ 𝑦 and 𝑦 𝑅⁺ 𝑧, then 𝑥 𝑅⁺ 𝑧;

and 𝑅⁺ contains no pairs other than those required by these conditions.
Actually, you can drop either the second or third (but not both) of these conditions from the definition. You cannot drop the first condition, which gets the transitive closure process started.
The transitive closure 𝑅⁺ certainly contains all pairs in 𝑅, by condition (i) in its definition. It also contains all pairs in 𝑅⁽²⁾, and all pairs in 𝑅⁽³⁾, and so on. So the set of pairs in the transitive closure is given by

𝑅⁺ = 𝑅 ∪ 𝑅⁽²⁾ ∪ 𝑅⁽³⁾ ∪ ⋯ .
The transitive closure of knows identifies whenever two people can be linked by
an arbitrarily long chain of social connections. Under the hypothesis of six degrees of
separation,
knows⁺ = knows⁽⁶⁾
and every pair of people on Earth are linked in this way.
The transitive closure of a binary relation is always transitive, hence the name.
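For a finite relation, the union above can be computed by repeatedly extending chains until no new pairs appear. A minimal Python sketch, reusing compose from the earlier sketch:

def transitive_closure(R):
    # R+ is the union of R, R(2), R(3), ...; keep adding one-step
    # extensions of existing chains until nothing new appears.
    closure = set(R)
    while True:
        extra = compose(R, closure) - closure
        if not extra:
            return closure
        closure |= extra

print(transitive_closure({(1, 2), (2, 3), (3, 4)}))
# adds (1, 3), (2, 4) and (1, 4) to the original three pairs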
2.16 EQUiVALENCE RELATiONS
It seems fair to regard equality as the most fundamental binary relation. No matter
what kind of objects we are working with, we want to be able to say whether or not two
of them are really the same. If we cannot do even that, it is hard to imagine having any
useful discussion about such objects at all. So, even if there is no other relationship to
speak of among the objects we are discussing, there must be an equality relation. We
will always feel free to use equality on any set at all, without announcing its existence
beforehand.
In many situations, different objects may be treated as being equivalent for some
purposes even though they must be treated differently for other purposes. For example,
if two different students are in the same weekly tutorial class, then they can be treated
as equivalent for FIT1058 timetabling purposes, even though their other subjects may
be different so that they cannot be treated as equivalent for other timetabling purposes.
Define the binary relation ≡₁₀₅₈ on the set 𝕊 of all students by

𝑝 ≡₁₀₅₈ 𝑞 ⟺ 𝑝 and 𝑞 are in the same FIT1058 tutorial,

for any 𝑝, 𝑞 ∈ 𝕊.
If a binary relation is to capture some notion of “equivalence”, what properties should
it have? We can use equality as a guide, since “equivalence” should be like equality but
a bit “looser” in that two things can be equivalent without being identical. We have
just seen that the equality relation is reflexive, symmetric, and transitive. This is also
what we would expect of a binary relation that tells us when two things are equivalent.
Every object, of any kind, is certainly equivalent to itself; if one object is equivalent to
another, then the latter object must also be equivalent to the former; and if an object is
equivalent to another, which in turn is equivalent to a third object, then the first object
must also be equivalent to the third object.
Equality is also antisymmetric. But we will not add this to our requirements of
equivalence in general, since antisymmetry requires that if an object is equivalent to
another, which in turn is equivalent to our first object, then the two objects must actually
be equal. This would be tantamount to saying that equivalence implies equality, which
would mean that we have no form of equivalence other than equality itself, which is too
narrow.
With these thoughts in mind, we make the following definition. An equivalence relation on a set 𝐴 is a binary relation on 𝐴 that is reflexive, symmetric, and transitive. Here are some examples.
• Two real numbers 𝑥, 𝑦 are equivalent in integer part if, when each is rounded to the nearest integer, they become equal. We write this as 𝑥 ≡ᵢₙₜ 𝑦.
• Two triangles in the plane are congruent if one can be moved onto the other by
some sequence of translations, rotations and reflections.
• Two integers are congruent modulo 𝑚 (where 𝑚 is a fixed positive integer) if their difference is a multiple of 𝑚.
• For any function 𝑓, two members 𝑥, 𝑦 of its domain satisfy 𝑥 ∼𝑓 𝑦 if 𝑓(𝑥) = 𝑓(𝑦).
Each of the relations above is reflexive, symmetric and transitive, so each is an equivalence relation. For an equivalence relation on a set 𝐴, the equivalence class of an element 𝑥 ∈ 𝐴 is the set of all members of 𝐴 that are related to 𝑥. Let us identify the equivalence classes in each of our examples.
• For congruence modulo 𝑚, the equivalence classes are the sets {𝑘𝑚 + 𝑟 ∶ 𝑘 ∈ ℤ}, for each 𝑟 ∈ {0, 1, … , 𝑚 − 1}. These sets are called the residue classes modulo 𝑚. For example, if 𝑚 = 2, then 𝑟 ∈ {0, 1} and we have two equivalence classes: the set of all even integers (for 𝑟 = 0), and the set of all odd integers (for 𝑟 = 1).
• For the relation ≡ᵢₙₜ, the equivalence classes are the intervals [𝑛 − 1/2, 𝑛 + 1/2) for all 𝑛 ∈ ℤ.
• For congruent triangles, the equivalence classes each contain all triangles of one
particular shape and size (so they all have the same angles and side lengths, but
otherwise can be in any location and orientation in the plane).
• For ∼𝑓 , the equivalence classes are the preimages 𝑓 −1 (𝑦) = {𝑥 ∶ 𝑓(𝑥) = 𝑦} of each
𝑦 ∈ im 𝑓.
In each of these cases, the equivalence classes divide up the set 𝐴 neatly, so that
each element of 𝐴 belongs to exactly one equivalence class. These are manifestations of
a general phenomenon.
Theorem 12. The equivalence classes of an equivalence relation 𝑅 on a set 𝐴 form a partition of 𝐴.
Proof. To show that the equivalence classes form a partition of 𝐴, we need to show
that (a) every member of 𝐴 belongs to an equivalence class, and (b) no two different
equivalence classes overlap. (Note that equivalence classes are nonempty by definition.)
(a)
Let 𝑥 ∈ 𝐴. Define 𝑋 to be the set of all members of 𝐴 that are equivalent to 𝑥 under
our equivalence relation 𝑅:
𝑋 = {𝑦 ∈ 𝐴 ∶ 𝑥𝑅𝑦}.
We claim that 𝑋 is an equivalence class of 𝑅, and that it contains 𝑥. We prove these two parts of the claim in turn.
(b)
Suppose 𝑋 and 𝑌 are overlapping equivalence classes of 𝑅. So 𝑋 ∩ 𝑌 ≠ ∅. Let
𝑥 ∈ 𝑋 ∩ 𝑌.
We claim that 𝑋 = 𝑌.
To do this, we consider 𝑌 ∖𝑋 and 𝑋 ∖𝑌 and show that they are both empty, which
implies that 𝑋 = 𝑌. (See Theorem 4 and (1.17).)
Consider first the possibility that 𝑌 ∖ 𝑋 ≠ ∅. Suppose 𝑦 ∈ 𝑌 ∖ 𝑋 . Because 𝑋 is an
equivalence class and 𝑦 ∉ 𝑋 , it follows that (𝑥, 𝑦) ∉ 𝑅. But because 𝑌 is an equivalence
class and 𝑥, 𝑦 ∈ 𝑌, it follows that (𝑥, 𝑦) ∈ 𝑅. So we have a contradiction in this case.
Therefore 𝑌 ∖ 𝑋 ≠ ∅ is impossible, so 𝑌 ∖ 𝑋 = ∅.
It remains to consider the possibility that 𝑋 ∖ 𝑌 ≠ ∅. But the argument here is the
same as that of the previous paragraph, with 𝑋 and 𝑌 interchanged. So 𝑋 ∖ 𝑌 = ∅.
So we have 𝑌 ∖ 𝑋 = 𝑋 ∖ 𝑌 = ∅. This implies that 𝑋 = 𝑌.
So the only way two equivalence classes can overlap is if they are identical.
So we have shown that every member of 𝐴 belongs to exactly one equivalence class.
So the equivalence classes of 𝑅 form a partition of 𝐴.
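Theorem 12 also suggests a way to compute the partition. Here is a minimal Python sketch; the predicate related is assumed to implement an equivalence relation, and transitivity is exactly what makes comparing against a single representative of each class sufficient. (Note that Python's round differs from the convention above for values exactly halfway between integers.)

def equivalence_classes(elements, related):
    # Group the elements of a finite set into equivalence classes.
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):   # one representative suffices, by transitivity
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes

same_int = lambda x, y: round(x) == round(y)   # equivalence in integer part
print(equivalence_classes([0.2, 1.4, 0.4, 0.6, 2.0], same_int))
# [[0.2, 0.4], [1.4, 0.6], [2.0]]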
2.17 RELATiONS
Binary relations are binary in the sense that they consist of ordered pairs. It is often useful to consider relations whose members are tuples with more than two components.
A ternary relation on sets 𝐴, 𝐵, 𝐶 is a set of triples (𝑥, 𝑦, 𝑧) such that 𝑥 ∈ 𝐴, 𝑦 ∈ 𝐵
and 𝑧 ∈ 𝐶. (Triples written in parentheses are always considered to be ordered.) In other
words, it is a subset of 𝐴 × 𝐵 × 𝐶. For example, a set of points in real three-dimensional
space is a ternary relation on ℝ. A list of personal names, each consisting of three parts, could be represented as a ternary relation in which each triple holds the parts of one name in order.
Dates can be represented by (day, month, year) triples, so they too can be represented
naturally in a ternary relation.
More generally, an 𝑛-ary relation on sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 consists of 𝑛-tuples (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) where 𝑥𝑖 ∈ 𝐴𝑖 for all 𝑖 ∈ {1, … , 𝑛}. In other words, it is a subset of 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 . The set 𝐴𝑖 is called the 𝑖-th domain.
When you use a table with 𝑛 columns, its rows represent the 𝑛-tuples in an 𝑛-ary
relation. For each 𝑖, the set 𝐴𝑖 is a set that contains all objects appearing in the 𝑖-th
column (and is allowed to contain more, much as a codomain of a function is allowed to
contain more than just the image). So, when you work with spreadsheets, you are usually
working with 𝑛-ary relations whose members represent the rows of the spreadsheet.
Relations are fundamental in databases, so much so that the term relational database
is used for one of the most widely used types of database.
An 𝑛-ary relation is also called an 𝑛-ary predicate or a predicate with 𝑛 arguments.
2.18 COUNTiNG RELATiONS
How many binary relations from 𝐴 to 𝐵 are there, where 𝐴 and 𝐵 are finite sets?
Put 𝑚 ∶= |𝐴| and 𝑛 ∶= |𝐵|.
A binary relation is just a subset of 𝐴 × 𝐵, so the number of binary relations from
𝐴 to 𝐵 is just the number of subsets of 𝐴 × 𝐵. This is just the size of the power set
𝒫(𝐴 × 𝐵), which is

2^|𝐴×𝐵| ,

by (1.1). But |𝐴 × 𝐵| = 𝑚𝑛, by (1.19). So

# binary relations from 𝐴 to 𝐵 = 2^(𝑚𝑛) .
Now let’s look at more general relations. How many 𝑛-ary relations are there on sets
𝐴1 , 𝐴2 , … , 𝐴𝑛 ?
Put 𝑚𝑖 ∶= |𝐴𝑖 | for each 𝑖 = 1, 2, … , 𝑛.
An 𝑛-ary relation is just a set of 𝑛-tuples from 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 . So the number of 𝑛-ary relations is just the size of the power set of 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 , which is

2^(𝑚1 𝑚2 ⋯ 𝑚𝑛 ) .

In the special case when all sets are the same (say, 𝐴𝑖 = 𝐴 for all 𝑖) and have size 𝑚, we have

# 𝑛-ary relations on 𝐴 = 2^(𝑚ⁿ) .
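For very small sets, these counts can be checked by brute force in Python (an experiment, not a proof):

from itertools import product

def count_binary_relations(A, B):
    # Count the subsets of A × B by deciding, for each pair, whether it is in or out.
    pairs = list(product(A, B))
    return sum(1 for _ in product([False, True], repeat=len(pairs)))

print(count_binary_relations({1, 2}, {'a', 'b', 'c'}))   # 64 = 2^(2·3)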
2.19 EXERCiSES
1. Let 𝐴 be a finite set. How many indicator functions with domain 𝐴 are there?
2. Let 𝐴 and 𝐵 be any sets. Express the indicator functions of each of the following
in terms of the indicator functions 𝜒𝐴 and 𝜒𝐵 .
(a) 𝐴
(b) 𝐴 ∩ 𝐵
(c) 𝐴 ∖ 𝐵
(d) 𝐴△𝐵
3. If two functions 𝑓 ∶ 𝐴 → ℝ and 𝑔 ∶ 𝐴 → ℝ have the same domain 𝐴 and give real
numbers as their values, then their sum 𝑓 + 𝑔 ∶ 𝐴 → ℝ is defined for all 𝑥 ∈ 𝐴 by
What is the sum of all the indicator functions of all the subsets of a finite set?
4. Draw a Venn diagram showing each of the following sets of functions and the rela-
tionships between them: functions, injections, surjections, bijections, identity functions,
binary relations, ternary relations, relations.
5. Functions can be viewed as sets of ordered pairs, so we can combine functions using
set operations. The result will be a binary relation but, depending on the operation, it
may or may not be another function.
Suppose 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷 are functions. Which of the following is always a
function?
𝑓 ∩ 𝑔; 𝑓 ∪ 𝑔; 𝑓 ∖ 𝑔; 𝑔 ∖ 𝑓; 𝑓△𝑔; 𝑓 × 𝑔 .
For each that is always a function, give its definition in the usual form, including showing
how the rule depends on 𝑓 and 𝑔. For each that is not necessarily a function, explain why
this is the case (e.g., with the help of examples for 𝑓 and 𝑔 under which the operation
does not give a function).
Now let {0, 1}∗+ denote the set of all finite nonempty tuples (or sequences) of bit-
strings. Examples of members of {0, 1}∗+ include:
10. For each 𝑘 ∈ {1, … , 𝑛} and any subset 𝑋 ⊆ 𝐴 with |𝑋 | = 𝑘, write 𝑓𝑘 for the number
of bijections on 𝐴 that fix 𝑋 .
(a) Why don’t we include 𝑋 in the notation 𝑓𝑘 , given that its definition refers to 𝑋 ?
(c) Express |𝐹 | in terms of 𝑓1 , 𝑓2 , … , 𝑓𝑛 , and then use your expression for 𝑓𝑘 to give an
expression for |𝐹 |.
(e) For each of the following sets, what proportion of all bijections on the set are fixed-
point-free? {0, 1}; {♠, ♣, ♢, ♡}; the set of ten decimal digits; the 26-letter English
alphabet.
You may need to use a program like Wolfram Alpha, or a spreadsheet, to help
calculate the latter two.
(f) What is the largest set for which you can determine this proportion, and what is the
value of the proportion for that set? (Use a spreadsheet or a program if you wish.)
11.
(a) Write down all surjections from {1, 2, 3, 4} to {𝑎, 𝑏, 𝑐}. Compare your answer with
Exercise 1.20.
(b) Describe a natural correspondence from

{ surjections from {1, 2, 3, 4} to {𝑎, 𝑏, 𝑐} }

to

{ partitions of {1, 2, 3, 4} into three parts }.
(c) What sizes can the preimages of members of its image be?
13. If the function 𝑓 has an inverse function, which of its iterated compositions 𝑓⁽ᵏ⁾ have inverse functions (where 𝑘 ∈ ℕ)?
14. Determine all functions from {1, 2, 3} to itself whose indefinitely iterated composition is a constant function (i.e., 𝑓⁽ᵏ⁾ is a constant function if 𝑘 is large enough).
For a challenge: investigate when this happens in general. Try to characterise those
functions 𝑓 ∶ {1, 2, … , 𝑛} → {1, 2, … , 𝑛} such that, for large enough 𝑘, the iterated compo-
sition 𝑓⁽ᵏ⁾ is constant.
15. This exercise is about the Caesar slide cryptosystem, one of the oldest and
simplest cryptosystems. It is not secure, but its core operation is used in many stronger
and more complex cryptosystems.
Caesar slide encrypts a message by sliding each letter along the alphabet, rightwards,
by some fixed number of steps. This fixed number is the key, 𝑘, and the same amount of
sliding is done to each letter of the message. The sliding is done with wrap-around, so
if sliding ever takes you beyond the end of the alphabet, then you resume sliding at the
start of the alphabet. In effect, we treat the alphabet as a circle rather than a straight
line.
For example, if the message is⁵

thefamilyofdashwoodhadlongbeen

and the key is 𝑘 = 3, then the encrypted message is

wkhidplobrigdvkzrrgkdgorqjehhq
Here, sliding t along the alphabet by 3 steps gives w. We see the wrap-around when
encrypting the letter y, since sliding it step-by-step gives z, then (wrapping around) a,
then b.
Decryption is the reverse of encryption, meaning that we slide 𝑘 steps leftwards
instead of rightwards, again with wrap-around.
We now give a formal definition of this cryptosystem, using the definition of cryp-
tosystems given in §2.10.
• The message space 𝑀 is the set of all strings of English lower case letters (without
blanks).
• The encryption and decryption functions, to be defined below, need to use the
letter addition operation defined by the following table.
5 from the first sentence of Sense and Sensibility by Jane Austen, first published by Thomas Egerton in
London in 1811.
+ a b c d e f g h i j k l m n o p q r s t u v w x y z
a a b c d e f g h i j k l m n o p q r s t u v w x y z
b b c d e f g h i j k l m n o p q r s t u v w x y z a
c c d e f g h i j k l m n o p q r s t u v w x y z a b
d d e f g h i j k l m n o p q r s t u v w x y z a b c
e e f g h i j k l m n o p q r s t u v w x y z a b c d
f f g h i j k l m n o p q r s t u v w x y z a b c d e
g g h i j k l m n o p q r s t u v w x y z a b c d e f
h h i j k l m n o p q r s t u v w x y z a b c d e f g
i i j k l m n o p q r s t u v w x y z a b c d e f g h
j j k l m n o p q r s t u v w x y z a b c d e f g h i
k k l m n o p q r s t u v w x y z a b c d e f g h i j
l l m n o p q r s t u v w x y z a b c d e f g h i j k
m m n o p q r s t u v w x y z a b c d e f g h i j k l
n n o p q r s t u v w x y z a b c d e f g h i j k l m
o o p q r s t u v w x y z a b c d e f g h i j k l m n
p p q r s t u v w x y z a b c d e f g h i j k l m n o
q q r s t u v w x y z a b c d e f g h i j k l m n o p
r r s t u v w x y z a b c d e f g h i j k l m n o p q
s s t u v w x y z a b c d e f g h i j k l m n o p q r
t t u v w x y z a b c d e f g h i j k l m n o p q r s
u u v w x y z a b c d e f g h i j k l m n o p q r s t
v v w x y z a b c d e f g h i j k l m n o p q r s t u
w w x y z a b c d e f g h i j k l m n o p q r s t u v
x x y z a b c d e f g h i j k l m n o p q r s t u v w
y y z a b c d e f g h i j k l m n o p q r s t u v w x
z z a b c d e f g h i j k l m n o p q r s t u v w x y
In effect, letters correspond to numbers in {0, 1, 2, … , 24, 25}, and when we do the
letter addition 𝛼 + 𝛽, we start at 𝛼 and slide to the right (with wrap-around
as needed) by a number of steps given by 𝛽. Letter addition is commutative and
associative. We can also define letter subtraction, where we slide to the left instead
of the right.
• The encryption function 𝑒 takes a message 𝑚 = 𝑚1 𝑚2 ⋯ 𝑚𝑛 and a key 𝑘, and is defined by

𝑒(𝑚, 𝑘) = 𝑐1 𝑐2 ⋯ 𝑐𝑛 , where 𝑐𝑖 = 𝑚𝑖 + 𝑘 for each 𝑖.
The addition here is letter addition. Note that, for each 𝑖, the same key letter is
used. Although message and cypher letters may (and usually do) change as you
go along the message, the key letter used for encryption does not change.
• The decryption function 𝑑 takes a cyphertext 𝑐 = 𝑐1 𝑐2 ⋯ 𝑐𝑛 and a key 𝑘, and is defined by

𝑑(𝑐, 𝑘) = 𝑚1 𝑚2 ⋯ 𝑚𝑛 , where 𝑚𝑖 = 𝑐𝑖 − 𝑘 for each 𝑖. The subtraction here is letter subtraction.
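Here is a minimal Python sketch of the Caesar slide, with the key given as a letter as in the letter-addition table above (so a slide of 3 steps corresponds to the key letter d):

ALPHABET = 'abcdefghijklmnopqrstuvwxyz'

def add_letters(x, y):
    # Letter addition: start at x and slide rightwards by the position of y, wrapping around.
    return ALPHABET[(ALPHABET.index(x) + ALPHABET.index(y)) % 26]

def encrypt(message, key):
    return ''.join(add_letters(m, key) for m in message)

def decrypt(cypher, key):
    # Letter subtraction: slide leftwards instead.
    return ''.join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(key)) % 26] for c in cypher)

print(encrypt('thefamilyofdashwoodhadlongbeen', 'd'))   # wkhidplobrigdvkzrrgkdgorqjehhq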
16. If 𝑓 is an injection and 𝑓⁻¹ is its inverse relation, what can you say about 𝑓 ∘ 𝑓⁻¹ and 𝑓⁻¹ ∘ 𝑓? What kinds of functions or relations are they, and what are their domains and codomains?
17. If a binary relation is symmetric, what can you say about its inverse relation?
19. Rephrase the Collatz Conjecture as a statement about the transitive closure of
the Collatz function.
20. Below we give some binary relations on the set of all Python programs. The
Python programs are denoted by 𝑃 and 𝑄, and the binary relations are denoted by
≃1 , ≃2 , … , ≃10 . For each relation, determine whether or not it is an equivalence relation,
and give reasons.
21. What can you say about the transitive closure of an equivalence relation?
22. The anagram relation on the set of all English words is defined as follows. Let
𝑥1 𝑥2 ⋯ 𝑥𝑚 and 𝑦1 𝑦2 ⋯ 𝑦𝑛 be two English words of lengths 𝑚 and 𝑛 respectively, where
the 𝑥𝑖 and 𝑦𝑗 are their letters (1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛). The ordered pair of words
(𝑥1 𝑥2 ⋯ 𝑥𝑚 , 𝑦1 𝑦2 ⋯ 𝑦𝑛 )
belongs to the anagram relation if and only if there exists a bijection 𝑓 ∶ {1, 2, … , 𝑚} →
{1, 2, … , 𝑛} such that, for all 𝑖 ∈ {1, 2, … , 𝑚}, we have 𝑥𝑖 = 𝑦𝑓(𝑖) .
We assume all letters belong to the usual lower-case English alphabet {a, b, … , z}.
The exact choice of dictionary does not matter for this exercise.
(a) Find the largest set of three-letter words you can that are all related to each other
by the anagram relation.
(b) For each 𝑘 = 1, 2, 3, 4, find an English word of length 𝑘 that is not related to any
other word by anagram.
23. How many non-reflexive binary relations are there on a set of size 𝑛?
24. Recall that when you have a table with 𝑛 columns, its rows represent the 𝑛-tuples
in an 𝑛-ary relation. For 1 ≤ 𝑖 ≤ 𝑛, let 𝐴𝑖 be a set containing all the entries in column 𝑖
(and possibly more elements), so that the columns of the table define an 𝑛-ary relation
on 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 . Assuming the columns of this table are all distinct, how many
ternary relations can you construct by choosing columns from this table of 𝑛 columns?
3 PROOFS
Why should a programmer learn to write mathematical proofs? The most obvious
answer is that proofs give rigorous justification for properties of your programs or of
the structures that they work with. A computer scientist is not just a computer user
or hobbyist or fan. A computer scientist provides rational support for their claims.
Sometimes, this can be done by computational experiments, where programs are run
on a large number of different inputs that hopefully form a representative sample of
the situations of interest, and the outputs are studied carefully and perhaps analysed
statistically. But a proof gives a more fundamental kind of support for a claim. It is
independent of the specific technology which is used to develop and run the program.
It applies to a far wider range of scenarios than can ever actually be run in a set of
computational experiments.
In this chapter, we learn about the nature of mathematical proofs and the art of
writing them. We treat the main types of proof, with emphasis on proof by induction.
We conclude in § 3.14 by reflecting further on the role of proofs in computer science.
Proofs are not only a tool for proving statements about programs; proofs are, themselves,
like programs in many ways, and writing them is like programming, and developing skill
in proof-writing will make you a better programmer.
1 Lemmas tend to be highly technical and are often forgotten even in cases where the theorem they
help prove becomes famous. But some lemmas have found fame in their own right, e.g., the Handshaking
Lemma, which you’ll meet later in this unit, and Burnside’s Lemma.
2 These days, the plural of “lemma” is “lemmas”, as you would expect. But, traditionally, the plural was
“lemmata”, which you may encounter in old books or papers. Are there other English words where the
plural ending is -ta?
A proposition is a theorem of less significance than other theorems being proved in the same article/chapter/book.
We also use the term “proposition” in a more specific sense from next week onwards. A
corollary is a theorem that follows almost immediately from another theorem that has
just been stated.
A proof of a claim is a step-by-step argument that establishes, logically and with
certainty, that the claim is true.
A proof consists of a finite sequence of statements, culminating in the claim that is
being proved. These statements are often called the steps of the proof. Each statement
in the proof must be one of the following:
• something you already knew before the start of the proof, i.e.,
– a definition,
– an axiom (i.e., some fundamental property that is always taken to be true for
the objects under discussion, such as the distributive law 𝑥(𝑦 + 𝑧) = 𝑥𝑦 + 𝑥𝑧
for numbers),
– a previously-proved theorem;
or
• an assumption, as a start towards proving something that is true under that as-
sumption;
or

• a statement that follows logically from previous statements in the proof.

The last statement in the sequence should establish that the claim follows from some previous statements in the proof. If the last statement is, by that stage, an obvious consequence of the statements before it, it is often omitted.
3 Another traditional way to indicate the end of a proof is using the acronym “QED”, which stands for the
Latin phrase “quod erat demonstrandum”, meaning “which was to be proved”. Occasionally, the end of a
proof is indicated by //.
We illustrate these concepts with the following theorem and proof, which you have
seen before in Exercise 7. We number the steps of the proof, and give each its own line,
to help with later discussion. But we won’t normally do this in proofs.
Theorem 13. For any sets 𝐴 and 𝐵, if 𝐴 ⊆ 𝐵 then 𝒫(𝐴) ⊆ 𝒫(𝐵).
Proof.
(1) Assume 𝐴 ⊆ 𝐵.
(2) Let 𝑋 ∈ 𝒫(𝐴).
(3) Therefore 𝑋 ⊆ 𝐴, by definition of 𝒫(𝐴).
(4) Therefore 𝑋 ⊆ 𝐵.
(5) Therefore 𝑋 ∈ 𝒫(𝐵).
(6) So we have shown that 𝑋 ∈ 𝒫(𝐴) implies 𝑋 ∈ 𝒫(𝐵).
(7) Therefore 𝒫(𝐴) ⊆ 𝒫(𝐵).
For each step of this proof, it is worth asking:

• What type of proof step is it? (See our listing of types of proof steps above.)
• Does it make use of any previous steps? If not, why not? If so, how?
• Does it use any subsequent steps? If not, good! If so, we have a problem!
The following table repeats all the proof steps, slightly expanding some of them, and
annotating each step to show how it relates to our discussion of proof steps above.
(6) So 𝑋 ∈ 𝒫(𝐴) implies 𝑋 ∈ 𝒫(𝐵). This really just summarises what we’ve done
over steps (2)–(5).
(7) Therefore 𝒫(𝐴) ⊆ 𝒫(𝐵). This is a logical consequence of (6) and the
definition of the subset relation.
3.2 LOGiCAL DEDUCTiON

The backbone of any proof consists of its logical deductions. These are the steps where
we deduce a logical consequence of previous steps. If a proof does not have logical
deductions, then it’s just a collection of known facts and assumptions, and we get nothing
new. It’s a kind of “logical jelly” without structure or substance.
When making a logical deduction from a previous statement in a proof, the fundamental principle is:

if we know that a statement 𝑃 is true, and we know that 𝑃 ⇒ 𝑄 (that is, whenever 𝑃 is true, 𝑄 must be true), then we may deduce that 𝑄 is true.
For example, consider the deduction we made at step (3) of the proof of Theorem 13.
At that stage, we know from earlier steps that
• 𝑋 ∈ 𝒫(𝐴), which we can treat as true simply because it's the definition of 𝑋 (step (2));
• the definition of power set, which tells us that if 𝑋 ∈ 𝒫(𝐴) then 𝑋 ⊆ 𝐴.
If we let 𝑃 stand for 𝑋 ∈ 𝒫(𝐴), and 𝑄 stand for 𝑋 ⊆ 𝐴, then 𝑃 ⇒ 𝑄 represents the
assertion that
if 𝑋 ∈ 𝒫(𝐴) then 𝑋 ⊆ 𝐴.
So, step (2) is 𝑃, the definition of power set gives 𝑃 ⇒ 𝑄, and logical deduction (or
modus ponens, if we want to practise our Latin) then gives 𝑄.
The role of implication in logical deduction is crucial. An implication 𝑃 ⇒ 𝑄 is the
link that translates the truth of 𝑃 into the truth of 𝑄. So let us consider it further.
Suppose we have two dominos standing on their ends, a short distance apart and
with their faces parallel, as shown side-on in Figure 3.1.
Let 𝑃 be the statement that the left domino falls to the right, and let 𝑄 be the
statement that the right domino falls to the right. Each of these statements could be
true or false. There is no requirement here for either domino to fall; it’s fine for them
both to remain standing. It’s also fine for the left domino to remain standing but
for the right one to fall (by whatever means). But if the left domino falls, then the
right domino must fall too. It is impossible to have the left domino fall with the right
domino remaining standing. So we have three possible situations, which we'll represent as ordered pairs:

(left stands, right stands), (left stands, right falls), (left falls, right falls).
4 That definition actually gives “if and only if” here: 𝑋 ∈ 𝒫(𝐴) ⇔ 𝑋 ⊆ 𝐴. But we do not need the reverse
implication right now.
We can depict these various possibilities using sets. Let 𝑃 be the set of those
situations where the left domino falls, and let 𝑄 be the set of those situations where the
right domino falls. Then

𝑃 = {(left falls, right falls)} and 𝑄 = {(left stands, right falls), (left falls, right falls)}.

The various situations and the sets 𝑃 and 𝑄 are shown in a Venn diagram in Figure 3.3.
Observe that the impossible situation

(left falls, right stands)

is not shown on the Venn diagram. If it were possible, then it would belong to 𝑃 ∖ 𝑄
and then 𝑃 would not be a subset of 𝑄. But its impossibility means that 𝑃 ∖ 𝑄 = ∅
and 𝑃 ⊆ 𝑄.
This example illustrates the general principle that the logical implication 𝑃 ⇒ 𝑄
between the statements 𝑃 and 𝑄 corresponds to the subset relation 𝑃 ⊆ 𝑄 between the
sets of situations they represent.
[Figure 3.3: a Venn diagram of the possible situations, with 𝑃 = {(left falls, right falls)} drawn inside 𝑄.]
We have seen this principle in action before, when the statements are framed as statements about set membership, on p. 7 in § 1.6. For any two sets 𝐴 and 𝐵,

𝐴 ⊆ 𝐵 ⟺ (for all 𝑥, if 𝑥 ∈ 𝐴 then 𝑥 ∈ 𝐵).
Our domino example illustrates that the link between the subset relation and logical
implication is more general, and applies even where the logical implication is not stated
in terms of set membership.
An important special case of implication is when the starting condition (on the left)
is false. If we have an implication 𝑋 ⇒ 𝑌 when 𝑋 is false, then the implication 𝑋 ⇒ 𝑌
is true regardless of what 𝑌 is. This corresponds to the fact that, if 𝑋 is the empty set
and 𝑌 is any set, then 𝑋 ⊆ 𝑌, since ∅ ⊆ 𝑌; the empty set is a subset of every set.
Keep in mind that the truth of an implication does not mean that either of its two parts is true. In our domino example, we know that 𝑃 ⇒ 𝑄, but this does not mean that 𝑃 is true or that 𝑄 is true. It just gives a logical relationship between these two statements, namely that if the left domino falls then the right domino falls.
Note also that this is a purely logical relationship. In the domino scenario, there is also the ingredient of time. This plays a role in the physical mechanism by which the falling of the left domino (if it happens) causes the falling of the right domino, and it follows from that mechanism that the fall of the right domino happens later, in time, than the fall of the left. But this is a detail of the actual physical setting. Logical implication itself is not an assertion about time, but merely an assertion about the truth or falsehood of the two parts, in this case 𝑃 and 𝑄.
Indeed, it is entirely possible that logical implication can “go backwards” in time.
Suppose 𝑋 is the statement that you can see stars in the sky and 𝑌 is the statement
that the sun has set. The sun setting does not itself mean that you can see stars, since it
might be too cloudy. But if you can see stars, then you know the sun has set. So 𝑋 ⇒ 𝑌
holds, even though 𝑋 happened after 𝑌, and there is no suggestion that 𝑋 causes 𝑌.
Always keep in mind that implication is not symmetric. In the domino example, we
have 𝑃 ⇒ 𝑄, but we do not have 𝑄 ⇒ 𝑃, because 𝑄 falling does not have to be because
𝑃 falls (as we have supposed throughout that 𝑄 could be made to fall, by some external
force, even if 𝑃 remains standing). In the sunset example just given, we have 𝑋 ⇒ 𝑌,
but we do not have 𝑌 ⇒ 𝑋 , because the sun setting does not imply that you can see
stars (as it might be cloudy).
The converse of an implication is the reverse implication, i.e., the implication you
get by swapping the order of the two parts or by reversing the direction of the implication
arrow symbol.5 So the converse of 𝑃 ⇒ 𝑄 is 𝑄 ⇒ 𝑃, which can also be written 𝑃 ⇐ 𝑄.
When an implication holds, we cannot assume that the converse also holds. In the
examples of the previous paragraph, we showed that, for the two example implications
we have been discussing, the converse does not hold.
Sometimes, an implication and its converse both hold. For example, suppose you
are holding a ball above the ground. Suppose 𝑅 means that you release the ball and
𝑆 means that the ball falls to the ground. Then 𝑅 ⇒ 𝑆, and its converse 𝑆 ⇒ 𝑅 holds
too. In these situations, we can put the two implications together to make a two-way
implication, written 𝑅 ⇔ 𝑆, which means that 𝑅 and 𝑆 are logically equivalent, and
we often say that 𝑅 holds if and only if 𝑆 holds. We have seen statements of this type
before, in some of our Theorems.
3.3 PROOFS OF EXiSTENTiAL AND UNiVERSAL STATEMENTS
We now give more examples of theorems and proofs, highlighting the relationship be-
tween the kind of statement you are trying to prove and the way you need to prove
it.
An existential statement is a statement that asserts that something with a specified property exists. To prove it, it is enough to exhibit one example.

Theorem 14. English has a palindrome.
Proof. ‘rotator’ is an English word and also a palindrome.
Theorem 15. For every English word, it has a vowel or a 'y'.

Proof. We check every English word in turn: aardvark has a vowel; aback has a vowel; abacus has a vowel; …

We have only shown a few lines of this proof. The number of lines of the full proof equals the number of English words, which is several tens of thousands.
A universal statement is a statement that asserts that everything within some set
has a specified property.
To prove a universal statement, such as …
For every English word, it has a vowel or a ‘y’
… you need to cover every possible case.
One way is to go through all possibilities, in turn, and check each one. But the
number of things to check may be huge, or infinite. So usually we want to reason in a
way that can apply to many different possibilities at once.
There is no systematic method for finding proofs for theorems. There are deep theoretical
reasons for this, based on work in the 1930s (Gödel, 1931; Church, 1936; Turing, 1936).
Discovering proofs is an art as well as a science. It requires
• skill at logical thinking and reasoning
• perseverance.
Although we can’t give a recipe for discovering proofs, we will give some general
advice on dealing with some common situations.
To prove a subset relation 𝐴 ⊆ 𝐵 (where 𝐴 and 𝐵 are sets): take an arbitrary member 𝑥 of 𝐴 and show that 𝑥 ∈ 𝐵, as we did in the proof of Theorem 13.

To prove an equation 𝐴 = 𝐵 between sets:
1. Prove 𝐴 ⊆ 𝐵.
2. Prove 𝐴 ⊇ 𝐵.

To prove an equation 𝐴 = 𝐵 between numbers, one approach is:
1. Prove 𝐴 ≤ 𝐵.
2. Prove 𝐴 ≥ 𝐵.
Major proof techniques, treated in the coming sections, include:

• Proof by construction
• Proof by cases
• Proof by contradiction
• Proof by induction.
3.6 PROOF BY SYMBOLiC MANiPULATiON
Some proofs proceed by a sequence of equations, where each equation uses some basic
law (an axiom or some fundamental theorem) about the objects in question.
Many proofs you did in secondary school mathematics would have been of this type.
For example, consider the difference of two squares:

𝑎² − 𝑏² = (𝑎 + 𝑏)(𝑎 − 𝑏).

One proof is the chain of equations

(𝑎 + 𝑏)(𝑎 − 𝑏) = 𝑎(𝑎 − 𝑏) + 𝑏(𝑎 − 𝑏) = 𝑎² − 𝑎𝑏 + 𝑏𝑎 − 𝑏² = 𝑎² − 𝑏² ,

using the distributive and commutative laws.
Incidentally, this proof also illustrates that, if you want to prove an equation, you can
start with either side of the equation and work towards the other side. You don’t have
to start with the left side, just because you read it first! In the above proof, we started
with the right side. Any proof of this type can be done in either direction, but sometimes
one direction seems more intuitive than the other.
Similarly, the basic laws of sets can be used to prove equality between some ex-
pressions involving sets. We have already seen some proofs of this type, in Theorem 1,
Corollary 2, and Theorem 5.
We can also use symbolic manipulation in parts of other proofs, as we did for example
in the proof of Theorem 10, where we used basic properties of function composition
and inverse functions to do chains of equalities that establish the desired claims about
inverses of compositions of functions.
Beware of some common mistakes when writing proofs. These include:

• constructing a single object with a claimed property, when the claim is about every object.
– If a theorem asserts that every object has some property, then it's not enough to just construct one object with the property.

• constructing an example that has the claimed property, thinking that a convincing example is enough to prove that the property holds for other objects too.
– An example may be useful in illustrating a proof or explaining the key ideas of a proof. But it is not, of itself, a proof.
3.8 PROOF BY CASES

We saw an example in Theorem 15. That was not typical of proofs by cases, since
the number of cases was so large (one case for each English dictionary word) and the
number of possibilities to be covered was finite. More typically, a theorem asserts that
every object from some infinite set has some property, and we divide the infinite set up
into a small finite number of cases, and do a separate proof for each of the cases. In
such situations, some of these cases must cover an infinite number of objects, and our
reasoning in each case must be general enough to apply to all the objects covered by
that case.
It’s ok if cases overlap (although it might indicate that the proof includes some
unnecessary duplication of effort and is inefficient). But they must be exhaustive in the
sense that every object considered by the theorem must belong to (at least) one of the
cases.
3.9 PROOF BY CONTRADiCTiON

Our first proof by contradiction is somewhat whimsical, but has the structure of
many proofs of this type and illustrates some important points about such proofs.
Theorem 16. Every natural number is interesting.

Proof. Assume that not every natural number is interesting. So, there exists at least
one uninteresting number. Therefore there exists a smallest uninteresting number. But
that number must be interesting, by virtue of having this special property of being the
smallest of its type. This is a contradiction, as this number is uninteresting. Therefore
our original assumption was wrong. Therefore every natural number is interesting.
Comments:
That “theorem” and “proof” is really just an informal argument, as the meaning of
“interesting” is imprecise and subjective.
But it illustrates the structure of proof by contradiction.
It also illustrates the point that, if you know a set of natural numbers is nonempty, then you can choose the smallest number in the set.
Often, the smallest object in a set may have special properties that can help you go
further in the proof.
Theorem 17 (Euclid). There are infinitely many prime numbers.
Proof. Suppose, by way of contradiction, that there are only finitely many primes.
Let 𝑛 be the number of primes.
Let 𝑝1 , 𝑝2 , … , 𝑝𝑛 be all the primes.
Define: 𝑞 ∶= 𝑝1 ⋅ 𝑝2 ⋅ ⋯ ⋅ 𝑝𝑛 + 1.
This is bigger than every prime 𝑝𝑖 , so it is not itself prime (since, by assumption, 𝑝1 , 𝑝2 , … , 𝑝𝑛 are all the primes). Since 𝑞 > 1, it must be composite.
Therefore 𝑞 is a multiple of some prime.
But, for each prime 𝑝𝑖 , if you divide 𝑞 by 𝑝𝑖 , you get a remainder of 1.
So 𝑞 cannot be a multiple of 𝑝𝑖 .
So 𝑞 cannot be a multiple of any prime. This is a contradiction.
So our initial assumption was wrong.
So there are infinitely many primes.
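The key step, that dividing 𝑞 by any 𝑝𝑖 leaves remainder 1, is easy to check numerically. A small Python illustration using the first six primes (note that 𝑞 itself need not be prime; its prime factors are merely new):

from math import prod

primes = [2, 3, 5, 7, 11, 13]
q = prod(primes) + 1                # 30031
print([q % p for p in primes])      # [1, 1, 1, 1, 1, 1]
print(q == 59 * 509)                # True: q is composite, with new prime factors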
3.10 PROOF BY MATHEMATiCAL iNDUCTiON
Suppose you want to prove that a statement 𝑆(𝑛) holds for every natural number 𝑛.
6 See, e.g., Ch. 14 (Fallacies), in: Martin Gardner, The Scientific American Book of Mathematical Puzzles
and Diversions, Simon & Schuster, New York, 1959.
98 PROOFS
One powerful technique for proving theorems of this type is Mathematical Induction. It is widely used across computer science and is particularly useful in proving theorems about the behaviour of algorithms and the discrete structures they work with, including those considered in this unit. You will also use it extensively in later Computer Science units: FIT2004, FIT2014 and MTH3170/3175.
The Principle of Mathematical Induction is:

if 𝑆(1) is true (the inductive basis), and, for every 𝑘 ≥ 1, the truth of 𝑆(𝑘) implies the truth of 𝑆(𝑘 + 1) (the inductive step), then 𝑆(𝑛) is true for all 𝑛 ∈ ℕ.

To see why this works:
• since 𝑆(1) is true and the inductive step tells us that 𝑆(1) ⇒ 𝑆(2),
we deduce that 𝑆(2) is true;
• since 𝑆(2) is true and the inductive step tells us that 𝑆(2) ⇒ 𝑆(3),
we deduce that 𝑆(3) is true;
• ⋮
Theorem 18. For all 𝑛 ∈ ℕ and all sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 within a universal set 𝑈,

(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛)ᶜ = 𝐴1ᶜ ∩ 𝐴2ᶜ ∩ ⋯ ∩ 𝐴𝑛ᶜ , (3.2)

where 𝑋ᶜ denotes the complement of 𝑋 in 𝑈.
Proof. Let 𝑆(𝑛) be the statement that (3.2) holds for 𝑛. We must prove that, for all
𝑛 ∈ ℕ, the statement 𝑆(𝑛) is true.
We prove it by induction on the number 𝑛 of sets.
[Figure: Venn diagram of sets 𝐴 and 𝐵 inside the universal set 𝑈.]
Inductive basis:
𝑆(1) is trivially true: in that case, each side of (3.2) is just 𝐴1ᶜ , so the equation holds.
Inductive step:
Let 𝑘 ≥ 1.
Suppose 𝑆(𝑘) is true:

(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘)ᶜ = 𝐴1ᶜ ∩ 𝐴2ᶜ ∩ ⋯ ∩ 𝐴𝑘ᶜ .

Then

(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘+1)ᶜ
= ((𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘) ∪ 𝐴𝑘+1)ᶜ (just grouping …) (3.3)
= (𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘)ᶜ ∩ 𝐴𝑘+1ᶜ (by De Morgan's Law for two sets, Theorem 1) (3.4)
= 𝐴1ᶜ ∩ 𝐴2ᶜ ∩ ⋯ ∩ 𝐴𝑘ᶜ ∩ 𝐴𝑘+1ᶜ (by Inductive Hypothesis) ,

so 𝑆(𝑘 + 1) is true.
Conclusion:
So, by the Principle of Mathematical Induction, (3.2) is true for any number of sets.
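As stressed earlier in this chapter, computational experiments are not proofs, but a quick random check in Python can catch an error in a conjectured identity before you invest effort in proving it:

import random

U = set(range(20))
sets = [set(random.sample(sorted(U), 5)) for _ in range(4)]
lhs = U - set().union(*sets)                      # complement of the union
rhs = set.intersection(*(U - A for A in sets))    # intersection of the complements
print(lhs == rhs)                                 # True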
Because induction helps prove theorems about statements that hold for all positive
integers, it is a natural tool for proving statements about infinite sequences. In the next
section (§ 3.11), we will use it to prove some statements about infinite sequences of
numbers. Because of this, it is a useful tool for proving statements about the behaviour
of loops in programs; we will see this in the next section too.
A key thought process in the inductive step is to construct, from the “(𝑘 +1)-object”,
a smaller object to which you can apply the inductive hypothesis. In the previous proof,
our “(𝑘 + 1)-object” is

(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘+1)ᶜ .

Our aim is to show that this satisfies (3.2) (with 𝑛 = 𝑘 + 1). In order to do this, we try to construct, from it, a “𝑘-object”, in this case

(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘)ᶜ .

We first group the first 𝑘 sets, in (3.3), as a step in this direction. Then applying De Morgan's Law for two sets, in (3.4), gives us what we are aiming for: (𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑘)ᶜ , our “𝑘-object”, shown in green in (3.4). Once we have the “𝑘-object”, we can apply the
Inductive Hypothesis to it.
This technique, of reducing an object to a simpler object (or objects) of the same type, is known as recursion. It is one of the most fundamental problem-solving strategies in computer science; you will encounter it and use it again and again and again. It is also provided for in most programming languages, where functions or methods can call themselves. By using this skill in proofs by induction, you are practising a core skill of
your discipline. Moreover, proof by induction is the go-to proof technique for proving
claims about the behaviour of recursive functions in programs.
3.11 iNDUCTiON: MORE EXAMPLES

For another use of induction, consider the following equation, which gives a formula for
obtaining the sum of the first 𝑛 positive integers, so that you can compute this sum
without having to add up all those numbers.
1 + 2 + 3 + ⋯ + (𝑛 − 1) + 𝑛 = 𝑛(𝑛 + 1)/2 . (3.5)
This is very useful in computer science and in fact throughout science, engineering and
in many other fields. We illustrate this with an application.
Suppose you are trying to construct the Monash timetabling network described in
§ 2.13. You have access to a function which tells you, for any pair of units, how many
students are doing both those units. You need to search all pairs of units, check how
many students are doing both units in a pair, and if that number is nonzero, add a link
between those units, to record the fact that their classes must be at different times.
A natural approach is to use two nested loops. The outer loop iterates over all units
in some order (e.g., lexicographic order). For each unit considered in the outer loop, the
inner loop iterates over all units that come later in the order. This saves duplication: if
the inner loop also iterated over all units, then each pair of units would be considered
twice, once for each of the two possible ordered pairs based on those units.
If there are 𝑁 units altogether, then these nested loops have the structure
for each 𝑖 = 1, 2, … , 𝑁 :
for each 𝑗 = 𝑖 + 1, 𝑖 + 2, … , 𝑁 :
if at least one student is doing both the 𝑖-th unit and the 𝑗-th unit,
then make a link between these two units.
There are 𝑁 iterations of the outer loop, but these have varying numbers of inner loop
iterations. If we want to work out how long this computation takes, we need to know
how many inner loop iterations are done altogether.
• For the first iteration of the outer loop, 𝑖 = 1, and the inner loop starts at 𝑖 + 1 =
1 + 1 = 2 and considers 𝑗 = 2, 3, … , 𝑁 , so it does 𝑁 − 1 iterations.
• For the second iteration of the outer loop, 𝑖 = 2, and the inner loop considers
𝑗 = 3, 4, … , 𝑁 , so it does 𝑁 − 2 iterations.
• For the third iteration of the outer loop, the inner loop does 𝑁 − 3 iterations.
• …and so on …
• For the (𝑁 − 1)-th iteration of the outer loop, we have 𝑖 = 𝑁 − 1, so there is only
one iteration of the inner loop, with 𝑗 = 𝑖 + 1 = (𝑁 − 1) + 1 = 𝑁 .
• The 𝑁 -th iteraton of the outer loop actually has no inner loop iterations at all,
because the range 𝑖 + 1 ≤ 𝑗 ≤ 𝑁 is empty, because 𝑖 + 1 > 𝑁 .
So the total number of inner loop iterations is

(𝑁 − 1) + (𝑁 − 2) + ⋯ + 2 + 1.
This expression is of the same type as the left-hand side of (3.5). We’ve written it in a
different order, but that doesn’t matter at all. The only other difference is that we’re
adding up the first 𝑁 − 1 positive integers instead of the first 𝑛, but we can still use the
right-hand side of (3.5) to give us the answer, with an appropriate substitution.
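A direct count in Python confirms this (here 𝑁 = 10, so the sum is (𝑁 − 1)𝑁/2 = 45):

def inner_iterations(N):
    count = 0
    for i in range(1, N + 1):
        for j in range(i + 1, N + 1):   # the same loop structure as above
            count += 1
    return count

print(inner_iterations(10), (10 - 1) * 10 // 2)   # 45 45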
Nested loop structures along these lines are very common in programming, and
equations like (3.5) enable us to give good estimates of how long they will take even
before we run the program.
So, how do we prove (3.5)? There are many beautiful proofs of this fact, and we
will see at least two proofs in this unit. The first we give now, using induction.
Theorem 19. For all 𝑛:

1 + ⋯ + 𝑛 = 𝑛(𝑛 + 1)/2 .

Proof. By induction on 𝑛.
Inductive basis:
When 𝑛 = 1, LHS = 1 and RHS = 1(1+1)/2 = 1.
Inductive step:
Let 𝑘 ≥ 1.
Suppose it’s true for 𝑛 = 𝑘:
1 + ⋯ + 𝑘 = 𝑘(𝑘 + 1)/2.

Then

1 + ⋯ + (𝑘 + 1) = (1 + ⋯ + 𝑘) + (𝑘 + 1)
= 𝑘(𝑘 + 1)/2 + (𝑘 + 1) (by the Inductive Hypothesis)
= (𝑘 + 1)(𝑘 + 2)/2 ,

so the equation also holds for 𝑛 = 𝑘 + 1.
Conclusion:
Therefore, by the Principle of Mathematical Induction, the equation holds for all 𝑛.
Notice, in the inductive step, how we construct, from the “(𝑘 +1)-object” 1+2+⋯+
(𝑘 + 1), a “𝑘-object” 1 + 2 + ⋯ + 𝑘. In this case, it’s just a matter of grouping, since the
𝑘-object just sits within the (𝑘 + 1)-object. As soon as we construct the 𝑘-object, we
can apply the Inductive Hypothesis to it.
Often, as in our proof of Theorem 18, we need to do some more work to construct
the 𝑘-object from the larger (𝑘 + 1)-object.
There is a slightly different way of writing inductive proofs that you are likely to
come across. We could make the inductive step go from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘 , instead of
from 𝑛 = 𝑘 to 𝑛 = 𝑘 + 1. Let us re-do the previous proof in this way.
Inductive basis:
When 𝑛 = 1, LHS = 1 and RHS = 1(1 + 1)/2 = 1.
Inductive step:
Let 𝑘 ≥ 2. [Note the change here!]
Suppose it’s true for 𝑛 = 𝑘 − 1, where 𝑘 ≥ 2:
1 + ⋯ + (𝑘 − 1) = (𝑘 − 1)𝑘/2.

Then

1 + ⋯ + 𝑘 = (1 + ⋯ + (𝑘 − 1)) + 𝑘 = (𝑘 − 1)𝑘/2 + 𝑘 = 𝑘(𝑘 + 1)/2 ,

so the equation also holds for 𝑛 = 𝑘.
Conclusion:
Therefore, by the Principle of Mathematical Induction, the equation holds for all 𝑛.
□
We will mostly frame our inductive steps as going from 𝑛 = 𝑘 (where we assume
the statement is true) to 𝑛 = 𝑘 + 1 (where we deduce it’s true, using the Inductive Hy-
pothesis), as we did in our first proof of Theorem 19. But there is nothing wrong with
doing it from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘, as in the second proof above. If you read other books
and resources, you will find that some authors do it one way, while others do it the
other way. If you do it the second way (from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘), then you need to take
care that the inductive step is grounded in the inductive basis. In the first proof, the
inductive step starts with “Let 𝑘 ≥ 1”; in the second proof, the inductive step starts with
“Let 𝑘 ≥ 2” which ensures that 𝑘 − 1 ≥ 1 so that the inductive step is, indeed, grounded
in the inductive basis.
Even though we have now proved Theorem 19, you are entitled to wonder where the
expression, 𝑛(𝑛 +1)/2, came from in the first place. There are many ways to derive this
expression from scratch. We will discuss this in more detail later, in § 6.12.
When you first try to work out what the expression should be, you might explore
the first few cases and try to discern a pattern. This may lead you to conjecture that
the expression is 𝑛(𝑛 + 1)/2.
A pattern that works for small cases is not, in itself, a proof. But it might still help
you come up with a proof, since now you have something to aim for: a conjecture that
you can try to prove. And, in this case, mathematical induction can be used to prove
the conjecture.
We will discuss this explore-conjecture-prove methodology, for discovering and prov-
ing formulae for mathematical expressions, in § 6.6.
Mathematical induction is a proof technique, and as such can only be used once you
have a statement — in this case, an equation — to prove. It won’t help you discover
what the right equation should be; that’s where exploration and conjecture come in.
The upside is that you get a rigorous proof for a conjecture which you might otherwise
have remained unsure about or been unable to justify fully.
Having used induction to prove that the sum of the first 𝑛 positive integers is indeed
𝑛(𝑛 + 1)/2, it is natural to ask, what about sums of higher powers of positive integers?
To start with, what about the sum of the squares of the first 𝑛 positive integers, i.e., 1² + 2² + ⋯ + 𝑛²? This gives the number of iterations in the following triply-nested loops:
for each 𝑖 = 1, 2, … , 𝑛:
for each 𝑗 = 𝑖, 𝑖 + 1, … , 𝑛:
for each 𝑘 = 𝑖, 𝑖 + 1, … , 𝑛:
[some action]
If you'd like a challenge, explore 1² + 2² + ⋯ + 𝑛² by computing it for several small values
of 𝑛 and trying to understand its behaviour. How does it grow as 𝑛 increases? Then
try to conjecture a possible formula for it, and then try to prove it by induction.
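A few lines of Python will generate the data for this exploration:

for n in range(1, 9):
    print(n, sum(i * i for i in range(1, n + 1)))
# 1 1, 2 5, 3 14, 4 30, 5 55, ...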
3.12𝜔 iNDUCTiON: EXTENDED EXAMPLE

We now give another detailed example of proof by induction. This is quite involved
but it does give extended practice at reasoning about functions and relations as well as
induction.
Consider the following problem. Suppose there are 𝑚 job vacancies to be filled from
a pool of 𝑛 applicants. For each vacant job, the employer has constructed a shortlist of
applicants. Is it possible for each position to be filled by a different applicant, so that
no position remains unfilled?
If 𝑚 > 𝑛, then this is impossible: there are simply too many positions for the
available applicants, and some number of positions must remain unfilled. If 𝑚 ≤ 𝑛, then
it may or may not be possible, depending on the shortlists.7
We model this problem as follows. Let 𝐴 be the set of positions, with |𝐴| = 𝑚, and
let 𝐵 be the set of applicants, with |𝐵| = 𝑛. Let 𝑆 ⊆ 𝐴 × 𝐵 be the shortlist relation, a
binary relation consisting of all pairs (𝑎, 𝑏) such that the shortlist for position 𝑎 includes
7 If 𝑚 < 𝑛, then some applicants will not get a position, but that is permitted in this problem. One could
also imagine a scenario where it is the applicants who have shortlists and the employers who have no choice.
applicant 𝑏. The full shortlist for position 𝑎 is the set 𝑆(𝑎), which is a subset of 𝐵; here,
we use the notation introduced on p. 67 in § 2.13. If we have a set of positions 𝑋 ⊆ 𝐴,
then we can, if we wish, form a combined shortlist for the set of positions by taking the
union of all the individual shortlists:

𝑆(𝑋 ) = ⋃𝑎∈𝑋 𝑆(𝑎) .
We want each position to be filled by exactly one applicant, so the set of pairs (𝑎, 𝑏)
that specify allocation of applicants to positions must be a function. Furthermore, we
want each position to be filled by an applicant who does not also fill any other position;
no applicant fills two positions. So we want to choose, for each 𝑎 ∈ 𝐴, a different 𝑏 ∈ 𝐵 such that
(𝑎, 𝑏) ∈ 𝑆. These pairs (𝑎, 𝑏) that specify a valid allocation of applicants to positions,
filling each position with a unique applicant, must therefore be an injection. So we are
asking: does the shortlist relation 𝑆 contain an injection?
Some situations can be dealt with easily.
If 𝑚 > 𝑛, then 𝐵 is too small to enable an injection to it from 𝐴, so 𝑆 contains no
injection. More generally, suppose there is a set 𝑋 ⊆ 𝐴 of jobs for which the union of
their shortlists 𝑆(𝑋 ) is smaller than 𝑋 , i.e., |𝑋 | > |𝑆(𝑋 )|. Then 𝑆 cannot contain an
injection, since an injection would ensure that the shortlists for 𝑋 together include at
least |𝑋 | applicants. We have now proved:
Theorem 20. If a shortlist relation 𝑆 from 𝐴 to 𝐵 contains an injection with domain 𝐴 then, for all 𝑋 ⊆ 𝐴, we have |𝑋 | ≤ |𝑆(𝑋 )|.
(The condition here means that, for every set 𝑋 of jobs, the union of their shortlists
contains at least as many applicants as there are jobs in 𝑋 .) □
More surprisingly, the converse is true as well. We will prove this by induction.
Theorem 21. Let 𝑆 ⊆ 𝐴 × 𝐵 be a binary relation. If, for all 𝑋 ⊆ 𝐴 we have |𝑋 | ≤ |𝑆(𝑋 )|,
then 𝑆 contains an injection with domain 𝐴.
Proof. We prove this by induction on |𝐴|.
Inductive basis:
When |𝐴| = 1, the set 𝐴 contains a single element 𝑎. Using 𝑋 = {𝑎}, if the shortlist
for this one job has at least one applicant, then there exists 𝑏 ∈ 𝐵 such that (𝑎, 𝑏) ∈ 𝑆.
So 𝑆 contains the simple function whose sole ordered pair is (𝑎, 𝑏). This function is an
injection with domain 𝐴. So the claim holds for |𝐴| = 1.
Inductive step:
Let 𝑚 ≥ 1.
Assume that the following holds for every binary relation 𝑇 ⊆ 𝐶×𝐷 in which |𝐶| ≤ 𝑚:
If, for each 𝑋 ⊆ 𝐶 we have |𝑋 | ≤ |𝑇(𝑋 )|, then 𝑇 contains an injection with
domain 𝐶.
This is our Inductive Hypothesis. Note that this is an implication. Like any implication, it has a condition (namely, "for each 𝑋 ⊆ 𝐶 we have |𝑋 | ≤ |𝑇(𝑋 )|") and a consequence (namely, "𝑇 contains an injection with domain 𝐶"). In assuming this Inductive Hypoth-
esis, we are not assuming that the condition is true, and we are not assuming that the
consequence is true. We are merely assuming that, if the condition is true, then the
consequence is true.
Let 𝑆 ⊆ 𝐴 × 𝐵 where |𝐴| = 𝑚 + 1. Suppose that the following condition holds:

every 𝑋 ⊆ 𝐴 satisfies |𝑋 | ≤ |𝑆(𝑋 )|. (3.6)

We consider two cases, according to whether or not some nonempty proper subset of 𝐴 has a combined shortlist of exactly its own size.

Case 1: every nonempty proper subset 𝑋 ⊂ 𝐴 satisfies |𝑋 | < |𝑆(𝑋 )|. Choose any pair (𝑎, 𝑏) ∈ 𝑆 (one exists, by (3.6) with 𝑋 = 𝐴), and let 𝑇 be the restriction of 𝑆 to (𝐴 ∖ {𝑎}) × (𝐵 ∖ {𝑏}). Removing 𝑏 shrinks the combined shortlist of any set of positions by at most one so, in this case, every nonempty 𝑋 ⊆ 𝐴 ∖ {𝑎} satisfies |𝑋 | ≤ |𝑆(𝑋 )| − 1 ≤ |𝑇(𝑋 )|. Since |𝐴 ∖ {𝑎}| = 𝑚, the Inductive Hypothesis applies to 𝑇 and gives an injection 𝑔 with domain 𝐴 ∖ {𝑎}. Then

𝑔 ∪ {(𝑎, 𝑏)} ⊆ 𝑆,

• 𝑔 is an injection, and
• 𝑏 is not in the codomain of 𝑔,

so 𝑔 ∪ {(𝑎, 𝑏)} is an injection contained in 𝑆 with domain 𝐴. This completes Case 1.

Case 2: some nonempty proper subset 𝑋 ⊂ 𝐴 satisfies |𝑋 | = |𝑆(𝑋 )|. Let 𝑇 be the restriction of 𝑆 to 𝑋 × 𝐵. Since 𝑋 is a proper subset of 𝐴, we have
|𝑋 | ≤ 𝑚. So we can apply the Inductive Hypothesis, with 𝐶 = 𝑋 and |𝐶| ≤ 𝑚. Now, the
Inductive Hypothesis is an implication, with condition and consequence as discussed
above. We first establish that the condition holds in our current situation. Recall
that every 𝑌 ⊆ 𝐴 satisfies |𝑌| ≤ |𝑆(𝑌)|, by (3.6). So, certainly every 𝑌 ⊆ 𝐶 satisfies |𝑌| ≤ |𝑆(𝑌)| (since 𝐶 ⊆ 𝐴). Also, 𝑇 is just the restriction of 𝑆 to 𝐶 × 𝐵, so 𝑇(𝑌) = 𝑆(𝑌) (because 𝑌 ⊆ 𝐶). Therefore every 𝑌 ⊆ 𝐶 satisfies |𝑌| ≤ |𝑇(𝑌)|. Therefore 𝑇 satisfies
the condition of the Inductive Hypothesis. Therefore, by the Inductive Hypothesis, its
consequence also holds, namely that 𝑇 contains an injection with domain 𝑋 , which we
call 𝑔. It has domain 𝑋 = 𝐶, and one suitable codomain is 𝑇(𝐶) which is the same as
𝑆(𝐶). So we can write 𝑔 ∶ 𝐶 → 𝑆(𝐶).
This injection 𝑔 is a step towards constructing an injection in 𝑆. Its domain is 𝑋 , so
it fills the jobs in that set. But 𝑋 is a proper subset of 𝐴, so there are other jobs that 𝑔 does not fill.
So we need to find a way to fill jobs in 𝐴 ∖ 𝑋 . Again, we work towards applying the
Inductive Hypothesis. Observe that 𝐴 ∖ 𝑋 is a nonempty proper subset of 𝐴, since 𝑋
is too.
Let 𝑍 ⊆ 𝐴 ∖ 𝑋 . We have

|𝑋 | + |𝑍| = |𝑋 ∪ 𝑍| ≤ |𝑆(𝑋 ∪ 𝑍)| = |𝑆(𝑋 ) ∪ 𝑆(𝑍)| ≤ |𝑆(𝑋 )| + |𝑆(𝑍) ∖ 𝑆(𝑋 )| ,

using (3.6) for the first inequality. Since |𝑋 | = |𝑆(𝑋 )| in this case, subtracting |𝑋 | from both ends gives

|𝑍| ≤ |𝑆(𝑍) ∖ 𝑆(𝑋 )|. (3.7)
Let 𝑈 be the binary relation obtained from 𝑆 by restricting to those pairs (𝑎, 𝑏) such that 𝑎 ∉ 𝑋 and 𝑏 ∉ 𝑆(𝑋 ). In other words, 𝑈 is the restriction of 𝑆 to (𝐴 ∖ 𝑋 ) × (𝐵 ∖ 𝑆(𝑋 )). If 𝑍 ⊆ 𝐴 ∖ 𝑋 , then 𝑈(𝑍) consists of those applicants 𝑏 such that 𝑏 ∈ 𝑆(𝑍) and 𝑏 ∉ 𝑆(𝑋 ). So
𝑈(𝑍) = 𝑆(𝑍) ∖ 𝑆(𝑋 ).
This, together with (3.7), gives
|𝑍| ≤ |𝑈(𝑍)|.
We are now in a position to apply the Inductive Hypothesis. It can be used on 𝑈,
because |𝐴 ∖ 𝑋 | ≤ 𝑚 (which follows from the fact that 𝐴 ∖ 𝑋 is a proper subset of 𝐴).
Its condition also holds, because, as we have just seen, |𝑍| ≤ |𝑈(𝑍)| for all 𝑍 ⊆ 𝐴 ∖ 𝑋 .
Therefore the consequence follows, namely that 𝑈 contains an injection with domain
𝐴 ∖ 𝑋 . Call this injection ℎ. Its domain is 𝐴 ∖ 𝑋 and its codomain can be taken to be 𝐵 ∖ 𝑆(𝑋 ). So the domains of 𝑔 and ℎ are disjoint, and their codomains are disjoint too.
Since 𝑔 and ℎ are functions on disjoint domains and the union of their domains is 𝐴,
the union 𝑔 ∪ ℎ is also a function and its domain is 𝐴. Also, since 𝑔 and ℎ are both
injections and their codomains are disjoint, their union 𝑔 ∪ ℎ is also an injection.
We have constructed an injection 𝑔 ∪ ℎ with domain 𝐴. Since 𝑔 ⊆ 𝑇 and 𝑇 ⊆ 𝑆, and
since also ℎ ⊆ 𝑈 and 𝑈 ⊆ 𝑆, we have 𝑔 ∪ ℎ ⊆ 𝑆. So 𝑆 contains an injection with domain
𝐴.
This completes Case 2. Since Cases 1 and 2 cover all possibilities, the Inductive Step
is now complete.
Conclusion:
Therefore, by the Principle of Mathematical Induction, the theorem holds.
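For small instances, the property in Theorems 20 and 21 can be tested by brute force in Python (a sketch for tiny sets only, since it tries every possible assignment):

from itertools import permutations

def contains_injection(A, B, S):
    # Does the relation S ⊆ A × B contain an injection with domain A?
    return any(all((a, b) in S for a, b in zip(A, chosen))
               for chosen in permutations(B, len(A)))

S = {(1, 'x'), (1, 'y'), (2, 'y'), (3, 'y'), (3, 'z')}
print(contains_injection([1, 2, 3], ['x', 'y', 'z'], S))   # True: e.g. 1→x, 2→y, 3→z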
3.13 MATHEMATiCAL iNDUCTiON AND STATiSTiCAL iNDUCTiON
The use of the term “induction” here is different to the use of “induction” in statistics,
which is the process of drawing general conclusions from data. Statistical induction is
typically used in situations where there is some randomness in the data, and conclusions
drawn can include some amount of error provided the conclusions drawn are significant
enough that the error is unimportant. It is not a process of logical deduction; this,
together with the presence of errors, means that it cannot be used as a step in a math-
ematical proof. By contrast, Mathematical Induction is a rigorous and very powerful
tool for logical reasoning in mathematical proofs.
3.14 PROGRAMS AND PROOFS

At the start of this chapter, we discussed the role of proofs in computer science, in
particular their importance in providing rigorous, rational support for general claims
about programs and the structures they work with.
The different types of proofs we have introduced can each be used to prove statements
about particular programming language constructs. If we want to prove something
about an if-statement, then we would normally use proof by cases. If we want to prove
something about a function that calls itself recursively, then we would normally use
proof by induction. Proving statements about loops in programs is also often done by
induction.
But there is a deeper reason for programmers to develop skills in writing proofs.
Programming is often thought of as a completely different activity to writing math-
ematical proofs. In fact, the two activities are surprisingly similar, and developing the
skill of writing mathematical proofs will make you a better programmer.
Let’s compare programs and proofs.
Each consists of a sequence of precise statements. Each of these statements obeys
the rules of some language: for programs, this is the programming language; for proofs,
this is the language of mathematics augmented by precise use of a natural language such
as English.
Each uses variables, which are names that can refer to any object of some type.
Variables can be combined into expressions using the operations that are available for
objects of that type. One difference between programs and proofs is that, in programs,
variables have memory set aside for them, but this does not happen in proofs.
Each statement must depend on previous statements in a precise way. In a program,
when an operation is applied to a variable, it uses the most recently computed value
of that variable; it does not “look ahead” and use some other value of the variable that
will be computed later. In a proof, each statement is a logical consequence of previous
statements (not of later statements), or it might be just an already-known fact or axiom.
In a program, we can use an if-statement to decide, according to some logical con-
dition, which set of statements to execute next. In a proof, we can use a finite number
of cases, which together cover all possible situations (§ 3.8). Each case pertains to
situations that satisfy some logical condition.
Programs can call other programs, which may have been written by you or by other
people. Proofs can use other theorems, which may have been written by you or by other
people. So building a proof, using theorems that have been proved previously, is like
writing a function in your program that calls other functions.
In a program, a block of statements can be executed repeatedly, for as long as some
logical condition is satisfied. Another way to do repetition is to use recursion, in which
a function or method calls itself. The analogue in proofs is mathematical induction.
A program crashes if it encounters a situation that its statements cannot deal with.
We say it has a bug, and this should be fixed. Ideally, the program never crashes. If a
proof cannot deal with some situation it is supposed to cover, then we might also call
this a bug, or a “hole” or a “gap”. This is more serious for proofs than for programs.
A program with a bug might still be of some use, provided it crashes rarely and can
correctly deal with the inputs that don’t cause it to crash (even though it should be
fixed). But a “proof” with a hole is actually not a proof at all. It might be able to be
repaired, to turn it into a proof, or it might not; in the latter case, the “theorem” being
“proved” might actually be false, in which case it has no proof, and is therefore not a
theorem after all.
Programming and proof-writing have differences as well as similarities, but the anal-
ogy is close. Writing mathematical proofs is like programming in a different “paradigm”.
A programming paradigm is, loosely speaking, an approach to programming and a
way of thinking about it. Important programming paradigms include procedural pro-
gramming (historically the first), object-oriented programming, logic programming and
functional programming. Learning to program in more than one paradigm (as well as
in more than one language) improves your thinking about programming and makes you
a better programmer in any language. Learning to write proofs — in the “programming
paradigm” of mathematical reasoning — yields similar benefits to a programmer.
3.15 EXERCiSES
Collatz(2) (𝑛) ≤ (3/2)𝑛 + 1.
2.
A set of strings 𝐿 is called hereditary if it has the following property:
For every nonempty string 𝑥 in 𝐿, there is a character in 𝑥 which can be
deleted from 𝑥 to give another string in 𝐿.
Prove by contradiction that every nonempty hereditary set of strings contains the
empty string.
3. Review the statements of all the theorems (including corollaries) in the previous
chapters.
5. Let 𝐸 be a finite set and let ℐ be a collection of subsets of 𝐸 that satisfies the
following three properties:
(I1) ∅ ∈ ℐ.
(I2) If 𝑌 ∈ ℐ and 𝑋 ⊆ 𝑌, then 𝑋 ∈ ℐ.
(I3) If 𝑋, 𝑌 ∈ ℐ and |𝑋| < |𝑌|, then there is some 𝑒 ∈ 𝑌 ∖ 𝑋 such that 𝑋 ∪ {𝑒} ∈ ℐ.
For example, with 𝐸 = {𝑎, 𝑏, 𝑐}, each of the following collections satisfies all three properties (I1)–(I3):
• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}, {𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}}
• ℐ = {∅}
By way of counterexample, using the same 𝐸, each of the following collections fails to
satisfy all three properties (I1)–(I3):
• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}, {𝑎, 𝑏}}. This satisfies (I1) and (I2) but fails (I3).
(b) Prove that, for a social network (see § 1.7, § 2.14, § 2.13), the set of cliques satisfies
(I1) and (I2) but not, in general, (I3).
(c) Prove by contradiction that all maximal independent sets have the same size.
𝑤(𝑋) ∶= ∑_{𝑒∈𝑋} 𝑤(𝑒).
Greedy algorithms are important because they are simple, easy to program, and efficient.
In general, they do not give best-possible solutions. But it is important to recognise sit-
uations when they do give best-possible solutions, since then we attain a rare state of
algorithmic bliss: simplicity, speed, optimality.
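To make this concrete, here is a minimal Python sketch of the greedy scheme this exercise studies. The function name is_independent is a hypothetical oracle standing in for membership of ℐ; the details are illustrative assumptions, not part of the exercise.

def greedy_max_weight(elements, weight, is_independent):
    # Generic greedy scheme: repeatedly add the heaviest element that
    # keeps the chosen set inside the collection I of independent sets.
    chosen = set()
    for e in sorted(elements, key=weight, reverse=True):
        if is_independent(chosen | {e}):
            chosen.add(e)
    return chosen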
(d) Construct a simple social network, and give a nonnegative real weight to each
person, such that the greedy algorithm, with ℐ being the set of all cliques, does not find
a maximum weight member of ℐ.
Suppose now that the greedy algorithm, run on a collection ℐ satisfying (I1)–(I3), chooses the elements
𝑒1 , 𝑒2 , … , 𝑒𝑟 , in that order.
(e) Prove by contradiction that the weights of the chosen elements are decreasing, i.e.,
𝑤(𝑒1 ) ≥ 𝑤(𝑒2 ) ≥ ⋯ ≥ 𝑤(𝑒𝑟 ).
By “decreasing” we allow the possibility that some weights stay the same as we go along
the list; we don’t require that they be strictly decreasing (which is a stronger property
and would entail replacing each ≥ by >).
(f) Prove that the greedy algorithm finds a maximum weight independent set.
So, whenever (I1)–(I3) hold, we can confidently use the greedy algorithm, knowing
that it will give us the best possible solution.
Less obviously, it turns out that properties (I1)–(I3) give a precise characterisation
of situations where greedy algorithms give optimum solutions! To be specific, if ℐ is
a collection of sets satisfying (I1) and (I2), and the greedy algorithm always gives the
maximum weight member of ℐ (regardless of the weights of the elements of 𝐸), then ℐ
must satisfy (I3) too. It is an interesting but challenging exercise to try to prove this.
So, it is useful to be able to recognise structures where properties (I1)–(I3) hold,
because they are precisely the situations where you can’t do better than a greedy al-
gorithm. These structures are known as matroids and their role in optimisation goes
beyond their connection with greedy algorithms. They also have many applications in
the study of networks, vectors, matrices, and geometry.
6.
Prove the following statement, by mathematical induction:
For all 𝑘, the sum of the first 𝑘 odd numbers equals 𝑘². (∗)
(a) First, give a simple expression for the 𝑘-th odd number.
Inductive Step:
Let 𝑘 ≥ 1.
Assume the statement (∗) true for 𝑘. This is our Inductive Hypothesis.
…in terms of the sum of the first 𝑘 odd numbers, plus something else.
(d) Use the inductive hypothesis to replace the sum of the first 𝑘 odd numbers by some-
thing else.
(f) When drawing your final conclusion, don’t forget to briefly state that you are using
the Principle of Mathematical Induction!
1 + 2 + 2² + 2³ + ⋯ + 2ⁿ = 2ⁿ⁺¹ − 1.
1² + 2² + ⋯ + 𝑛² = 𝑛(𝑛 + 1)(2𝑛 + 1)/6.
1³ + 2³ + ⋯ + 𝑛³ = (1 + 2 + ⋯ + 𝑛)².
10.
Consider the following algorithm:
for each 𝑖 = 1, 2, … , 𝑛:
for each 𝑗 = 1, 2, … , 𝑖:
for each 𝑘 = 1, 2, … , 𝑗:
[some action]
(a) For each pair 𝑖, 𝑗, how many times is the action inside the innermost loop performed?
Express this in terms of 𝑖 or 𝑗 or both of them.
(b) For each 𝑖, how many times is the action inside the innermost loop performed?
Write this as both a sum of some number of terms, and (using a theorem in this
chapter) a simple algebraic expression. Both these expressions should be in terms
of 𝑖.
(c) Now write a sum of some number of terms, expressed in terms of 𝑛, for the total
number of times the action is performed.
(d) Now work out this expression for 𝑛 = 1, 2, 3, 4, 5, and for as many further values of
𝑛 as you like.
(f) Conjecture a simple algebraic expression, in terms of 𝑛, for the total number of
times the action is performed.
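A few lines of Python (a sketch to support your experiments in parts (d) and (f), not a required part of the exercise) will tabulate these counts:

def triple_loop_count(n):
    # Count how many times the innermost action is performed.
    count = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for k in range(1, j + 1):
                count += 1
    return count

for n in range(1, 6):
    print(n, triple_loop_count(n))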
11.
The 𝑛-th harmonic number 𝐻𝑛 is defined by
𝐻𝑛 = 1 + 1/2 + 1/3 + ⋯ + 1/𝑛.
These numbers have many applications in computer science. We will meet at least one
later in this unit.
In this exercise, we prove by induction that 𝐻𝑛 ≥ log𝑒 (𝑛 + 1). (It follows from this
that the harmonic numbers increase without bound, even though the differences between
them get vanishingly small so that 𝐻𝑛 grows more and more slowly as 𝑛 increases.)
(i) First show that 𝐻1 ≥ log𝑒 (1 + 1). (This is the inductive basis.)
(ii) Let 𝑛 ≥ 1. Assume that 𝐻𝑛 ≥ log𝑒 (𝑛 + 1); this is our inductive hypothesis. Now,
consider 𝐻𝑛+1 . Write it recursively, using 𝐻𝑛 . Then use the inductive hypothesis
to obtain 𝐻𝑛+1 ≥ … … (where you fill in the gap). Then use the fact that log𝑒 (1 +
𝑥) ≤ 𝑥, and an elementary property of logarithms, to show that 𝐻𝑛+1 ≥ log𝑒 (𝑛 +2).
(iii) In (i) you showed that 𝐻1 ≥ log𝑒 (1 + 1), and in (ii) you showed that if 𝐻𝑛 ≥
log𝑒 (𝑛 + 1) then 𝐻𝑛+1 ≥ log𝑒 ((𝑛 + 1) + 1). What can you now conclude, and why?
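It can also be reassuring to see the inequality numerically before (or after) proving it; this quick Python check is illustrative only and is no substitute for the proof:

import math

H = 0.0
for n in range(1, 11):
    H += 1 / n   # H_n, built up from H_(n-1) + 1/n
    print(n, round(H, 4), round(math.log(n + 1), 4))   # H_n should stay >= log_e(n+1)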
Advanced afterthoughts:
• The above inequality implies that 𝐻𝑛 ≥ log𝑒 𝑛, since log𝑒 (𝑛 + 1) ≥ log𝑒 𝑛. It is instructive
to try to prove directly, by induction, that 𝐻𝑛 ≥ log𝑒 𝑛. You will probably run into a snag.
This illustrates that for induction to succeed, you sometimes need to prove something that
is stronger than what you set out to prove.
• Would your proof work for logarithms to other bases, apart from 𝑒? Where in the proof
do you use the base 𝑒?
12. (Challenge)
Let 𝐸 be a finite set, and let 𝒮 be a set of subsets of 𝐸. We say 𝒮 is linear under △
if, for every two sets 𝑋 , 𝑌 ∈ 𝒮, we have 𝑋 △𝑌 ∈ 𝒮. So, applying symmetric difference to
sets in 𝒮 always gives sets in 𝒮; the operation never takes you out of 𝒮.
The span of 𝒮, written span(𝒮), is the set of all symmetric differences of any number
of members of 𝒮. In other words,
span(𝒮) = { 𝑋1 △ 𝑋2 △ ⋯ △ 𝑋𝑘 ∶ 𝑘 ≥ 0 and 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮 }.
By convention, the symmetric difference of zero sets is the empty set. Therefore, the
span of any set 𝒮 always contains the empty set.
𝒮 is dependent under △ if, for some 𝑘 ≥ 1 and some 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮 which are all distinct, i.e.,
𝑋𝑖 ≠ 𝑋𝑗 whenever 1 ≤ 𝑖 < 𝑗 ≤ 𝑘, we have
𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 = ∅.
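These definitions are easy to experiment with in Python, where frozenset supports the symmetric-difference operator ^. The following sketch (with illustrative data) computes span(𝒮) and tests dependence by brute force:

from itertools import combinations
from functools import reduce

def span(S):
    # All symmetric differences of members of S; the empty symmetric
    # difference is the empty set, so the span always contains it.
    result = {frozenset()}
    for k in range(1, len(S) + 1):
        for combo in combinations(S, k):
            result.add(reduce(lambda X, Y: X ^ Y, combo))
    return result

def dependent(S):
    # S is dependent under symmetric difference if some nonempty collection
    # of distinct members has symmetric difference equal to the empty set.
    return any(reduce(lambda X, Y: X ^ Y, combo) == frozenset()
               for k in range(1, len(S) + 1)
               for combo in combinations(S, k))

S = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]
print(dependent(S))   # True: {1,2} △ {2,3} △ {1,3} = ∅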
(a) Let 𝒮 be linear under △. Prove, by induction on 𝑘, that for all 𝑘 and all
𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮,
𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 ∈ 𝒮.
(b) Prove that, for every set 𝒮 of subsets of 𝐸, the set 𝒮 is linear under △ if and only
if it equals its own span, i.e.,
𝒮 = span(𝒮).
(c) Prove, by induction on 𝑘, that for all 𝑘 and all 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮, either
𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 = ∅,
(d) Prove that 𝒮 is dependent under △ if and only if there exists 𝑋 ∈ 𝒮 such that
𝑋 ∈ span(𝒮 ∖ {𝑋 }).
(e) Prove that if 𝒮 is a minimal dependent set under △, then for every 𝑋 ∈ 𝒮 we have
𝑋 ∈ span(𝒮 ∖ {𝑋 }).
(f) Prove that every minimal spanning set under △ is independent under △.
(g) Now prove that every minimal spanning set under △ is a maximal independent
set under △.
(h) Prove that every maximal independent set under △ is also a spanning set under
△.
(i) Now prove that every maximal independent set under △ is also a minimal spanning
set under △.
(j) Prove that 𝒮 is a minimal spanning set under △ if and only if it is a maximal
independent set under △.
|span(𝒮)| = 2^|𝒮| .
(m) Prove that all bases under △ have the same size.
(n) Do all minimal dependent sets under △ have the same size? Give a proof of your
claim.
(o) Prove that, for every base ℬ of 𝒮 under △, and every 𝑋 ∈ 𝒮, there is a unique way to
write 𝑋 as a symmetric difference of members of ℬ, using each element of ℬ at most once.
(p) For each previous part of this question, (a)–(o), state what type of proof you have
used.
Does this algorithm always find the best possible solution? Justify your answer, using
a previous exercise.
13. Identify the errors in the following “theorem” and “proof”, which are both
incorrect.
The steps in the “proof” are numbered for convenience.
“Theorem.” For all 𝑛 ∈ ℕ, a circular disc can be cut into 2ⁿ pieces by 𝑛 straight lines.
“Proof.”
1. Consider 𝑛 = 0. The disc is not cut at all, so it is still in one piece. So the number
of pieces is 2⁰ = 1. So the claim is true for 𝑛 = 0.
2. Consider 𝑛 = 1. A single line across a disc cuts it into two pieces, so the number
of pieces is 2¹. So the claim is true for 𝑛 = 1.
3. Now consider 𝑛 = 2. If the circle is cut by one line into two pieces, then we can cut
it again by a line that crosses the first line. Such a line must cut right across both
of the pieces created by the first cut. Therefore each of those pieces is divided in
two, so the total number of pieces is 2 × 2 = 2², so the claim is true for 𝑛 = 2.
4. It is clear from the cases considered so far that, once the circle is cut by some
number of lines, then another line can be used to divide every piece into two
pieces, thereby doubling the number of pieces.
5. So, if we use 𝑛 lines, then the repeated doubling gives 2ⁿ pieces. Therefore the
claim is true for all 𝑛.
“□”
14. Identify the errors in the following “theorem” and “proof”, which are both incorrect.
“Theorem.” ℤ = ℚ.
“Proof.”
1. We need to show that the sets ℤ and ℚ are equal.
2. To prove that two sets are equal, we can prove the appropriate subset and superset
relations between the two sets.
3. We first prove the subset relation between these two sets, ℤ ⊆ ℚ.
4. Let 𝑥 ∈ ℤ.
5. Since 𝑥 is an integer, we have 𝑥 = 𝑥/1.
6. So 𝑥 is a quotient of integers.
7. Therefore 𝑥 is rational, i.e., 𝑥 ∈ ℚ.
8. So we have proved that ℤ ⊆ ℚ.
9. Now we need to prove the superset relation, which is the reverse of subset.
10. So we will prove that ℚ ⊇ ℤ.
11. Let 𝑥 ∉ ℚ.
12. Then 𝑥 cannot be written in the form 𝑝/𝑞 where 𝑝 and 𝑞 are integers.
13. So it certainly cannot be written in this form with 𝑞 = 1.
14. Therefore 𝑥 is not an integer, i.e., 𝑥 ∉ ℤ.
15. So we have proved that ℚ ⊇ ℤ.
16. It follows from our subset and superset relations between the two sets that ℤ = ℚ.
“□”
15. Let 𝐺 be the cryptosystem obtained from Caesar slide by restricting the keyspace
to the first half of the alphabet. (See Exercise 2.15 for definitions relating to the Caesar
slide cryptosystem.) So the keyspace 𝐾𝐺 is
{a, b, c, d, e, f, g, h, i, j, k, l, m}.
4
P R O P O S I T I O N A L L O G I C

Computers are logical machines. This is not just a description of their behaviour, but
also of their nature. Their hardware consists of huge numbers of tiny components that
each do simple operations based on logic, and that do them very quickly. In their
memory, information is stored as sequences of basic pieces which can each be True or
False. The instructions used to get computers to do things — their programs — are
expressed in a language that specifies what to do using formal, precise rules that use
logic. Even before a program is written, a formal specification of the task to be done
will make essential use of logic.
But logic is not just for machines. It is fundamental to clear thinking and pre-
cise communication. By learning about logic, you will improve your own thinking and
communication, and therefore your prospects in life.
Logic plays a fundamental role in rigorous reasoning, especially in proofs. It will
pervade everything we do in this unit and is fundamental to your future studies, not
only in computer science but in mathematics, other sciences, engineering, economics,
philosophy, and indeed any field of human activity where precise reasoning is important.
We will therefore now study the most fundamental types of logic. This week we
study Propositional Logic, and next week we study Predicate Logic.
4.1𝛼 T R U T H VA L U E S
1 There are forms of logic that use other values too, including three-valued logics which use the extra truth
value Unknown. But we will focus entirely on classical two-valued logic, firstly because it is what computers
are based on, and secondly because it is embedded within most other logics anyway, so that understanding
two-valued logic is necessary in order to understand those other logics.
Another common abstraction of the notion of two possible states for something is
the bit, which can be 0 or 1. This view is intimately related to truth values: in many
situations, we can represent False by 0 and True by 1. It is worth exploring how the
mathematical behaviour of 0 and 1 under basic arithmetic operations compares with
that of False and True under logical operations. We will develop this link further in
§ 7.4.
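As a first taste of that comparison (a sketch only; the full story comes in § 7.4): in Python, where False and True behave as the integers 0 and 1, conjunction matches multiplication, disjunction matches taking the maximum, and exclusive-or (met later, in § 4.11) matches addition modulo 2.

for p in (0, 1):
    for q in (0, 1):
        print(p, q,
              p * q == (p and q),        # "and" behaves like multiplication
              max(p, q) == (p or q),     # "or" behaves like maximum
              (p + q) % 2 == (p ^ q))    # exclusive-or is addition modulo 2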
4.2𝛼 B O O L E A N VA R i A B L E S
You are familiar with variables for numerical quantities, whose value can be any number
from some set. We have already been using variables for other objects too. So it is
natural to use variables whose values are truth values.
A Boolean variable, also called a propositional variable, is a variable whose
value can be True or False.
Often, Boolean variables are used as names for statements which are either True or
False.
4.3𝛼 PROPOSiTiONS
Examples
1+1 = 2 — a proposition which is true.
Xià Péisù designed the first computer in China — a proposition which is true.
The earth is flat. — a proposition which is false.
It will rain tomorrow. — a proposition.
For brevity, a proposition may be given a name, which is a Boolean variable. For
example, let 𝑋 be the proposition 1 + 1 = 2. Then the truth value of 𝑋 is True.
4.4𝛼 L O G i C A L O P E R AT i O N S
Propositions may be combined to make other propositions using logical operations. Here
are the basic logical operations we will use. We will define each of them shortly.
Not ¬ (∼, ¯, −)
And ∧ (&)
Or ∨
Implies ⇒ (→)
Equivalence ⇔ (↔)
Logical operations are also called connectives. Inside computers, they are implemented in electronic circuits called logic gates.
4.5 N E G AT i O N
Logical negation is a unary operation which changes True to False and False to True. It is
denoted by ¬, which is placed before its argument. So, ¬True = False and ¬False = True.
If a proposition 𝑃 is True then ¬𝑃 is False, and if 𝑃 is False then ¬𝑃 is True.
Example:
𝑃: You have prepared for next week’s tutorial.
¬𝑃: You have not prepared for next week’s tutorial.
Other notation for ¬𝑃 that you may come across: ∼𝑃, 𝑃̄, −𝑃, !𝑃
𝑃 ¬𝑃
F T
T F
Logical negation is reminiscent of set complementation (§ 1.11). In each case, you
get something completely opposite, and doing it twice gets you back where you started.
For logical negation, we have
¬¬𝑃 = 𝑃.
If a proposition asserts membership of a set, then its logical negation asserts membership
of the complement. For example, consider the proposition √2 ∈ ℚ (which is False) and
suppose our universal set is ℝ. Its logical negation ¬(√2 ∈ ℚ) may be written √2 ∉ ℚ
or √2 ∈ ℝ ∖ ℚ (which is True).
4.6 CONjUNCTiON
Conjunction is a binary logical operation, i.e., it has two Boolean arguments. The result
is True if and only if both its arguments are True. So, if at least one of its arguments is
False, then the result is False.
We denote conjunction by ∧ and read it as “and”. So the conjunction of 𝑃 and 𝑄
is written 𝑃 ∧ 𝑄 and read as “𝑃 and 𝑄”. Note that we are using the English word “and”
in a strict, precise, logical sense here, which is narrower than the full range of meanings
this word can have in English.
Example:
We can define conjunction symbolically using its truth table. It has two arguments
(let’s call them 𝑃 and 𝑄 again), each of which can have two possible values (True and
False), so there are 2² = 4 combinations of arguments, hence four rows of the truth table.
In each row, the corresponding value of 𝑃 ∧ 𝑄 is given in the last column.
𝑃 𝑄 𝑃 ∧𝑄
F F F
F T F
T F F
T T T
Conjunction is closely related to set intersection. If 𝑥 is an object and 𝐴 and 𝐵
are sets, then the conjunction of the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 is the proposition
𝑥 ∈ 𝐴 ∩ 𝐵:
(𝑥 ∈ 𝐴) ∧ (𝑥 ∈ 𝐵) = (𝑥 ∈ 𝐴 ∩ 𝐵).
Restating (1.12) using conjunction, we have
𝐴 ∩ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 ∧ 𝑥 ∈ 𝐵}.
4.7 DiSjUNCTiON
Disjunction is another binary logical operation. Its result is True if and only if at least
one of its arguments is True. So, if both of its arguments are False, then the result is
False.
We denote disjunction by ∨ and read it as “or”. The disjunction of 𝑃 and 𝑄 is written
𝑃 ∨ 𝑄 and read as “𝑃 or 𝑄”. Again, our use of English words is unusually specific: “or”
is being used here in a strict, precise, logical sense, much narrower than its full range
of English meanings. An analogous situation arose with “and” previously. Also, we are
using the word “or” inclusively, so that a disjunction is True whenever any one or more
of its arguments are True. For this reason, disjunction is sometimes called inclusive-OR.
(This contrasts with the exclusive-or of two propositions, which is True precisely when
exactly one of its two arguments is True; we discuss it in § 4.11.)
Example:
𝑃 I will study FIT3155 Advanced Data Structures & Algorithms.
𝑄 I will study MTH3170 Network Mathematics.
𝑃 𝑄 𝑃 ∨𝑄
F F F
F T T
T F T
T T T
Disjunction is closely related to set union. If 𝑥 is an object and 𝐴 and 𝐵 are sets,
then the disjunction of the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 is the proposition 𝑥 ∈ 𝐴 ∪ 𝐵:
(𝑥 ∈ 𝐴) ∨ (𝑥 ∈ 𝐵) = (𝑥 ∈ 𝐴 ∪ 𝐵).
𝐴 ∪ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 ∨ 𝑥 ∈ 𝐵}.
4.8 D E M O R G A N ’ S L AW S

De Morgan’s Laws for logic state that, for all propositions 𝑃 and 𝑄,
¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄,
¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄.
These laws can be proved using truth tables. Consider the table below, which proves
the first of De Morgan’s Laws, ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄. We start out with the usual two
columns giving all combinations of truth values of our variables 𝑃 and 𝑄. The overall
approach is to gradually work along to the right, adding new columns that give some part
of one of the expressions we are interested in, always using the columns we’ve already
constructed in order to construct new columns. We’ll first work towards constructing a
column giving truth values for the left-hand side of the equation, ¬(𝑃 ∨ 𝑄). As a step
towards this, we make a column for 𝑃 ∨ 𝑄: this becomes our third column. In fact, our
first three columns are just the truth table for the disjunction 𝑃 ∨𝑄, which we have seen
before. Then we negate each entry in the third column to give the entries of the fourth
column, which gives the truth table for ¬(𝑃 ∨ 𝑄). So we’ve done the left-hand side of
the equation. Then we start on the right-hand side of the equation, which is ¬𝑃 ∧ ¬𝑄.
For this, we’ll need ¬𝑃 and ¬𝑄, which are obtained by negating the columns for 𝑃 and
𝑄 respectively. This gives the fifth and sixth columns. Finally we form ¬𝑃 ∧ ¬𝑄 in the
seventh column by just taking the conjunction of the corresponding entries in the fifth
and sixth columns.
We now have columns giving the truth tables of both sides of the first of De Morgan’s
Laws: these are the fourth and seventh columns below, shown in green. These columns
are identical! This shows that the two expressions ¬(𝑃 ∨ 𝑄) and ¬𝑃 ∧ ¬𝑄 are logically
equivalent, i.e., their truth values are the same for all possible assignments of truth
values to their arguments 𝑃 and 𝑄. In other words, as Boolean expressions, they are
equal. So ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄. This proves the first of De Morgan’s Laws.
𝑃 𝑄 𝑃 ∨𝑄 ¬(𝑃 ∨ 𝑄) ¬𝑃 ¬𝑄 ¬𝑃 ∧ ¬𝑄
F F F T T T T
F T T F T F F
T F T F F T F
T T T F F F F
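Checks like this are mechanical enough that a computer can do them. The following Python sketch verifies both laws by looping over all combinations of truth values, mirroring the rows of the truth table:

from itertools import product

for P, Q in product((False, True), repeat=2):
    assert (not (P or Q)) == ((not P) and (not Q))   # first law
    assert (not (P and Q)) == ((not P) or (not Q))   # second law
print("Both of De Morgan's Laws hold for all truth values of P and Q.")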
We could prove the second of De Morgan’s Laws by the same method. But, now
that we know the first of De Morgan’s Laws, it is natural to ask: can we use it to prove
the second law more easily, so that we avoid doing the same amount of work all over
again? In other words, assuming that ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄 holds for all 𝑃 and 𝑄, can
you prove that ¬(𝑃 ∧𝑄) = ¬𝑃 ∨¬𝑄 holds for all 𝑃 and 𝑄 without starting from scratch
again and going through the same kind of detailed truth table argument? (Exercise 4.3)
De Morgan’s Laws can also be proved by reasoning in a way that covers all possible
combinations of truth values, rather than working laboriously through each combination
of truth values separately (which is what the truth table proof does). We give such a
proof now. For good measure, we do it for a more general version of the Law which
caters for arbitrarily long conjunctions and disjunctions. This would not be possible
just using the truth table approach, since we’d need a separate truth table for each 𝑛,
which means we’d need infinitely many truth tables.
Theorem 23. For all 𝑛:
¬(𝑃1 ∨ 𝑃2 ∨ ⋯ ∨ 𝑃𝑛 ) = ¬𝑃1 ∧ ¬𝑃2 ∧ ⋯ ∧ ¬𝑃𝑛 .
Proof.
Again, we ask: having proved the first of De Morgan’s Laws (in this more general
form), can we use it to prove the second law more easily? How would you prove the
second law?
There is a clear correspondence between De Morgan’s Laws for Sets, in Theorem 1
and Corollary 2, and De Morgan’s Laws for Logic.
4.9 i M P L i C AT i O N

The implication 𝑃 ⇒ 𝑄 (read “𝑃 implies 𝑄”, or “if 𝑃 then 𝑄”) is a binary logical operation. It is False just when 𝑃 is True and 𝑄 is False; in every other case it is True, as its truth table shows.
𝑃 𝑄 𝑃 ⇒𝑄
F F T
F T T
T F F
T T T
As we discussed in § 3.2, implication is closely related to the subset relation.
4.10 E Q U i VA L E N C E

The equivalence 𝑃 ⇔ 𝑄 is True precisely when 𝑃 and 𝑄 have the same truth value. It can be expressed in terms of implication and conjunction:
𝑃 ⇔ 𝑄 = (𝑃 ⇒ 𝑄) ∧ (𝑄 ⇒ 𝑃).
𝑃 ⇔ 𝑄 can be written the other way round, as 𝑄 ⇔ 𝑃. They have the same meaning.
Example:
𝑃: The triangle is right-angled.
𝑄: The side lengths satisfy 𝑎² + 𝑏² = 𝑐².
[Diagrams: three triangles with side lengths 𝑎, 𝑏, 𝑐, illustrating the cases 𝑎² + 𝑏² < 𝑐², 𝑎² + 𝑏² = 𝑐², and 𝑎² + 𝑏² > 𝑐².]
𝑃 𝑄 𝑃 ⇔𝑄
F F T
F T F
T F F
T T T
Equivalence is closely related to set equality. If 𝑥 is an object and 𝐴 and 𝐵 are sets,
then the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 are equivalent if 𝑥 belongs to both sets or neither
of them. If this holds for all 𝑥 then the two sets are identical. Conversely, if two sets
are identical then the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 are always equivalent.
4.11 EXCLUSiVE-OR
𝑃 𝑄 𝑃 ⊕𝑄
F F F
F T T
T F T
T T F
It is evident from their truth tables that exclusive-or is actually the logical negation
of equivalence:
𝑃 ⊕ 𝑄 = ¬(𝑃 ⇔ 𝑄).
The exclusive-or of two propositions is True if and only if exactly one of them is True.
What happens if we combine three propositions with exclusive-or, as in 𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 ?
The exclusive-or of the first two, 𝑃1 ⊕ 𝑃2 , is True precisely when exactly one of 𝑃1 and
𝑃2 is True, and in that case 𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 can only be True if 𝑃3 is False, so we still
have exactly one of the propositions 𝑃1 , 𝑃2 , 𝑃3 being True. But there is another way for
𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 to be True, namely if 𝑃1 ⊕ 𝑃2 is False and 𝑃3 is True. For 𝑃1 ⊕ 𝑃2 to be
False, neither of 𝑃1 and 𝑃2 is True, or they both are. So we might again have exactly
one of 𝑃1 , 𝑃2 , 𝑃3 being True, or we might in fact have all three of 𝑃1 , 𝑃2 , 𝑃3 being True.
We have covered all the possible things that can happen, if 𝑃1 ⊕𝑃2 ⊕𝑃3 is to be True,
and we have found that the number of 𝑃1 , 𝑃2 , 𝑃3 that are True must be 1 or 3. So it
needn’t be exactly one; fortunately, we have avoided a common mistake there. In fact,
what we can say is that the number of 𝑃1 , 𝑃2 , 𝑃3 that are True must be odd.
This generalises to arbitrary numbers of propositions. You should play with some
examples (say, with four propositions 𝑃1 , 𝑃2 , 𝑃3 , 𝑃4 ), satisfy yourself that this does indeed
happen in general, and try to understand why. Then you will get more out of reading
the formal proof which we now present.
Theorem 24. For all 𝑛 ∈ ℕ, and for any propositions 𝑃1 , 𝑃2 , … , 𝑃𝑛 ,
𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑛 is True if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑛 are True, and False if an even number of them are True. (24)
Inductive Basis:
When 𝑛 = 1, the expression on the left of (24) is just 𝑃1 , and this is True if just
one of 𝑃1 is True (an odd number!), and False otherwise. So (24) holds in this case.
Inductive Step:
Let 𝑘 ≥ 1. Assume (24) holds for 𝑛 = 𝑘; this is the Inductive Hypothesis.
𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑘+1
= ⒧𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑘 ⒭ ⊕ 𝑃𝑘+1
(identifying a smaller expression of the same type, within this expression)
= ⒧True, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True; False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True⒭ ⊕ 𝑃𝑘+1
(by the Inductive Hypothesis)
= True ⊕ 𝑃𝑘+1 , if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True;
  False ⊕ 𝑃𝑘+1 , if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True.
= True ⊕ False, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is False;
  True ⊕ True, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is True;
  False ⊕ False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is False;
  False ⊕ True, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is True.
(breaking each option down according to the value of 𝑃𝑘+1 )
= True, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is False;
  False, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is True;
  False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is False;
  True, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 are True and 𝑃𝑘+1 is True.
(evaluating the exclusive-ors)
= True, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 , 𝑃𝑘+1 are True;
  False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 , 𝑃𝑘+1 are True.
(combining the True cases and combining the False cases, and
noting that their conditions also combine to capture precisely
the evenness or oddness of the number of the 𝑃𝑖 that are True).
Conclusion:
Therefore, by Mathematical Induction, (24) holds for all 𝑛 ∈ ℕ.
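For small 𝑛 the theorem can also be checked exhaustively by machine. This Python sketch folds ⊕ across the list (for Booleans, != is exclusive-or) and compares with the parity of the number of True values:

from itertools import product
from functools import reduce

for n in range(1, 6):
    for values in product((False, True), repeat=n):
        xor_all = reduce(lambda a, b: a != b, values)   # P1 xor ... xor Pn
        assert xor_all == (sum(values) % 2 == 1)        # True iff oddly many are True
print("Theorem 24 verified for n = 1, ..., 5.")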
4.12 TA U T O L O G i E S A N D L O G i C A L E Q U i VA L E N C E
A tautology is a statement that is always true. In other words, the right-hand column
of its truth table has every entry True.
Two statements 𝑃 and 𝑄 are logically equivalent if their truth tables are identical.
In other words, 𝑃 and 𝑄 are equivalent if and only if 𝑃 ⇔ 𝑄 is a tautology.
Examples:
¬¬𝑃 is logically equivalent to 𝑃
¬(𝑃 ∨ 𝑄) is logically equivalent to ¬𝑃 ∧ ¬𝑄
¬(𝑃 ∧ 𝑄) is logically equivalent to ¬𝑃 ∨ ¬𝑄
𝑃 ⇒ 𝑄 is logically equivalent to ¬𝑃 ∨ 𝑄
𝑃 ⇔ 𝑄 is logically equivalent to (𝑃 ⇒ 𝑄) ∧ (𝑃 ⇐ 𝑄), and to (¬𝑃 ∨ 𝑄) ∧ (𝑃 ∨ ¬𝑄)
𝑃 ⊕ 𝑄 is logically equivalent to ¬(𝑃 ⇔ 𝑄), and to 𝑃 ⇔ ¬𝑄, and to (𝑃 ⇒ ¬𝑄) ∧ (𝑃 ⇐ ¬𝑄)
4.13𝜔 H i S T O RY
4.14 D i S T R i B U T i V E L AW S

In logic, conjunction distributes over disjunction, and disjunction distributes over conjunction:
𝑃 ∧ (𝑄 ∨ 𝑅) = (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) (4.1)
𝑃 ∨ (𝑄 ∧ 𝑅) = (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅) (4.2)
Contrast ordinary numbers under + and ×, where only one distributive law holds:
𝑝 × (𝑞 + 𝑟) = (𝑝 × 𝑞) + (𝑝 × 𝑟)
but, in general,
𝑝 + (𝑞 × 𝑟) ≠ (𝑝 + 𝑞) × (𝑝 + 𝑟).
Just as for De Morgan’s Laws, we see a correspondence between the algebra of sets
and the algebra of logic. The Distributive Laws for sets, Theorem 3, correspond to the
Distributive Laws for Logic.
4.15 L AW S O F B O O L E A N A L G E B R A
Here is a full listing of the laws of Boolean algebra, which we may use to convert propo-
sitional expressions from one form to another. Reasons why we might do this include
finding simpler forms for Boolean expressions and determining algebraically whether or
not two given Boolean expressions are logically equivalent.
Because of the logical duality between conjunction and disjunction, these laws may
be arranged in dual pairs. In the table below, each law involving conjunction or disjunc-
tion is written on the same line as its dual.
¬¬𝑃 = 𝑃
¬True = False ¬False = True
𝑃 ∧𝑄 = 𝑄∧𝑃 𝑃 ∨𝑄 = 𝑄∨𝑃
(𝑃 ∧ 𝑄) ∧ 𝑅 = 𝑃 ∧ (𝑄 ∧ 𝑅) (𝑃 ∨ 𝑄) ∨ 𝑅 = 𝑃 ∨ (𝑄 ∨ 𝑅)
𝑃 ∧𝑃 = 𝑃 𝑃 ∨𝑃 = 𝑃
𝑃 ∧ ¬𝑃 = False 𝑃 ∨ ¬𝑃 = True
𝑃 ∧ True = 𝑃 𝑃 ∨ False = 𝑃
𝑃 ∧ False = False 𝑃 ∨ True = True
Distributive Laws
𝑃 ∧ (𝑄 ∨ 𝑅) = (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) 𝑃 ∨ (𝑄 ∧ 𝑅) = (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅)
De Morgan’s Laws
¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄 ¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄
4.16 D i S j U N C T i V E N O R M A L F O R M

We will introduce two standard ways of writing logical expressions. The first of these is
Disjunctive Normal Form (DNF), treated in this section. The second is Conjunctive
Normal Form (CNF), treated in the next section.
We treat DNF first, but CNF will be much more important for us. In brief, that’s
because CNF is more natural for encoding logical problems and real-world conditions.
You can always convert one to the other, but at considerable cost in both time and space
as we will see.
A literal is an appearance of a logical variable in which it is either unnegated or
negated just once. So, if 𝑋 is a logical variable, then its corresponding literals are 𝑋
and ¬𝑋 . Separate appearances of a logical variable within a larger logical expression are
counted as separate literals. For example, the expression (¬𝑋 ∧¬𝑌)∨(¬𝑋 ∧𝑌)∨(𝑋 ∧𝑌)
has six literals (even though some are equivalent to each other). We do not consider
¬¬𝑋 , as it stands, to be a literal, but it is equivalent to 𝑋 , which is a literal. Similarly, ¬¬¬𝑋
is not a literal but is equivalent to the literal ¬𝑋 .
A Boolean expression is in Disjunctive Normal Form (DNF) if it is a disjunction
of conjunctions of literals.
Examples: (𝑋 ∧ 𝑌 ) ∨ (¬𝑋 ∧ 𝑍 ) and (¬𝑋 ∧ ¬𝑌 ) ∨ (¬𝑋 ∧ 𝑌 ) ∨ (𝑋 ∧ 𝑌 ) are in DNF.
You can convert any proposition into an equivalent one in DNF using its truth table.
Consider the proposition 𝑃 given by the following truth table.
𝑋 𝑌 𝑃
F F T
F T T
T F F
T T T
As a step towards designing a logical expression for the entire proposition 𝑃, we will
design one logical expression for each row where 𝑃 is True.
Consider the first row. This is for when 𝑋 = False and 𝑌 = False. We want the
result to be True. We can do this using ¬𝑋 ∧ ¬𝑌. Satisfy yourself that this is True
when 𝑋 = False and 𝑌 = False, and also that it is False for every other combination of
truth values for 𝑋 and 𝑌. So it is True only in this first row, as its own truth table shows:
𝑋 𝑌 ¬𝑋 ∧ ¬𝑌
F F T
F T F
T F F
T T F
Now consider the second row of the truth table for 𝑃, which is for when 𝑋 = False
and 𝑌 = True. This time we will use ¬𝑋 ∧ 𝑌, which is True for this row but False for all
the other rows. We can add an extra column to include its truth table too.
𝑋 𝑌 ¬𝑋 ∧ ¬𝑌 ¬𝑋 ∧ 𝑌
F F T F
F T F T
T F F F
T T F F
The third row of the truth table for 𝑃 has 𝑃 = False, so we will ignore that.
The fourth row of the truth table for 𝑃 has 𝑃 = True again. We will now use 𝑋 ∧ 𝑌 ,
which is True for this row but for no other. We add a further column for its truth table.
𝑋 𝑌 ¬𝑋 ∧ ¬𝑌 ¬𝑋 ∧ 𝑌 𝑋 ∧𝑌
F F T F F
F T F T F
T F F F F
T T F F T
Now look at what happens when we take the disjunction of the last three columns.
This shows that the DNF expression (¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌) is equivalent
to 𝑃.
𝑃 = (¬𝑋 ∧ ¬𝑌 ) ∨ (¬𝑋 ∧ 𝑌 ) ∨ (𝑋 ∧ 𝑌 )
Each of the three parts is a conjunction of literals, and the whole expression is their disjunction.
DNF expressions like this can be read easily from truth tables. You don’t have to
add extra columns as we did above. In each row, look at the pattern of truth values
for the variables. Then write down a conjunction of literals where variables that are
True are written normally and variables that are False are written in negated form. For
example, in the second row of our truth table, the variable 𝑋 is False so we negate it,
whereas 𝑌 is True so we just use it unchanged. Taking the conjunction of these two
literals gives ¬𝑋 ∧ 𝑌, which is the part we want for that row. Do this for every row in
which the proposition 𝑃 is True. This is shown for our current example in the following
table.
𝑋 𝑌 𝑃
F F T ¬𝑋 ∧ ¬𝑌
F T T ¬𝑋 ∧ 𝑌
T F F
T T T 𝑋∧ 𝑌
When this is done, just take the disjunction of all the parts in the final column and
you have your DNF expression for 𝑃.
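This row-by-row reading translates directly into code. The sketch below builds a DNF expression (as a string) from a truth table; the representation of the table and the function name are illustrative choices, not conventions from these notes.

def dnf_from_truth_table(variables, rows):
    # rows: list of (assignment, value) pairs, where assignment maps each
    # variable name to True/False and value is the proposition's output.
    parts = []
    for assignment, value in rows:
        if value:  # one conjunction of literals per row where P is True
            literals = [v if assignment[v] else "¬" + v for v in variables]
            parts.append("(" + " ∧ ".join(literals) + ")")
    return " ∨ ".join(parts)

# The two-variable example from above: P is False only when X is True and Y is False.
rows = [({"X": x, "Y": y}, (not x) or y)
        for x in (False, True) for y in (False, True)]
print(dnf_from_truth_table(["X", "Y"], rows))
# (¬X ∧ ¬Y) ∨ (¬X ∧ Y) ∨ (X ∧ Y)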
Exercise: simplify the above expression 𝑃 as much as possible, using Boolean algebra.
Here is another example, for a proposition 𝑃 of three variables 𝑋 , 𝑌 and 𝑍.
𝑋 𝑌 𝑍 𝑃
F F F T ¬𝑋 ∧ ¬𝑌 ∧ ¬𝑍
F F T F
F T F T ¬𝑋 ∧ 𝑌 ∧ ¬𝑍
F T T F
T F F F
T F T T 𝑋 ∧ ¬𝑌 ∧ 𝑍
T T F T 𝑋 ∧ 𝑌 ∧ ¬𝑍
T T T F
Taking the disjunction of all the conjunctions in the last column gives a DNF ex-
pression for 𝑃.
The importance of this method is that it shows that every logical expression is
equivalent to one in DNF. So DNF can be viewed as a “standard form” into which any
logical expression can be transformed.
BUT this transformation comes at a price. The DNF expression has as many parts
as there are truth table rows where the expression is True. An expression with 𝑘 variables
has 2ᵏ rows in its truth table, which is exponential in the number of variables. If
the expression is True for a large proportion of its truth table rows, then the number of
parts in the DNF expression may also be exponentially large in the number of variables.
So it may be too large and unwieldy to be useful, unless the number of variables is very
small. (If a Boolean expression is actually provided in the form of its entire truth table,
then constructing its equivalent DNF expression by the above method is ok, since the
expression you get won’t be any larger than the truth table. But if the Boolean expres-
sion is provided as a compact formula, then the size of its equivalent DNF expression
may be exponential in the size of the formula you started with.)
One apparent attraction of DNF is that it is easy to tell if a DNF expression is
satisfiable, that is, if there is some assignment of truth values to its variables that makes
the whole expression True. In fact, if you take any of the parts of a DNF expression, the
pattern of the literals (i.e., whether each appears plainly or negated) tells you a truth
assignment that makes that part True, and then the whole disjunction must also be True
because a disjunction is True precisely when at least one of its parts is True. In effect,
the parts of a DNF expression yield a kind of encoded listing of all the truth assignments
that make the whole expression True.
On the other hand, it does not seem so easy to tell if a DNF expression is a tautology.
For a DNF expression to not be a tautology, there would have to be some truth assign-
ment to its variables that makes it False. There is no known way to test for this
efficiently.
The big problem with DNF, though, is that, in real life, logical rules are not usually
specified in a form that is amenable to DNF. They are typically described by listing
conditions that must be satisfied together. In other words, they are described in a way
that lends itself to expression as a conjunction rather than a disjunction.
4.17 C O N j U N C T i V E N O R M A L F O R M

A Boolean expression is in Conjunctive Normal Form (CNF) if it is a conjunction of disjunctions of literals.
There is a close relationship — a kind of logical duality — between CNF and DNF.
Suppose you have an expression 𝑃 in CNF, and suppose you negate it, giving ¬𝑃. So
¬𝑃 is a negation of a conjunction of disjunctions of literals. But, by De Morgan’s Law,
a negation of a conjunction is a disjunction of negations. So ¬𝑃 will then be expressed
as a disjunction of negations of disjunctions of literals. But, again by De Morgan’s Law,
each negation of a disjunction is equivalent to a conjunction of negations. So ¬𝑃 is
now a disjunction of conjunctions of negations of literals. But the negation of a literal is
always equivalent to another literal (since ¬¬𝑋 = 𝑋 ). So we see that ¬𝑃 is a disjunction
of conjunctions of literals. In other words, it’s in DNF.
We now have a way to convert any logical expression 𝑃 in truth table form to a CNF
expression. To do so, we just
1. negate all the truth values in the output column (turning it into a truth table for
¬𝑃),
2. use the method of the previous subsection to construct a DNF expression for ¬𝑃,
3. then negate the expression (so that it will now be an expression for 𝑃 again),
4. and use De Morgan’s Laws to transform the negated DNF expression into CNF.
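The net effect of these four steps is that each row of the truth table where 𝑃 is False contributes one clause ruling that row out. Here is a Python sketch (same illustrative conventions as the DNF code in § 4.16):

def cnf_from_truth_table(variables, rows):
    clauses = []
    for assignment, value in rows:
        if not value:
            # One disjunction per row where P is False, negating
            # exactly that combination of truth values.
            literals = ["¬" + v if assignment[v] else v for v in variables]
            clauses.append("(" + " ∨ ".join(literals) + ")")
    return " ∧ ".join(clauses)

rows = [({"X": x, "Y": y}, (not x) or y)
        for x in (False, True) for y in (False, True)]
print(cnf_from_truth_table(["X", "Y"], rows))   # (¬X ∨ Y)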
This establishes the important theoretical point that every expression is equivalent
to a CNF expression. But this is usually not a good way to construct CNF expressions
in practice, because:
• Truth tables are too large. As discussed earlier, their size is exponential in the
number of variables. Complex logical conditions in real computational problems
usually contain enough variables for truth tables to be unusable. Furthermore,
even for more modest-sized problems, the truth table approach for CNF uses a lot
of time and space and, for manual work, is quite error-prone.
• Logical conditions are usually expressed in a way that makes CNF a natural way
to represent them.
4.18 R E P R E S E N T i N G L O G i C A L S TAT E M E N T S
Suppose we are given a set of rules or conditions that we want to model as a logical
expression. These are often expressed as conditions that must be satisfied together. For
example, the rules of a game must all be followed, to play the game correctly; you can’t
just ignore the rules you don’t like on the grounds that you’re following some of the
rules! A software specification typically stipulates a set of conditions that must all be
met, and similarly for legal contracts, acts of parliament, traffic regulations, itineraries,
and so on. While some rules may offer choices as to how they can be satisfied, at the
highest level a set of rules is usually best modelled as a conjunction.
So, if you are given some specifications and you want to model them by a logical
expression, one first step you can take (working “top-down”) is to identify how the rule
is structured at the top level as a conjunction. What are the parts of this conjunction?
You can then keep working top-down and try to decompose those parts as conjunctions
too. A conjunction of conjunctions is just one larger conjunction.
Working from the other direction (“bottom-up”), you also have to think about what
your most elementary logical “atoms” are. In other words, think about the simplest,
most basic assertion that could hypothetically be made about this situation, without
worrying about whether it might be True or False. In fact, in this kind of situation,
you won’t initially know what values your logical variables might have; you’re merely
encoding your problem in logical form, without solving it yet, so you avoid thinking
about actual truth values for any of your variables. You are just trying to identify the
kinds of “atomic assertions” that are needed to describe any hypothetical situation in
your scenario.
Example:
You are planning a dinner party. Your guest list must have:
• at least one of: Harry, Ron, Hermione, Ginny
• Hagrid only if it also has Norberta
• none, or both, of Fred and George
• no more than one of: Voldemort, Bellatrix, Dolores.
Note that we’re not trying to convert everything to logic at once. The four parts of
our conjunction are not yet expressed in logical form; they’re still just written in English
text. That’s ok, in this intermediate stage.
Early in the process, we should think about what Boolean variables to use, and what
they should represent. In this case, that is fairly straightforward. The simplest logical
statement we can make in this situation is that a specific person is on your guest list.
So, for each person, we’ll introduce a Boolean variable with the intended interpretation
that the person is on your guest list. So, the variable Harry is intended to mean that
the person Harry is on your guest list, and so on. This gives us eleven variables, one
for each of our eleven guests. As far as we know at the moment, each of the variables
might be True or False; it is the role of the logical expression we are constructing to
ensure that the combinations of truth values for these eleven variables must correspond
to valid guest lists. We will do that by properly representing the rules in logic. We
will not try, at this stage, to enumerate all possible guest lists, or even to find one valid
guest list. Our current task is to encode the problem’s rules in logic, not to solve the
problem. (That can be tackled later, and is a different skill.)
Now, let’s look at each of the four parts of our conjunction, in turn, and see how
they may be logically expressed using our variables.
The first part is a single disjunction: Harry ∨ Ron ∨ Hermione ∨ Ginny.
The second part says that Hagrid’s presence requires Norberta’s: Hagrid ⇒ Norberta.
The third part says that Fred and George are invited together or not at all: Fred ⇔ George.
The fourth part says that the pair Voldemort & Bellatrix is forbidden, the pair
Voldemort & Dolores is forbidden, and the pair Bellatrix & Dolores is forbidden.
See the logical structure emerging: “…and …and …”. So we have, at the top level, a
conjunction of these four parts.
We are now getting to the point where we can use logical manipulations (Boolean
algebra) to transform each of the four parts into a disjunction of literals. The first part
is already a disjunction. The second part can be written as the disjunction ¬Hagrid ∨
Norberta, as we have already seen. The third part can be written
(¬Fred ∨ George) ∧ (Fred ∨ ¬George).
For the fourth part, requiring that Voldemort and Bellatrix are not both True is the
same as requiring at least one of them to be False, which is the same as requiring at
least one of their negations to be True, which is captured by ¬Voldemort ∨ ¬Bellatrix.
Treating the other two pairs the same way gives the conjunction of disjunctions
(¬Voldemort ∨ ¬Bellatrix) ∧ (¬Voldemort ∨ ¬Dolores) ∧ (¬Bellatrix ∨ ¬Dolores).
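Although our task here is only to encode the rules, the encoding can be handed straight to a computer. This brute-force Python sketch (illustrative, and feasible only because 2¹¹ = 2048 combinations is tiny) counts the guest lists satisfying all four parts:

from itertools import product

people = ["Harry", "Ron", "Hermione", "Ginny", "Hagrid", "Norberta",
          "Fred", "George", "Voldemort", "Bellatrix", "Dolores"]

count = 0
for bits in product((False, True), repeat=len(people)):
    g = dict(zip(people, bits))
    ok = ((g["Harry"] or g["Ron"] or g["Hermione"] or g["Ginny"])
          and ((not g["Hagrid"]) or g["Norberta"])      # Hagrid implies Norberta
          and (g["Fred"] == g["George"])                # Fred iff George
          and not (g["Voldemort"] and g["Bellatrix"])   # at most one of the
          and not (g["Voldemort"] and g["Dolores"])     # three forbidden pairs
          and not (g["Bellatrix"] and g["Dolores"]))
    count += ok
print(count, "valid guest lists")   # 360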
4.19 S TAT E M E N T S A B O U T H O W M A N Y VA R i A B L E S A R E T R U E
Given a collection of Boolean variables, we are often interested in how many of them
are True. We might want to state that at least two of them are True, or that at most
two of them are True, or exactly two of them are True. We might want to make similar
statements with “two” replaced by some other number. Conditions of this kind are
examples of cardinality constraints, since they are only about the number of variables
with a given value.
We saw examples of this in the previous section. We wanted at least one of Harry,
Ron, Hermione and Ginny to be True. We also wanted at most one of Voldemort,
Bellatrix and Dolores to be True. So, to build your intuition about dealing with these
situations, it would be worth pausing now and spending a few minutes reading that
analysis again, and thinking through how the reasoning given there could be extended
to situations where the number of True variables involved is some other number (i.e.,
not one).
If ever we want to specify that exactly 𝑘 variables are True, we can express this as a
conjunction:
(at least 𝑘 are True) ∧ (at most 𝑘 are True).
So we now focus our discussion on just the “at least” and “at most” cases.
Suppose we want to state that at most 𝑘 of the 𝑛 variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 are True.
This means that, for every set of 𝑘 +1 variables, at least one of them is False, or in other
words, at least one of their negations is True. So, for every set of 𝑘 + 1 variables, we
form a disjunction of their negations (to say that at least one of these negations is True),
and then we combine all these disjunctions into a larger conjunction. The number of
disjunctions we use is just the number of subsets of 𝑘 + 1 variables chosen from our 𝑛
variables, which is the binomial coefficient (𝑛 choose 𝑘 + 1).
For example, suppose we want to say that at most two of the four variables 𝑤, 𝑥, 𝑦, 𝑧
are True (i.e., 𝑘 = 2 and 𝑛 = 4). This means that, for every three of the variables, at least
one of them is False. So, at least one of 𝑤, 𝑥, 𝑦 is False, and at least one of 𝑤, 𝑥, 𝑧 is
False, and so on. But saying that at least one of a set of variables is False is the same as
saying that at least one of their negations is True. For example, at least one of 𝑤, 𝑥, 𝑦 is
False if and only if at least one of ¬𝑤, ¬𝑥, ¬𝑦 is True. This is now a job for disjunction:
at least one of ¬𝑤, ¬𝑥, ¬𝑦 is True if and only if ¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦 is True. So, we create all
disjunctions of triples (𝑘 + 1 = 3) of negated literals, which gives the disjunctions
¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦, ¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧, ¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧, ¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧,
and then their conjunction
(¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦) ∧ (¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧) ∧ (¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧) ∧ (¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧).
Now suppose we want to state that at least 𝑘 of the 𝑛 variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 are True.
This means that at most 𝑛 − 𝑘 of them are False. This means that, for every set of
𝑛 − 𝑘 + 1 variables, at least one of them is True. So, for every set of 𝑛 − 𝑘 + 1 variables,
we form a disjunction of them (to say that at least one of them is True), and then we
combine all these disjunctions into a larger conjunction.
For example, suppose we want to say that at least two of the four variables 𝑤, 𝑥, 𝑦, 𝑧 are
True (i.e., 𝑘 = 2 and 𝑛 = 4). We create all disjunctions of triples (𝑛−𝑘+1 = 4−2+1 = 3)
of literals (unnegated, this time), which gives the disjunctions
𝑤 ∨ 𝑥 ∨ 𝑦, 𝑤 ∨ 𝑥 ∨ 𝑧, 𝑤 ∨ 𝑦 ∨ 𝑧, 𝑥∨𝑦 ∨𝑧.
(𝑤 ∨ 𝑥 ∨ 𝑦) ∧ (𝑤 ∨ 𝑥 ∨ 𝑧) ∧ (𝑤 ∨ 𝑦 ∨ 𝑧) ∧ (𝑥 ∨ 𝑦 ∨ 𝑧).
Finally, if we want to say that exactly two of the four variables are True, then we
take the conjunction of the expressions for “at least” and “at most”, giving
(¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦) ∧ (¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧) ∧ (¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧) ∧ (¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧)
∧ (𝑤 ∨ 𝑥 ∨ 𝑦) ∧ (𝑤 ∨ 𝑥 ∨ 𝑧) ∧ (𝑤 ∨ 𝑦 ∨ 𝑧) ∧ (𝑥 ∨ 𝑦 ∨ 𝑧).
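The recipe is mechanical, so it can be automated. This Python sketch (names illustrative) generates the “at most 𝑘” and “at least 𝑘” clauses for any list of variable names, using itertools.combinations to enumerate the relevant subsets:

from itertools import combinations

def at_most(k, variables):
    # One clause of negated literals per subset of k+1 variables.
    return ["(" + " ∨ ".join("¬" + v for v in combo) + ")"
            for combo in combinations(variables, k + 1)]

def at_least(k, variables):
    # One clause of plain literals per subset of n-k+1 variables.
    n = len(variables)
    return ["(" + " ∨ ".join(combo) + ")"
            for combo in combinations(variables, n - k + 1)]

vs = ["w", "x", "y", "z"]
print(" ∧ ".join(at_most(2, vs) + at_least(2, vs)))   # "exactly two", as 8 clauses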
4.20 U N i V E R S A L S E T S O F O P E R AT i O N S
We saw in § 4.16 that every logical expression is equivalent to a DNF expression. Later, in
§ 4.17, we observed that every logical expression is also equivalent to a CNF expression.
One consequence of this is that every logical expression is equivalent to an expression
that only uses the operations from the operation set {∧, ∨, ¬}.
We say that a set of operations 𝑋 is universal if every logical expression 𝑃 is equiv-
alent to an expression 𝑄 that only uses operations in that set (together with variables
from the original expression). So the set of operations that actually appear in 𝑄 must
be a subset of the operation set 𝑋 .
Our observations above, about DNF and CNF, demonstrate that the operation set
{∧, ∨, ¬}
is universal.
One reason for considering universal sets of operations relates to the construction of
electronic circuits that compute the values of logical expressions. It is easier to make
these circuits if we can put them together from simple components of only a few different
types. Simple components are easier to construct, and the complexity of manufacturing
is reduced if we don’t have to make too many different types of components. We also
gain economies of scale from making large numbers of these simple components instead
of smaller numbers of more complex components.
The operation set {∧, ∨, ¬} is not the smallest possible universal set of operations.
De Morgan’s Laws (§ 4.8) show how to express ∧ in terms of ∨ and ¬, and dually how
to express ∨ in terms of ∧ and ¬. So either ∧ or ∨ could be dropped from our universal
set, as long as we retain the other as well as ¬. So the operation sets
{∧, ¬} and {∨, ¬}
are both universal. Is the operation set
{∧, ∨}
universal?
Having found universal sets of just two operations, we can ask if this is the smallest
possible. Does there exist a universal set of just one operation?
Clearly the set {¬} is not universal, since ¬ is a unary operation that cannot be
used to combine logical variables together. So, if we seek a single universal operation,
we should start by looking at binary operations.
4.21 EXERCiSES
Use 𝐵, 𝑀 and 𝑆 to write a Boolean expression which is true if and only if you visit
Brazil or both the other countries (but not all three).
3. Given the first of De Morgan’s Laws,
¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄,
prove the second,
¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄,
as simply as possible.
5. Do you need parentheses in the expression 𝑎∧𝑏∨𝑐? Investigate the two expressions
𝑎 ∧ (𝑏 ∨ 𝑐) and (𝑎 ∧ 𝑏) ∨ 𝑐.
Are they logically equivalent? If not, what can you say about the relationship between
them? Does either imply the other?
6.
Prove that
A disjunction of the form ¬𝑃1 ∨ ⋯ ∨ ¬𝑃𝑛 ∨ 𝐶 is called a Horn clause. These play a big
role in the theory of logic programming.
8. This question is about using Boolean algebra to describe the algebra of switching.
An electrical switch is represented diagrammatically as follows.
The state of a switch can be represented by a Boolean variable, with the states Off
and On being represented by False and True respectively. Let 𝑥 be a Boolean variable
representing the state of a switch. Then 𝑥 = True represents the switch being On, so
that electrical current can pass through it; 𝑥 = False represents the switch being Off, so
that there is no electrical current through the switch.
We can put switches together to make more complicated circuits. Those circuits can
then be described by Boolean expressions in the variables that represent the switches.
In the following circuits, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧 are Boolean variables representing the indicated
switches.
(a) For each of the two switching circuits below, write a Boolean expression in 𝑥 and 𝑦
for the proposition that current flows between the two ends of the circuit. (The ends are
shown as 𝐴 and 𝐵 in the diagram. In effect, the diagrams only show part of a complete
circuit. The rest of the circuit would contain a power supply, such as a battery, and an
electrical device that operates when the current flows, such as a light or an appliance.)
[Diagrams: two switching circuits between ends 𝐴 and 𝐵 built from switches 𝑥 and 𝑦, one with the switches in series and one with them in parallel.]
(b) For each of the next two circuits, again construct a Boolean expression to represent
the proposition that current flows between the two ends of the circuit.
Compare the two expressions you construct. What can you say about the relationship
between them?
[Diagrams: two switching circuits built from switches 𝑣, 𝑤, 𝑦, 𝑧.]
(c) Similarly, construct Boolean expressions for the next two circuits.
Compare the two expressions you construct. What can you say about the relationship
between them?
[Diagrams: two switching circuits built from switches 𝑣, 𝑤, 𝑥, 𝑦, 𝑧.]
(d) For each of our circuit pairs in (a)–(c), discuss how they are related to each other,
and what this means for how the Boolean expressions derived from them are related to
each other.
9. Does logically negating an implication reverse it? In other words, are the
expressions
¬(𝑃 ⇒ 𝑄) and 𝑃 ⇐𝑄
logically equivalent? If so, prove it; if not, determine (with proof) if either implies the
other.
12.
13. We saw two distributive laws for ∨ and ∧ in § 4.14, and contrasted that with the
situation for ordinary numbers under + and ×, when only one distributive law holds.
Investigate the situation for ⊕ and ∧. Do you have two distributive laws, or one
(and which one?), or none? Give explanations for your answers.
14. Consider the following truth table for a proposition 𝑃 in terms of 𝑋 , 𝑌 and 𝑍.
𝑋 𝑌 𝑍 𝑃
F F F F
F F T F
F T F F
F T T T
T F F F
T F T T
T T F T
T T T T
(c) Using (b) as your starting point, express 𝑃 in Conjunctive Normal Form (CNF).
(d) An alternative way to try to express 𝑃 in CNF might be to start from the DNF
for 𝑃 you found in (a) and then expand the whole thing using the Distributive Laws. If
you start with your expression from (a) and do all this expansion without yet doing any
further simplification, how many parts are combined together in the large conjunction
you construct? How many literals does it have? How does this approach compare with
(c), with respect to ease and efficiency?
15.
A meeting about moon mission software is held at NASA in 1969. Participants may
include Judith Cohen (electrical engineer), Margaret Hamilton (computer scientist), and
Katherine Johnson (mathematician). Let Judith, Margaret and Katherine be propositions
with the following meanings.
16.
Recall the logical expression given on p. 141 in § 4.18 for your dinner party guest list:
How long would an equivalent DNF expression be? Specifically, how many disjuncts —
smaller expressions combined using ∨ to make the whole expression — would it have?
Figure 4.1: An AND gate. 𝐴 and 𝐵 are the inputs and 𝐶 is the output (which is therefore true
if and only if both 𝐴 and 𝐵 are true).
So, for example, the following expressions are in algebraic normal form:
In Exercise 12(a), you expressed the disjunction of two variables in algebraic normal
form.
Prove that every Boolean function can be written in algebraic normal form. (Use
Exercise 12(a) and what you know about DNF.)
18. Suppose you have an unlimited supply of AND gates, which are logic gates for the
binary operation ∧. Each gate has two inputs, for the two arguments, and one output,
which gives the conjunction of the inputs. The output of one gate can be used for the
input of another gate.
To represent an AND gate in a circuit diagram we will use the symbol in Figure 4.1.
Show how to put AND gates together to compute the conjunction of eight logical
arguments 𝑥1 , 𝑥2 , … , 𝑥8 .
How many AND gates do you need?
Suppose each AND gate computes its output as soon as both its inputs are available,
and that it takes time 𝑡 to compute the output. Assume that all the initial inputs
𝑥1 , 𝑥2 , … , 𝑥8 are available simultaneously, at time 0. How long does your combination of
AND gates take to compute its output?
Try and put your AND gates together to minimise the total time taken to compute
the final output.
Call a Boolean expression in the variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 affine if it has one of the forms
𝑥1 ⊕ 𝑥 2 ⊕ ⋯ ⊕ 𝑥 𝑛
or
𝑥1 ⊕ 𝑥2 ⊕ ⋯ ⊕ 𝑥𝑛 ⊕ True.
Prove, by induction on 𝑛, that for all 𝑛 ∈ ℕ, the number of satisfying truth assignments
of an affine Boolean expression with 𝑛 variables is 2ⁿ⁻¹.
It follows that half of all truth assignments are satisfying for the expression, and half
are not.
Here, a truth assignment is just an assignment of a truth value to each variable,
and it is satisfying if the assignment makes the whole expression True.
23. Prove or disprove the claim that there exists a universal operation set containing
just a single operation.
You need not restrict consideration to the three operations ∧, ∨, ¬. You can use
another operation of at most two arguments (in other words, a binary operation) if you
wish.
What type of proof did you use? (Recall the proof types discussed in Chapter 3.)
24. Computer circuits actually perform the very same Boolean logic that we have
studied using logic gates (§ 4.4𝛼 ). The idea is that you have one or more wires coming
into a gate, and usually one output wire. If an input wire has current flowing through,
the variable it represents is set to True, and if not, then False. The logic gate then takes
those signals and gives off an output signal if the result should be True, and no signal if
the result is False. Below are the most common logic gates seen in circuits.
[Diagrams: the standard circuit symbols for the AND, OR, NOT, XOR, NAND, NOR and XNOR gates, each with input(s) 𝑎 (and 𝑏) and an Output.]
The AND, OR, and NOT gates function the same as what we have seen previously.
If both signals of an AND gate are on, it will output a signal. For OR, only one input
signal is required to be on, and for NOT, there will only be an output signal if the input
signal is off (False).
XOR is “EXCLUSIVE OR”, which outputs a signal only if a or b are on, but not both.
NAND, NOR, and XNOR are simply the inversions of AND, OR, and XOR respectively
(i.e. NAND is “not both”, NOR is “neither”, and XNOR is “both or neither”). These
gates can also be connected in series, like in the example below, which is equivalent to
the logical expression (¬𝑎 ∨ ¬𝑏).
[Diagram: a small circuit of gates computing (¬𝑎 ∨ ¬𝑏).]
(a) What single logic gate is the above circuit equivalent to?
(b) We saw in § 4.20 that {∨, ∧, ¬} is a universal set of operations, meaning that
any logical expression can be expressed with only these operations. How would
you express the function of an XOR gate as a Boolean expression using only these
operations?
(c) Draw a circuit diagram which performs the same functionality as an XOR gate using
only AND, OR, and NOT gates.
(d) Do we even need three types of gates? Given that {∧, ∨, ¬} is a universal set of
operations, use De Morgan’s Law to prove that {∧, ¬} and {∨, ¬} are also universal
sets of operations. (Hint: Can you express ∨ with the other two operators?)
(e) Draw a circuit diagram that performs an OR operation using only AND and NOT
gates, and also one which performs AND using only OR and NOT gates.
(f) Can you perform the function of a NOT gate, or an AND gate, with only NAND
gates? What does this tell you about NAND gates?
5
P R E D I C AT E L O G I C
5.1𝛼 R E L AT i O N S , P R E D i C AT E S , A N D T R U T H - VA L U E D F U N C T i O N S
We have already met predicates. We mentioned in § 2.13 and § 2.17 that the term
“predicate” is just a synonym for “relation”. Restating this as a definition, we have:
A predicate is a relation.
So, when discussing predicates, we can draw on all the terminology and theory that we
developed for relations. For example, we can talk about the arguments of a predicate,
and the domain of each of its arguments. We can call it a 𝑘-ary predicate if it has 𝑘
arguments.
Although a predicate is nothing more or less than a relation, the term tends to
indicate an intention to do logic with it. So, we are interested in when things are true
or false, and we might want to combine things using logical operations.
Consider the relation Parent from § 2.13. This is a subset of ℙ×ℙ, where ℙ is the set
of all people who have ever lived. Some ordered pairs of people belong to this relation,
others do not.
For the pairs that belong to it, we may state that fact using prefix notation:
Parent(Plato, Ariston).
This is a true statement. In logic, we need to be able to work with both true and false
statements. So we want to be able to make assertions of both types.
In fact, if we give each argument of Parent a specific value from its domain, then we get
a specific statement about those values which is either true or false. In other words, we
get a proposition.
It is a short step from here to treat Parent as returning either True or False, so we
can write
Parent(Plato, Ariston) = True, Parent(Ariston, Plato) = False.
In summary, we started with a relation, Parent, treating it as a subset of ℙ×ℙ, and then
treated it as a function that takes any pair of people and returns either True or False.
So, we can view it as a truth-valued function, i.e., a function that returns a truth value.1
In this case, the domain of the function is ℙ × ℙ and its codomain is {True, False}.
These are just different ways of viewing the same thing. When we use the word
“predicate” for a relation, we are not really introducing anything new, but rather just
signalling our intention that we want to use the relation in a logical context, and to use
it to make logical statements, and maybe to build more complex logical statements from
it.
1 Predicates are occasionally called propositional functions, since they yield specific propositions about
specific values of the various arguments.
Since predicates enable us to make true or false statements about their arguments,
we can combine them using our usual logical operations: ¬, ∧, ∨, ⇒, ⇔.
statement truth value
¬ Parent(George Boole, George Everest) True
Parent(Plato, Ariston) ∧ Parent(Hypatia, Theon) True
Parent(Plato, Ariston) ∧ Parent(Ariston, Plato) False
Parent(Plato, Ariston) ∨ Parent(Ariston, Plato) True
When defining new predicates, it is conventional to use prefix notation, as we have
done with Parent. But for some predicates based on binary relations, particularly those
relating to ordering, containment or equivalence of some kind, infix notation is used.
So, when using the predicate <, we place the name of the predicate, <, in between its
arguments. (Recall the discussion of prefix and infix notation on p. 47 in § 2.5𝛼 .)
The equality predicate, =, is always considered to be available. This is because you
can’t really say anything about a class of objects if you can’t even tell when two objects
are really the same object! You do not need to be told that you are allowed to use it;
you always can, no matter what type of objects you are working with. It is the only
predicate for which we make this sweeping assumption.
A predicate with one argument (i.e., a unary predicate) is also called a property.
It corresponds naturally to a specific set, namely the set of elements of the domain of
the argument for which the predicate is True.
For example, define the predicate isNegative(𝑋 ) to be a unary predicate whose
variable 𝑋 has domain ℝ. It captures the property of being negative, and corresponds
to the set ℝ− of negative real numbers.
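In programming terms, such a predicate is just a function that returns a Boolean.
A minimal Python sketch (using floats to stand in for ℝ):

    def is_negative(x: float) -> bool:
        # The property of being negative: domain R, codomain {True, False}.
        return x < 0

    print(is_negative(-273.0))  # True
    print(is_negative(1729.0))  # False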
Exercise: what could a predicate with no arguments be?
5.2𝛼 VA R i A B L E S A N D C O N S TA N T S
A variable is a name for an object that allows us to refer to the object without specifying
it exactly.
• Often, the name is a letter from some alphabet, like 𝑥 or 𝑃 or 𝜃. It might also
have subscripts, to distinguish it from other variables where we would like to use
the same letter; the subscript is considered part of the name, so that 𝑥1 and 𝑥2
are different variables. We can also use words for names, like myFavouriteColour
or dateToday. Using words for names is usual in programming; it is much less
common in mathematics.2
• Often, we use a name for something when we don’t know it, but still want to reason
about it. This was one reason for introducing variables for unknown quantities
2 This is for at least a couple of reasons. Using entire words for variable names can tend to lead to expressions
that are cumbersome or cluttered, potentially obscuring the mathematical structure. Multi-letter names can
sometimes look like products of many variables, due to the mathematical convention of denoting products
of variables by juxtaposition (i.e., placing them next to each other).
when you first started doing algebra at school. But this need not be the only
reason for introducing variables.
• Variables can be used when we want to talk about any member of some set in a
general way. If we want to say that the logarithm of a product is the sum of the
logarithms, we can let variables 𝑥 and 𝑦 represent any positive real numbers and
write log(𝑥𝑦) = log 𝑥 + log 𝑦.
• Variables in programming and mathematics are similar in nature, but not exactly
the same thing. In programming, a variable is associated with a piece of memory
in which the object is stored. This does not happen in mathematics. In program-
ming, variables can only refer to finite objects, but there is no such restriction in
mathematics.
We have been using variables throughout this unit. We have done so every time we
introduce a name for some object we are discussing. We often referred to sets 𝐴, 𝐵, 𝐶, …,
functions 𝑓, 𝑔, ℎ, …, and propositions 𝑃, 𝑄, …. In the previous chapter we made extensive
use of Boolean variables, which each refer to a truth value, False or True.
Every variable has an associated domain, which is the set of objects which that
variable might represent. If we say, “Let 𝑥 ∈ ℤ”, then 𝑥 is a variable with domain
ℤ. Every Boolean variable has domain {False, True}; in fact, that’s the definition of a
Boolean variable.
We can think of a variable as being able to take values, which are just the objects in
its domain. But, when we use a variable, we are not implying that it has some specific
value. If we define a variable 𝑥 with domain ℤ, then we are not assuming that 𝑥 stands
for some specific integer.
We also use the term constant for an object which belongs to the domain of some
variable. So, if we have a variable 𝑥 with domain ℤ, then any specific integer is a
constant. So, our constants include −273, −1, 0, 1, 8, 1729, and of course every other
integer.
5.3𝛼 P R E D i C AT E S A N D VA R i A B L E S
We saw in § 5.1𝛼 that, for any predicate, you can obtain specific propositions by giving
specific values to all its arguments.
In order to make more general logical statements, we can also give variables to
some, or all, of a predicate’s arguments. When doing so, we must ensure that the domain of the
variable equals the domain of that argument, keeping in mind that every variable has a
domain (p. 158) and every argument of every predicate has a domain (p. 75).
For example: if 𝑋 is a variable with domain ℙ, then we can write
Parent(Alan Turing, 𝑋 ). (5.1)
The variable 𝑋 in this statement is free, meaning that no value is (yet) given to it and
it is available for values to be given to it. Because it contains a free variable, this statement
does not yet have a truth value, so it is not yet a proposition.
We can, if we wish, assign values to one or more of the free variables in a logical
statement. As usual, any value given to a variable must belong to its domain. If
every free variable has been given a value from its domain, the statement becomes a
proposition. Each combination of values you give to the variables in a statement creates
a different specific proposition, potentially with different truth values.
In the statement (5.1), we can create specific propositions, with truth values, by
giving values from ℙ to the variable 𝑋 .
value of 𝑋 proposition truth value
Sara Turing Parent(Alan Turing, Sara Turing) True
Julius Turing Parent(Alan Turing, Julius Turing) True
Alonzo Church³ Parent(Alan Turing, Alonzo Church) False
For another example, consider the binary predicate <. We suppose its two arguments
each have domain ℝ. This is just the usual < relation on the real numbers. We can
create propositions by giving values to all its arguments:
statement truth value
5 < 15 True
−5 < −15 False
√2 < 1.618 True
𝜋 <𝑒 False
Suppose we have real variables 𝑥, 𝑦 (i.e., variables with domain ℝ). If we plug these into
the < predicate’s two arguments, then we get a logical statement about those variables:
𝑥 < 𝑦.
But the two variables remain free so the statement is not a proposition and has no truth
value.
We can also use a variable for one argument and a value for the other. For exam-
ple, using the real variable 𝑥 and the value −3.7 for the first and second arguments
respectively, we obtain the statement
𝑥 < −3.7.
Although it does include a substitution of a value for an argument, it also still has a free
variable, so it is not a proposition and does not have a truth value.
For another example, consider the predicate designedFirstComputerIn, which has two
arguments, the first having domain ℙ and the second having as its domain the set 𝔸 of
3 Alonzo Church (1903–1995) was Alan Turing’s PhD supervisor at Princeton, 1936–1938. So he might be
described informally as Turing’s academic father! But he is not a parent of Turing’s in the usual sense, so
(Alan Turing, Alonzo Church) ∉ Parent.
all countries. As a relation, it consists of all pairs (𝑝, 𝑐) such that person 𝑝 designed the
first computer that was designed and built in country 𝑐. We can create some specific
propositions by assigning values to these arguments:
designedFirstComputerIn(Trevor Pearcey, Australia) True
designedFirstComputerIn(Maston Beard, Australia) True
designedFirstComputerIn(Xià Péisù, China) True
designedFirstComputerIn(Blaise Pascal, France) False
⋮ ⋮
If variable 𝑃 has domain ℙ and variable 𝑄 has domain 𝔸, we can make statements like
statement free variables
designedFirstComputerIn(𝑃, 𝑄) 𝑃, 𝑄
designedFirstComputerIn(Trevor Pearcey, 𝑄) 𝑄
designedFirstComputerIn(𝑃, France) 𝑃
But, in this case, it would be an error to write designedFirstComputerIn(𝑄, 𝑃), with 𝑃
and 𝑄 the other way round, because the domains do not match: variable 𝑃, with domain
ℙ, is given to the second argument, which has domain 𝔸 (and a similar mis-match for
variable 𝑄 in the first argument).
5.4 A R G U M E N T S O F P R E D i C AT E S
When using a predicate, we need to put something into each of its arguments.
We have seen in § 5.1𝛼 and § 5.3𝛼 that we can put constants and/or variables (or a mix
of these) into the arguments of a predicate, provided the domains match appropriately.
But we also want to be able to put more complex expressions into the arguments.
For example, we may want to make statements like
𝑎² + 𝑏² < 𝑐²
𝑒^(𝑖𝜋) + 1 = 0
Parent(Caroline Herschel, MotherOf(William Herschel))
knows(X, MotherOf(FatherOf(Y))).
To do this, the expressions we use for predicate arguments need to be able to use func-
tions as well as constants and variables. But which functions can we use? Usually, we
will work in settings where the available functions are specified up-front. But we still
need to ensure that the expressions that use functions are properly constructed.
Informally, the expressions that can be put into an argument of a predicate can be
any expression that makes sense in the domain of that argument and which only uses
functions that are known to be available.
For example, if we are dealing with the predicate < with real arguments, then the
first statement above, 𝑎² + 𝑏² < 𝑐², and the last statement, knows(𝑋, MotherOf(FatherOf(𝑌))), are ok provided that
• the squaring operation is taken to be a function from ℝ to ℝ, so that its codomain
matches the domain of the second argument of <;
• the variables 𝑎, 𝑏, 𝑐 each have domain ℝ, which ensures they can be used as argu-
ments for the real squaring function;
• variable 𝑋 has domain ℙ (the set of all people who have ever lived), so that its
domain matches the domain of the first argument of knows;
• the functions MotherOf and FatherOf each have ℙ as their domain and codomain,
to ensure that the composition MotherOf ∘ FatherOf is defined, and to ensure that
the composition’s codomain matches the domain of the second argument of knows
(which it gets plugged into);
• variable 𝑌 also has domain ℙ, to match the domain of the function FatherOf, since
𝑌 is plugged into the argument of FatherOf.
In general, apart from constants and variables from appropriate domains, the other
things we can put into a predicate’s arguments are all expressions that use a function.
In fact, such an expression might contain several uses of functions, like in 𝑎² + 𝑏² or
MotherOf(FatherOf(𝑌)). But there will always be one function in the expression which
was applied last, i.e., which we think of as producing something to be given to the
argument of the predicate.
• In the expression 𝑎² + 𝑏², the addition function is applied last, after both the
squarings have been done. We think of the addition as producing something which
goes into the first argument of the predicate <.
• In the expression MotherOf(FatherOf(𝑌)), the function MotherOf is applied last,
after FatherOf. We think of it as producing something which goes into the second
argument of the predicate knows.
But, as we saw from our examples above, it is not enough that the function’s codomain
matches the domain of the predicate’s argument. The function also has arguments,
each of which also has a domain, and anything put there must match the domain of the
argument where it is put. What can we put into a function’s arguments? The same rules
apply here as apply to arguments of predicates. The things we can put into a function’s
arguments are constants from that argument’s domain, variables with the same domain,
and (again) functions whose codomain equals the domain of the argument where it is
put. And those functions, too, have arguments, and the same rule applies to them! And
so on, and so on, …although we are not allowed to go on forever and construct infinite
expressions! So, following these rules, we can construct, from constants, variables and
functions, any expression that is allowed to be put into the argument of any other
function, and ultimately, any expression that is allowed to be put into the argument of
a predicate.
Expressions that are allowed to be put into the arguments of predicates can be
defined more formally, and we do so now for completeness. Such an expression is called
a term.
We suppose, at the outset, that we have the following ingredients:
• a set of constants, each belonging to some domain;
• a set of variables, each with its own domain;
• a set of functions.
Then a term with domain 𝐷 is any of the following:
• a constant from 𝐷;
• a variable with domain 𝐷;
• a function with codomain 𝐷, in which each argument has a term whose domain is
the same as the domain of the argument.
If we are working with real numbers and we have functions +, −, ×, /, √ and real
variables 𝑥 and 𝑦, then some examples of terms are:
• The following six “expressions” are not terms because they have been incorrectly
formed (either by breaking the rules for how to use the provided functions, or
using objects, variables or functions that are not available):
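To make the recursive structure of terms concrete, a term can be modelled as a
nested expression and evaluated recursively. The representation below is our own,
for illustration only: a term is a number (a constant), a string (a variable name),
or a tuple (a function applied to sub-terms).

    import math

    FUNCTIONS = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
        "sqrt": math.sqrt,
    }

    def evaluate(term, values):
        if isinstance(term, (int, float)):   # a constant from the domain
            return term
        if isinstance(term, str):            # a variable: look up its value
            return values[term]
        f, *args = term                      # a function applied to sub-terms
        return FUNCTIONS[f](*(evaluate(a, values) for a in args))

    # The term sqrt((x*x) + (y*y)), evaluated with x = 3 and y = 4:
    print(evaluate(("sqrt", ("+", ("*", "x", "x"), ("*", "y", "y"))),
                   {"x": 3, "y": 4}))        # 5.0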
5.5 B U i L D i N G L O G i C A L E X P R E S S i O N S W i T H P R E D i C AT E S
Once we have predicates whose arguments are terms with appropriate domains, we
can combine them using all the Boolean operations (logical connectives) we studied in
Chapter 4. So we can construct expressions like those in the left column of the following
table (with free variables listed in the right column).
logical expression free variables
(1 < 𝑖) ∧ (𝑖 < 𝑛) 𝑖, 𝑛
( 1/3 < (1 + 2)/(3 + 5) ) ∧ ( (1 + 2)/(3 + 5) < 2/5 ) none
((𝑥 > 0) ∧ (𝑦 > 0)) ⇒ (log(𝑥𝑦) = log 𝑥 + log 𝑦) 𝑥, 𝑦
knows(𝑃, 𝑄) ∧ knows(𝑄, 𝑅) ∧ ¬knows(𝑃, 𝑅) 𝑃, 𝑄, 𝑅
Parent(𝑋 , 𝑌) ⇔ (𝑌 = MotherOf(𝑋 )) ∨ (𝑌 = FatherOf(𝑋 )) 𝑋, 𝑌
• If terms of the required domains are plugged into the arguments of a predicate,
then the result is a predicate logic expression.
• If 𝐸 and 𝐹 are predicate logic expressions, then so are
(𝐸), ¬𝐸, 𝐸 ∧ 𝐹, 𝐸 ∨ 𝐹, 𝐸 ⇒ 𝐹, 𝐸 ⇔ 𝐹.
We can manipulate predicate logic expressions using the usual rules of Boolean
algebra. This extends the realm of those rules. They were introduced in Chapter 4 in the
context of propositions, but now we are using them for predicate expressions, and these
may have variables and therefore might not be propositions. But every assignment of
values to all the variables makes a predicate expression True or False, so on that basis we
can combine these expressions using logical operations just as we combined propositions
using logical operations in Chapter 4.
For example, consider again the first expression in the above table,
(1 < 𝑖) ∧ (𝑖 < 𝑛),
which is really just a more detailed way of writing 1 < 𝑖 < 𝑛 (in fact, it makes clear
the exact logical meaning of that chain of two inequalities). If we want to say that the
expression is not satisfied, then we can negate it, using De Morgan’s Law:
¬((1 < 𝑖) ∧ (𝑖 < 𝑛)) = (¬(1 < 𝑖)) ∨ (¬(𝑖 < 𝑛)) = (𝑖 ≤ 1) ∨ (𝑖 ≥ 𝑛),
which accords with our understanding that if 𝑖 does not lie between 1 and 𝑛 then it
must be ≤ 1 or ≥ 𝑛.
5.6 E X i S T E N T i A L Q U A N T i F i E R
At the start of this chapter, we mentioned that predicate logic enables us to make
statements about variables that are always true, or sometimes true, or never true. We
can do this with quantifiers. We now look at one of these, which covers “sometimes
true”.
The existential quantifier is written ∃ and read as “there exists”. It is placed before
a variable to mean that there exists some value of that variable, within the variable’s
domain, that makes the subsequent statement True.
For example, consider the statement
∃𝑋 ∶ (𝑋 is a fly) ∧ (𝑋 is in my soup).
Suppose we have two unary predicates, Fly and InMySoup. For each of these, suppose
that the domain of its sole argument is the set of everything on Earth, and that the
variable 𝑋 has this domain too. So Fly(𝑋 ) is a predicate logic expression meaning that
“𝑋 is a fly”, and InMySoup(𝑋 ) is a predicate logic expression meaning that “𝑋 is in my
soup”.
The conjunction of any two predicate logic expressions is another predicate logic
expression (§ 5.5). So
Fly(𝑋 ) ∧ InMySoup(𝑋 )
is another predicate logic expression, meaning that “𝑋 is a fly and 𝑋 is in my soup”.
For any specific object on Earth, plugging it into 𝑋 in this expression gives a specific
proposition, which may be True or False. In this sense, this predicate logic expression
may be viewed as representing many possible statements, one for each object on Earth.
Putting ∃𝑋 in front of (i.e., at the very left of) this expression gives the statement
∃𝑋 ∶ Fly(𝑋 ) ∧ InMySoup(𝑋 ). (5.2)
This is just a rewording of the statement we started with. And note that it is just one
single statement, rather than representing many possible statements.
The colon after “∃𝑋 ” in (5.2) is merely punctuation. It is usually read as “such that”,
and provides convenient visual separation between “∃𝑋 ” and the condition that 𝑋 must
satisfy. But it is fine to omit the colon, if the expression is still clear. Sometimes a full
stop is used instead of the colon. So either of the following is also correct:
∃𝑋 Fly(𝑋 ) ∧ InMySoup(𝑋 ),
∃𝑋 . Fly(𝑋 ) ∧ InMySoup(𝑋 ).
The domain of each variable is often clear from the context. Alternatively, it might be
specified as part of the written expression by specifying domain membership immediately
after the variable being quantified, as in ∃𝑌 ∈ ℚ ⋯ (so in this case the domain of 𝑌 is
ℚ). The domain of a variable certainly affects the meaning of the expression it is part
of, in general; changing the domain might change the truth value of the expression.
For example, consider the following expression, written as text on the left and as a
logical expression on the right.
There is an analogy between the existential quantifier and disjunction. In each case,
the expression that uses them is True if and only if at least one of its “possibilities” is
True. For a disjunction, we require that at least one of the parts of the disjunction is
True; for an existential quantifier applied to some variable 𝑋 , we require that at least
one value of 𝑋 makes the entire expression True. In other words, at least one member
of the domain of 𝑋 may be assigned to 𝑋 to make the expression True.
For example, suppose we have the statement “Someone did it”. We may write this
as
∃𝑋 ∶ 𝑋 did it. (5.3)
Suppose the domain of 𝑋 is a large set of people. Then (5.3) amounts to the huge disjunction
⋯ ⋯ ∨ (Annie did it) ∨ (Edward did it) ∨ (Henrietta did it) ∨ (Radhanath did it) ∨ ⋯ ⋯
However, the existential quantifier is not just a shorthand notation for disjunction.
Firstly, if the domain of a variable is infinite, then existential quantification over that
variable cannot be replaced by a disjunction because a disjunction is only allowed to
have finitely many parts (and, indeed, logical expressions in general must be of finite
size). Secondly, variables and their quantifiers allow us to do some reasoning that cannot
be done in propositional logic.
Once a variable in an expression has had a quantifier applied to it, so that all
occurrences of the variable come after the quantifier and are subject to it, the variable
is said to be bound. You can no longer give specific values to the variable. So you can
no longer create specific propositions by giving specific values to every free variable.
So, for example, in our statement “there’s a fly in my soup”, formalised in (5.2), there
is no free variable. The variable 𝑋 is bound. We now have one specific proposition,
which has a specific truth value. The variable 𝑋 has not been given any value, but that
does not mean it is available to have values plugged into it. Now that 𝑋 is bound, it is
no longer free, and no longer available to receive values.
Normally, we put a quantifier in front of an expression containing the quantified
variable, as in the example
∃𝑤 ∶ 𝑤 < 0.
It’s also legal to put a quantifier, with its variable, in front of an expression that does
not contain that variable. Here are some examples of this.
∃𝑤 ∶ 𝑧 < 0, ∃𝑤 ∶ 1 + 1 = 2, ∃𝑤 ∶ 1 + 1 = 3.
In each case, the quantifier may as well not be there. The variable 𝑤 is irrelevant to the
truth of 𝑧 < 0, so whether this first statement is true or not depends solely on the truth
or otherwise of 𝑧 < 0. In the second and third examples, 𝑤 is also irrelevant, and in
those cases there are no other variables so we can discard the quantifiers and conclude
that the second statement is True and the third is False. So these three statements are
equivalent to
𝑧 < 0, 1 + 1 = 2, 1 + 1 = 3,
respectively. For another couple of examples, where the expression after ∃𝑥 is just a
logical constant: ∃𝑥 ∶ True is equivalent to True, and ∃𝑥 ∶ False is equivalent to False.
Note that quantifiers can only be used with variables. Using them with constant
objects makes no sense. It is an error to write something like ∃5, ∃Annie.
5.7 R E S T R i C T i N G E X i S T E N T i A L LY Q U A N T i F i E D VA R i A B L E S
To begin with, suppose we want to say that some computer is human. Suppose the
domain of 𝑋 is { computers }, and that we have the predicate human(𝑋 ) which is
intended to mean that 𝑋 is human.
Then our statement may be written
∃𝑋 ∶ human(𝑋 ).
But if the domain of 𝑋 is some larger set, such as the set of everything on Earth, then
we must restrict the quantified variable to computers. Suppose the predicate computer(𝑋 )
means that 𝑋 is a computer. Then:
Correct: ∃𝑋 ∶ computer(𝑋 ) ∧ human(𝑋 )
(“There exists something that is both a computer and human.”)
Incorrect: ∃𝑋 ∶ computer(𝑋 ) ⇒ human(𝑋 )
(“There exists something which is not a computer or is human.”)
The general principle at work here is as follows. Let 𝑃 be a unary predicate whose
sole argument has domain 𝐷, and let 𝑋 be a variable with the same domain, 𝐷. The
existential statement
∃𝑋 ∶ 𝑃(𝑋 )
says that there is at least one 𝑋 ∈ 𝐷 for which 𝑃(𝑋 ) holds. But suppose we want to
make this assertion only for those 𝑋 that also satisfy 𝑅(𝑋 ) (for some other predicate
𝑅, and with 𝑋 still having domain 𝐷). In other words, we want to say that there is
at least one 𝑋 satisfying 𝑅(𝑋 ) that also satisfies 𝑃(𝑋 ). Then we can do this with the
statement
∃𝑋 ∶ 𝑅(𝑋 ) ∧ 𝑃(𝑋 ).
So, the restriction to those 𝑋 that satisfy 𝑅(𝑋 ) is done by putting 𝑅(𝑋 ) in conjunction
with 𝑃(𝑋 ).
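Over a finite domain, this principle can be checked directly in Python, where ∃
corresponds to any(). The domain and predicates below are our own, chosen purely
for illustration:

    # ∃X : R(X) ∧ P(X), with R = "is negative" and P = "is even".
    domain = range(-5, 6)
    R = lambda x: x < 0
    P = lambda x: x % 2 == 0

    # Restricting the quantified variable is the same as conjoining R with P:
    assert any(R(x) and P(x) for x in domain) == \
           any(P(x) for x in domain if R(x))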
5.8 U N i V E R S A L Q U A N T i F i E R
Now we consider how to assert that a statement about variables is “always true”.
The universal quantifier is written ∀ and read as “for all” (or “for every” or “for
each” 4 ). It is placed before a variable to mean that for all values of that variable, within
the variable’s domain, the subsequent statement is True.
For example, consider the statement
Now that we’ve applied quantifiers to the variables in these expressions, the variables
are all bound. Again, once a variable is bound, we can no longer assign values to it.
As with the existential quantifier, if we use the universal quantifier with a variable
and then follow it with an expression that does not include that variable, then the
quantification is irrelevant and can be dropped. It’s like saying, “for every dog, 𝑛 is prime”,
which is equivalent to just saying that “𝑛 is prime” since the primality or otherwise of 𝑛
has nothing to do with dogs. So the statements
∀𝑤 ∶ 𝑧 < 0, ∀𝑤 ∶ 1 + 1 = 2, ∀𝑤 ∶ 1 + 1 = 3
are equivalent to
𝑧 < 0, 1 + 1 = 2, 1 + 1 = 3,
respectively.
5.9 R E S T R i C T i N G U N i V E R S A L LY Q U A N T i F i E D VA R i A B L E S
Suppose now that we want to say that every computer is human. If the domain of 𝑋 is
{ computers }, then our statement may be written
∀𝑋 ∶ human(𝑋 ).
But if the domain of 𝑋 is some larger set, such as the set of everything on Earth, then:
Incorrect: ∀𝑋 ∶ computer(𝑋 ) ∧ human(𝑋 )
(“Everything is both a computer and human.”)
Correct: ∀𝑋 ∶ computer(𝑋 ) ⇒ human(𝑋 )
(“Everything that is a computer is human.”)
The general principle is as follows. Let 𝑃 be a unary predicate whose sole argu-
ment has domain 𝐷, and let 𝑋 be a variable with the same domain, 𝐷. The universal
statement
∀𝑋 ∶ 𝑃(𝑋 )
says that every 𝑋 ∈ 𝐷 satisfies 𝑃(𝑋 ). But suppose we want to make this assertion only
for those 𝑋 that also satisfy 𝑅(𝑋 ) (for some other predicate 𝑅, and with 𝑋 still having
domain 𝐷). In other words, we want to say that every 𝑋 satisfying 𝑅(𝑋 ) also satisfies
𝑃(𝑋 ). Then we can do this with the statement
∀𝑋 ∶ 𝑅(𝑋 ) ⇒ 𝑃(𝑋 ).
So, the restriction to those 𝑋 that satisfy 𝑅(𝑋 ) is done by making 𝑅(𝑋 ) imply 𝑃(𝑋 )
in the expression after the quantified variable.
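Again, over a finite domain this can be checked directly, with ∀ corresponding to
Python’s all() and the implication 𝑅(𝑋 ) ⇒ 𝑃(𝑋 ) rewritten as ¬𝑅(𝑋 ) ∨ 𝑃(𝑋 ). The
domain and predicates are ours, for illustration:

    # ∀X : R(X) ⇒ P(X), with R = "is negative" and P = "is less than 10".
    domain = range(-5, 6)
    R = lambda x: x < 0
    P = lambda x: x < 10

    assert all((not R(x)) or P(x) for x in domain) == \
           all(P(x) for x in domain if R(x))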
5.10 M U LT i P L E Q U A N T i F i E R S
Consider the statement “someone has visited every country”. Using the binary predicate
hasVisited, where hasVisited(𝑃, 𝐶) means that person 𝑃 has visited country 𝐶, we may
write this as
∃𝑃 ∀𝐶 hasVisited(𝑃, 𝐶). (5.4)
After the first quantifier and its variable, ∃𝑃, we have the expression
∀𝐶 hasVisited(𝑃, 𝐶) (5.5)
which asserts that 𝑃 has visited every country. Two variables — 𝑃 and 𝐶 — appear
in (5.5), but they play different roles, since 𝑃 is free in (5.5) but 𝐶 is bound by the ∀𝐶
at the start. So the expression (5.5) has only one free variable. The presence of this free
variable still prevents this expression (5.5) from being a proposition. In fact, we could
use it as the definition of a new unary predicate, hasVisitedEveryCountry: if 𝑃 is any
person, then hasVisitedEveryCountry(𝑃) means that 𝑃 has indeed visited every country:
hasVisitedEveryCountry(𝑃) ⇔ ∀𝐶 hasVisited(𝑃, 𝐶).
Then, we put ∃𝑃 in front of (5.5) to make (5.4). This means 𝑃 is now quantified
and therefore bound, so the full expression in (5.4) has no free variable and becomes a
proposition. That full expression has the same meaning as
∃𝑃 hasVisitedEveryCountry(𝑃).
The following table shows how we might represent various statements involving
the predicate hasVisited, including the one we have just discussed in detail, with the
statements on the left and the predicate logic expressions on the right.
You should think about each of these examples carefully. Think about what they
each mean and how they differ from each other. Some are obviously true, some are
obviously false, some seem likely to be true although you may not know that for a fact.
One important issue to think about is whether the order of the quantifiers matters.
• We have given one example with two existential quantifiers, and one with two
universal quantifiers. In each case, we could try putting them the other way
round, e.g., we could try ∃𝑌∃𝑋 ⋯ instead of ∃𝑋 ∃𝑌 ⋯. Would that give us a new
statement, or is it just another way of saying the same thing?
• We have given four examples that use a mix of our two quantifier types. Are they
all logically different, or are some of them equivalent?
5.11 P R E D i C AT E L O G i C E X P R E S S i O N S
Now that we have quantifiers, we are at last able to give a complete definition of predicate
logic expressions, extending and finishing the work we began on p. 163 in § 5.5.
A predicate logic expression is any of the following.
• The truth values True and False are predicate logic expressions.
• If terms of the required domains are plugged into the arguments of a predicate,
then the result is a predicate logic expression.
• If 𝐸 and 𝐹 are predicate logic expressions, then so are
(𝐸), ¬𝐸, 𝐸 ∧ 𝐹, 𝐸 ∨ 𝐹, 𝐸 ⇒ 𝐹, 𝐸 ⇔ 𝐹.
• If 𝐸 is a predicate logic expression and 𝑋 is a variable, then the following are also
predicate logic expressions:
∃𝑋 ∶ 𝐸
∀𝑋 ∶ 𝐸
But, in each case, 𝑋 is no longer a free variable in the new expression. So, if 𝑉 is
the set of free variables of the expression 𝐸, then the set of free variables of ∃𝑋 ∶ 𝐸,
and the set of free variables of ∀𝑋 ∶ 𝐸, are each 𝑉 ∖ {𝑋 }.
5.12 D O i N G L O G i C W i T H Q U A N T i F i E R S
All the rules of Boolean algebra (§ 4.15) are available to us when working with predicate
logic. We discussed this at the end of § 5.5.
There are also some other rules for doing logic involving quantifiers.
Let 𝑃(𝑋 ) be a predicate logic expression with a free variable 𝑋 . If we know that
∀𝑋 𝑃(𝑋 )
and obj is any specific object (in the domain of 𝑋 ), then we can deduce that
𝑃(obj).
In other words, if 𝑃(𝑋 ) is True for all 𝑋 , then it’s certainly true for any specific value
from the domain of 𝑋 :
(∀𝑋 𝑃(𝑋 )) ⟹ 𝑃(obj)
In a similar vein, if it’s True for a specific value from the domain of 𝑋 , then it’s certainly
True for some 𝑋 :
𝑃(obj) ⟹ (∃𝑋 𝑃(𝑋 )) (5.6)
Universal quantifiers and conjunction mix in a natural way. For any predicates 𝑃
and 𝑄,
∀𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )) is equivalent to (∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )). (5.7)
Similarly, existential quantifiers and disjunction mix in a natural way:
∃𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )) is equivalent to (∃𝑋 𝑃(𝑋 )) ∨ (∃𝑋 𝑄(𝑋 )). (5.8)
The expressions on the right, in (5.7) and (5.8), raise the important issue of the
scope of variables. These expressions each contain two separate quantifications over 𝑋 .
Specifically, (5.7) contains ∀𝑋 twice, and (5.8) contains ∃𝑋 twice. To which parts of the
entire expression do each of these quantifiers apply? Does the first ∀𝑋 in (5.7) apply
to every appearance of 𝑋 in the rest of the expression? If so, how does that first ∀𝑋
interact with the second ∀𝑋 ? If not, how do we know that?
The scope of a quantified variable in a predicate logic expression is the portion of
the expression (i.e., the “sub-expression”) in which that variable has meaning, and goes
from its quantifier to either
• the end of the innermost pair of enclosing parentheses, if such a pair of parentheses
exists, or
• the end of the entire expression, if there is no enclosing pair of parentheses.
So, in the left expression in (5.7), the scope of 𝑋 is the entire expression, since ∀𝑋 is
not enclosed by any parentheses. (There are parentheses in the expression, but they do
not enclose ∀𝑋 so they have no bearing on the scope of 𝑋 .) But in the right expression
in (5.7), we actually have two variables with separate scopes. The first ∀𝑋 is enclosed
in a pair of parentheses, and its scope is sub-expression ∀𝑋 𝑃(𝑋 ) on the left of ∧. The
second ∀𝑋 is enclosed in a different, and completely separate, pair of parentheses, and its
scope is the sub-expression ∀𝑋 𝑄(𝑋 ) on the right of ∧. These two scopes do not overlap;
there is no appearance of 𝑋 that belongs to both scopes, so there is no ambiguity over
which quantifier governs each appearance of 𝑋 . In effect, the two appearances of 𝑋 ,
each with its own scope separate from the other’s scope, are local to those scopes. It is
up to the reader to see that these two variables are different, even though they have the
same name, and keep track of their different scopes. Such variables in predicate logic
are like local variables in programs, which many programming languages provide for,
including Python.
Summarising for the examples in (5.7) and (5.8), we have:
∀𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )): the scope of 𝑋 is the entire expression.
(∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )): the scope of the first 𝑋 is ∀𝑋 𝑃(𝑋 ); the scope of the second 𝑋 is ∀𝑋 𝑄(𝑋 ).
∃𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )): the scope of 𝑋 is the entire expression.
(∃𝑋 𝑃(𝑋 )) ∨ (∃𝑋 𝑄(𝑋 )): the scope of the first 𝑋 is ∃𝑋 𝑃(𝑋 ); the scope of the second 𝑋 is ∃𝑋 𝑄(𝑋 ).
If we change the first 𝑋 to 𝑊, taking care to do it throughout the scope of that first 𝑋
and nowhere else, then we get the equivalent expression
(∀𝑊 𝑃(𝑊)) ∧ (∀𝑋 𝑄(𝑋 )).
Having discussed how ∀ mixes well with ∧, and how ∃ mixes well with ∨, it is natural
to ask: how well do the other pairings mix? How does ∀ mix with ∨? How does ∃ mix
with ∧?
In detail, what can we say about the logical relationship between …
…?
We consider this question further in Exercise 14.
If you have negation immediately to the left of a quantifier, then you may move it to
the right of the quantifier (and its associated variable) provided you “flip” the quantifier
as you do so (∃ ⟷ ∀).
¬ ∀𝑌 means the same as ∃𝑌 ¬
¬ ∃𝑌 means the same as ∀𝑌 ¬
So, for any predicate logic expression 𝑃(𝑋 ) in which the variable 𝑋 is free, we have
the two laws
¬ ∀𝑋 𝑃(𝑋 ) = ∃𝑋 ¬ 𝑃(𝑋 ),
¬ ∃𝑋 𝑃(𝑋 ) = ∀𝑋 ¬ 𝑃(𝑋 ).
Similarly,
¬ ∀𝑌 ¬ means the same as ∃𝑌
¬ ∃𝑌 ¬ means the same as ∀𝑌
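For variables with a finite domain, these laws can be checked exhaustively in Python
(the domain and predicate below are illustrative):

    domain = range(-3, 4)
    P = lambda x: x > 0

    # ¬ ∀X P(X)  =  ∃X ¬P(X)
    assert (not all(P(x) for x in domain)) == any(not P(x) for x in domain)
    # ¬ ∃X P(X)  =  ∀X ¬P(X)
    assert (not any(P(x) for x in domain)) == all(not P(x) for x in domain)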
5.14 S U M M A RY O F R U L E S F O R L O G i C W i T H Q U A N T i F i E R S
When doing predicate logic, we can use all the rules of propositional logic (§ 4.15)
together with the principles for doing logic with quantifiers that we have introduced in
this chapter. We summarise these principles below.
In the following principles:
• 𝑃 and 𝑄 can be any predicates, and more generally they can be any predicate
logic expressions in which the indicated variables appear (e.g., 𝑃(𝑥) can be any
predicate logic expression in which 𝑥 appears);
• 𝑅 can be any predicate logic expression in which 𝑥 does not appear as a free
variable;
• Π𝐴 is the membership predicate of the set 𝐴, so that Π𝐴 (𝑥) is True if and only
if 𝑥 ∈ 𝐴.
Here is the list of principles, with the sections discussing them on the right.
∃𝑥 ∈ 𝐴 𝑃(𝑥) is equivalent to ∃𝑥 (Π𝐴 (𝑥) ∧ 𝑃(𝑥)) § 5.7
∀𝑥 ∈ 𝐴 𝑃(𝑥) is equivalent to ∀𝑥 (Π𝐴 (𝑥) ⇒ 𝑃(𝑥)) § 5.9
∃𝑥 𝑅 is equivalent to 𝑅 § 5.7
∀𝑥 𝑅 is equivalent to 𝑅 § 5.9
5.15 S O M E E X A M P L E S
Suppose we have the property Prime, with domain ℕ, which is True if its argument is a
prime number and False otherwise. Suppose also that we have the binary predicate ≤,
and that any variables we use must have domain ℕ.
How might we use these to state Theorem 17, that there are infinitely many primes?
At first glance, this might look like an existential statement, so we reach for an
existential quantifier. The trouble is, we are asserting the existence of infinitely many
numbers of a particular type, but we are not allowed to use infinitely many quantifiers
(or to do anything else that makes the statement infinitely long). Furthermore, the
ingredients available to us here are quite limited; they do not allow us to describe arbi-
trarily long sequences. (There are richer settings in which that can be done, but that’s
a different puzzle!) So, what do we do?
Think about what it means for a set to be infinite. This means it goes on forever,
i.e., it’s unbounded; in other words, no matter what bound you might try to put on the
numbers in this set, they eventually get bigger than the bound.
Let’s focus on this, for the set of primes:
Every bound is exceeded by some prime.
We begin to see hints of quantifiers emerge:
Every bound is exceeded by some prime.
Rewording a bit:
For every bound, there exists a prime that is greater than the bound.
This is getting close enough to precise logical language that we can try writing it sym-
bolically:
∀𝑏 ∃𝑛 ∈ {primes} 𝑛 > 𝑏.
Let us move the condition on ∃𝑛 to later in the expression, so there is no qualification
on ∃𝑛, and so that we can use our predicate Prime rather than using other symbols and
relations we haven’t been given in this scenario. To do this, we use the method of § 5.7
(see the first line of the list in § 5.14):
∀𝑏 ∃𝑛 Prime(𝑛) ∧ (𝑛 > 𝑏).
We’re almost there! The remaining detail is that > has not been given to us as an
available predicate (and nor has <). So we need to find a way of saying the same thing
using ≤, which is one of our ingredients. This is straightforward, because 𝑛 > 𝑏 if and
only if 𝑛 ≰ 𝑏, which may be written using logical negation and ≤. So our final statement
in predicate logic is
∀𝑏 ∃𝑛 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏). (5.9)
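The nested quantifiers in (5.9) translate directly into nested any/all in Python.
Enumeration over a finite range cannot prove the theorem, but it illustrates how the
statement is read (the helper function is ours):

    def is_prime(n: int) -> bool:
        return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

    # ∀b ∃n Prime(n) ∧ ¬(n ≤ b), checked for bounds b up to 100,
    # searching for n among the numbers below 1000.
    print(all(any(is_prime(n) and not (n <= b) for n in range(1000))
              for b in range(1, 101)))   # True (e.g., 997 is prime and exceeds 100)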
Now, think about the order of quantifiers here. We have seen previously that the
order of different quantifiers does matter (§ 5.10). So, in this case, it would be a mistake
to write the two quantifiers the other way round, as in
∃𝑛 ∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏). (5.10)
We will now look at this second statement in detail, to understand how it differs from
(5.9), and so gain insight into the effect of the different orders of quantifiers. In the
process, we will apply some other principles we have learned.
What does this second statement, (5.10), say? There exists a number 𝑛 such
that, for every number 𝑏, 𝑛 is prime and 𝑛 > 𝑏. This certainly sounds different, but its
meaning may not yet be clear.
The rules of predicate logic can help make it clear.
After the existential quantifier in (5.10), we have ∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏), which is
of the form ∀𝑥(𝑃(𝑥) ∧ 𝑄(𝑥)) so we can apply one of our principles of doing logic with
quantifiers to deduce that
∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏) is equivalent to (∀𝑏 Prime(𝑛)) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)).
Now, consider
∀𝑏 Prime(𝑛).
The truth value of this depends entirely on the truth value of Prime(𝑛), because Prime(𝑛)
does not depend on 𝑏. This is an instance of the principle that quantifying an expression
over a variable that does not appear in the expression makes no difference to it, logically.
You can include such a quantifier, or not, according to your preference; the expressions
you get are equivalent. (See the end of § 5.9.) So, omitting this one superfluous universal
quantifier from ∀𝑏 Prime(𝑛), we see that
(∀𝑏 Prime(𝑛)) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)) is equivalent to Prime(𝑛) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)).
Substituting this back into the portion of (5.10) from ∀𝑏 onwards, we see that it is
equivalent to
∃𝑛 (Prime(𝑛) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏))).
Another way to write this, using the rule about restricting existential quantifiers (but
going backwards), is
∃𝑛 ∈ {primes} ∀𝑏 ¬(𝑛 ≤ 𝑏).
Rewriting ¬(𝑛 ≤ 𝑏) might make the meaning clearer:
∃𝑛 ∈ {primes} ∀𝑏 𝑛 > 𝑏.
So, this is saying that there is a prime number that is greater than every number!
This is clearly false, and in any case we see that the meanings of (5.9) and (5.10) are
very different. This illustrates the fact that the order of two different quantifiers (one
universal, one existential) does matter.
5.16 EXERCiSES
2. Suppose you have the predicates prolog and elvish, with the following meanings:
prolog(𝑋 ): 𝑋 knows the Prolog language.
elvish(𝑋 ): 𝑋 knows the Elvish language.
(b) Suppose that the statement in (a) is False. Starting with its negation, derive an
existential statement meaning that someone knows both these languages.
3. The predicate supervised has two arguments, both of which are people. The
meaning of supervised(𝑋 , 𝑌) is that person 𝑋 supervised the PhD of person 𝑌.
Express each of the following three sentences in predicate logic.
(a) Trevor Pearcey and Maston Beard designed the first computer in Australia.
Write this claim in predicate logic, using just the predicate knows.
6. Suppose you have the equality predicate for sets, the symmetric difference function
for sets, and set variables 𝐴 and 𝐵. Write the statement of Theorem 4 in predicate logic.
7. For each of the following predicate logic statements, (i) identify the predicates,
functions, variables (including a suggested domain for each) and constants used; (ii)
state whether or not it is a proposition, and if it is, whether it is true or false.
(a) ∀𝑥 𝑥² ≥ 0
(b) 𝑥² < 0
(c) ∀𝑥 2𝑥 = 𝑥²
What predicates, functions and variables does this statement use? Restate, in words,
the theorem that is being stated here.
9. Suppose that
WordContainsLetter(quizzical, z) is True,
WordContainsLetter(quizzical, e) is False.
(b) Now suppose you also have a unary predicate Vowel whose domain is the set of
English letters and which is True when its argument is a vowel.
Now write the statement of Theorem 15 in predicate logic again, but using this new
predicate to write it more compactly this time.
10. Suppose you have the positive integer properties Even and Prime, which are true
precisely for even numbers and prime numbers respectively, and that you also have the
binary predicate ≤ on ℕ. Write a predicate logic expression for the statement
There are infinitely many odd prime numbers but only finitely many even
prime numbers.
11. The ternary predicate Date is defined for any (𝑑, 𝑚, 𝑦) ∈ ℕ × ℕ × ℕ, and is True
if (𝑑, 𝑚, 𝑦) represents a valid date in day-month-year format in the Gregorian calendar,
and is False otherwise.
Using Date and <, write an expression in predicate logic which is True if and only if
(𝑑1 , 𝑚1 , 𝑦1 ) and (𝑑2 , 𝑚2 , 𝑦2 ) are both valid dates and the first date comes chronologically
before the second.
• the function 𝑡 ∶ {programs} × 𝐴 ∗ → ℕ defined for all programs 𝑃 and input strings
𝑥 ∈ 𝐴 ∗ (where 𝐴 is an alphabet (§ 1.5)) by
• a string variable 𝑥 ∈ 𝐴 ∗ ;
13. There are many theorems that assert that there is a unique object that satisfies
certain conditions. In this exercise, we look at how to make such statements using
predicate logic.
Suppose you want to say that there is a unique 𝑥 with property 𝑃(𝑥). It’s not
enough to just write
∃𝑥 𝑃(𝑥),
because that only enforces existence without guaranteeing uniqueness. Sometimes, peo-
ple use “∃!” as a shorthand for “there exists a unique”. With that shorthand,
∃!𝑥 𝑃(𝑥)
means
that there is exactly one value of 𝑥, in the domain of 𝑥, for which 𝑃(𝑥) is True.
How would you write this statement just using the tools available to us in predicate
logic, i.e., without using “!”?
14.
(a) Prove that
in two ways: (i) reason it through directly, (ii) use the result from part (a) and what
you’ve learned about the relationship between existential and universal quantifiers.
6
SEQUENCES & SERIES
A sequence may be regarded as a list of objects. The defining characteristic of a
sequence is that its objects are in some order, so there is a first object, then a second
object, then a third object, and so on. We can use sequences to represent:
• items in a file, ordered by their position in the file, such as the words in a text file,
or the lines in a text file, or the frames in a movie file, or the rows in a spreadsheet;
• items ordered by time, such as the children in a family (by birth order), or a
person’s posts to one of their social media platforms, or the population of a city
in each year, or the dishes served in the successive courses of a banquet, or the
world’s early computers in order of when they first ran a program;
• items ordered in space along a line or curve, such as the houses along one side of
a street, or the floors of a building, or the waterfalls along a river, or the amino
acids along a protein molecule;
• items in some order of rank, such as the top ten songs according to some popularity
poll, or the world’s highest mountains in order, or the planets of the solar system
in order of mean distance from the Sun, or the world’s fastest computers in order
of their speed on some suite of benchmark problems;
• the successive letters of a string; in fact, a string is just a finite sequence whose
members happen to be letters.
Inside a computer, data is always stored in some order, even in cases where the order
may not be important. So, for example, if a set — in which order does not matter —
is to be represented on a computer, then it must ultimately be represented as some
sequence of items somewhere in the computer’s memory.
Computation is a process that takes place over time. Any characteristic of a compu-
tation — such as the amount of memory it uses, or the amount of energy consumed, or
some aspect of the information displayed — gives rise to a sequence that specifies how
that characteristic changes over time.
We are also fundamentally concerned with how the time taken by some program
depends on the size of the input. A sequence of time measures, in order of input size,
shows how the time grows as the input size increases.
Sequences are therefore one of the most fundamental of all abstract models.
6.1𝛼 D E F i N i T i O N S A N D N O TAT i O N
A sequence is a function whose domain is the set of positive integers or some finite
initial portion of it. So the domain of a sequence is either ℕ or [1, 𝑛]ℕ for some 𝑛 ∈ ℕ.
An infinite sequence is a sequence with domain ℕ.
A finite sequence is a sequence with domain [1, 𝑛]ℕ for some 𝑛. This is really the
same as an 𝑛-tuple, and finite sequences are often written in tuple notation:
(𝑓(1), 𝑓(2), … , 𝑓(𝑛)).
This is fine if the sequence is finite and very short, and it can be useful even for long
sequences and infinite sequences provided the pattern is clear. But this way of writing a
sequence is really just for informal exposition, not for defining the sequence. A sequence
cannot be defined without either listing all its members in order (only practical if the
sequence is very short) or giving a precise rule by which, for each 𝑛 ∈ ℕ, the 𝑛-th term
can be determined.
If 𝑓 is a sequence, then its 𝑛-th term 𝑓(𝑛) is often denoted by 𝑓𝑛 . This can be
thought of as just a variation on the usual notation for function values, but one that is
used more often in the context of sequences. With this notation, we might rewrite (6.1)
as
𝑓1 , 𝑓2 , 𝑓3 , … .
We can always define a sequence over 𝐴 by using the fact that it is simply a function
𝑓 ∶ ℕ → 𝐴 and using one of our ways of defining functions (§ 2.3𝛼 ).
There is another common convention for defining sequences, which has a couple of
variants. For example, the sequence of squares
1, 4, 9, 16, 25, …
may be written as
( 𝑛² ∶ 𝑛 ∈ ℕ ) or ( 𝑛² )_{𝑛=1}^∞ .
The first of these is reminiscent of the way we used formulas to define sets, on p. 3 in
§ 1.2𝛼 , except that now we use parentheses rather than curly braces because the order
matters. This convention also applies to finite sequences.
We have seen how to define sequences by expressing each term as a function of its
position in the sequence. So, the 𝑛-th term 𝑓𝑛 is given by a formula in 𝑛.
Another way to define sequences is to give an expression that uses a previous term
in the sequence, or possibly several previous terms. For example, we could write
𝑓𝑛 = 𝑓𝑛−1 + 2.
But this alone is not sufficient, because we have to specify how to get started. If we
write
𝑓1 = 1, 𝑓𝑛 = 𝑓𝑛−1 + 2,
then we have defined the sequence of odd numbers
1, 3, 5, 7, … (6.2)
If we instead start with 𝑓1 = 0, keeping the rule 𝑓𝑛 = 𝑓𝑛−1 + 2, then we have defined
the sequence of even numbers
0, 2, 4, 6, … . (6.3)
The rule can multiply instead of add. Consider the sequence of powers of 2:
2, 4, 8, 16, … .
Previously, we might have defined this sequence as ( 2ⁿ ∶ 𝑛 ∈ ℕ ). Now, we can define it
by
𝑓1 = 2, 𝑓𝑛 = 2𝑓𝑛−1 . (6.4)
In general, a recursive definition of a family of objects consists of:
• one or more base cases, which define some of the objects explicitly;
• a general rule, which defines the remaining objects in terms of objects defined earlier.
The base cases, together with the general rule, must be sufficient so that, used together,
any object in the family is defined uniquely, precisely and clearly.
For sequences, the base case gives some initial terms explicitly, and then the general
case is given as an expression using previous terms in the sequence. A recursive definition
of a number sequence is called a recurrence relation. We have seen three recurrence
relations so far: for the odd numbers in (6.2), for the even numbers in (6.3), and for the
powers of 2 in (6.4).
These sequences are all familiar and could be defined either by a formula in 𝑛 or
by a recurrence relation, according to taste or the needs of the situation in which they
are being used. But there are many situations where a recurrence relation is the most
natural way to define a sequence. For example, consider the factorials
1, 2, 6, 24, 120, … .
These are defined by the recurrence relation
𝑓1 = 1, 𝑓𝑛 = 𝑛𝑓𝑛−1 . (6.5)
This also has a formulaic definition, (𝑛! ∶ 𝑛 ∈ ℕ), but 𝑛! is really just a standard ab-
breviation for 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ 3 ⋅ 2 ⋅ 1, so the most succinct definition without using
special abbreviations is the recurrence relation. This sequence also illustrates the point
that, in the rule of a recurrence relation, the position 𝑛 does not have to be confined
to the subscripts; it can also be used, in its own right, in the expression, as it is on the
right-hand side of the second equation in (6.5).
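A recurrence relation translates directly into a loop that carries the previous term
forward. A minimal Python sketch for the factorials of (6.5):

    def factorial_terms(count):
        # f_1 = 1; f_n = n * f_{n-1}. Returns [f_1, ..., f_count].
        terms = [1]
        for n in range(2, count + 1):
            terms.append(n * terms[-1])
        return terms

    print(factorial_terms(5))  # [1, 2, 6, 24, 120]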
Exercise: Give recurrence relations for each of the following sequences: the positive
integers; the negative even integers; the squares; the sequence whose 𝑛-th term is the
sum of the first 𝑛 positive integers; the sequence whose 𝑛-th term is the sum of the
reciprocals of the first 𝑛 positive integers.
So far, all our recurrence relations have rules that only use the previous term in the
sequence. But rules can use terms that appear earlier than that. Consider the recurrence
relation
𝑓1 = 2, 𝑓2 = 4, 𝑓𝑛 = 4𝑓𝑛−2 . (6.6)
With the rule 𝑓𝑛 = 4𝑓𝑛−2 , it is not sufficient to just define one base case, such as 𝑓1 = 2.
If we did that, then we could only compute every second term:
𝑓1 = 2, 𝑓3 = 4𝑓1 = 8, 𝑓5 = 4𝑓3 = 32, … .
We need another base case to specify 𝑓2 , and then the recurrence will give us values at
all even positions too. If we set 𝑓2 = 4, as in (6.6), then we again get the powers of 2, so
in this case we have only come up with a more complicated way to define that sequence.
But we can do other things too. For example, try
𝑓1 = 2, 𝑓2 = 0, 𝑓𝑛 = 4𝑓𝑛−2 . (6.7)
This gives the sequence 2, 0, 8, 0, 32, 0, … ; that is,
𝑓𝑛 = 2ⁿ if 𝑛 is odd, and 𝑓𝑛 = 0 if 𝑛 is even.
6.3𝛼 A R i T H M E T i C S E Q U E N C E S
An arithmetic sequence, also called an arithmetic progression, is a number se-
quence in which the difference between every pair of consecutive terms is the same.
This means there is some common difference 𝑑 such that every term is obtained by
adding 𝑑 to its predecessor. This, together with its first term which we’ll call 𝑎, gives
the following recurrence relation:
𝑓1 = 𝑎, 𝑓𝑛 = 𝑓𝑛−1 + 𝑑.
An arithmetic sequence is specified by giving its first term 𝑎, its common difference
𝑑, whether it is finite or infinite and, if it is finite, its number of terms 𝑛. In the finite
case, the sequence looks like
𝑎, 𝑎 + 𝑑, 𝑎 + 2𝑑, … , 𝑎 + (𝑛 − 1)𝑑.
For example:
• balances ($) of an account with one fixed regular bill: 220, 120, 20, −80, −180
(here 𝑎 = 220, 𝑑 = −100, 𝑛 = 5);
• annual balances ($) of a 5-year investment with initial balance $100 and simple
interest accruing at 10% p.a.: 100, 110, 120, 130, 140, 150 (here 𝑎 = 100, 𝑑 = 10,
𝑛 = 6).
6.4𝛼 G E O M E T R i C S E Q U E N C E S
A geometric sequence, also called a geometric progression, is a number sequence
in which the ratio between every pair of consecutive terms is the same. This means
there is some common ratio 𝑟 such that every term is obtained by multiplying its
predecessor by 𝑟.
Again, we call the first term 𝑎. We have the following recurrence relation:
𝑓1 = 𝑎, 𝑓𝑛 = 𝑓𝑛−1 ⋅ 𝑟.
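Both recurrences translate directly into loops; a minimal Python sketch generating
the first few terms of each kind of sequence:

    def arithmetic(a, d, count):
        terms = [a]                        # f_1 = a
        for _ in range(count - 1):
            terms.append(terms[-1] + d)    # f_n = f_{n-1} + d
        return terms

    def geometric(a, r, count):
        terms = [a]                        # f_1 = a
        for _ in range(count - 1):
            terms.append(terms[-1] * r)    # f_n = f_{n-1} * r
        return terms

    print(arithmetic(220, -100, 5))  # [220, 120, 20, -80, -180]
    print(geometric(2, 2, 4))        # [2, 4, 8, 16]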
1 The year 2100 does indeed belong to this century, which goes from 2001 to 2100 inclusive. But 2100 is not
a leap year, despite being a multiple of 4.
6.6 F R O M R E C U R S i V E D E F i N i T i O N S T O F O R M U L A S
Consider the sequence defined by the recurrence relation
𝑓1 = 1, 𝑓𝑛 = 2𝑓𝑛−1 + 1. (6.10)
This sequence is neither arithmetic nor geometric, although it has both a constant mul-
tiplier and an additive constant, so in a way it seems like a mix of both types. Can we
find, and prove, a formula for it?
We use this example to illustrate a very common approach:
1. explore: compute the first several terms of the sequence;
2. conjecture: study those terms, look for patterns, and try to conjecture an expression
that fits the pattern you have observed so far;
3. prove, by induction on 𝑛, that your formula works for all 𝑛.
1. explore:
𝑓1 = 1 (given in (6.10))
𝑓2 = 2𝑓1 + 1 = 2 ⋅ 1 + 1 = 3
𝑓3 = 2𝑓2 + 1 = 2 ⋅ 3 + 1 = 7
𝑓4 = 2𝑓3 + 1 = 2 ⋅ 7 + 1 = 15
2. conjecture:
We can see that the values of 𝑓𝑛 , for 𝑛 ≤ 4, are each one less than a power of 2. This
trend looks likely to continue! We write this down as a formula, taking care to
get the exponent of 2 correct and that the formula works correctly for the known
initial case 𝑛 = 1. So we propose
𝑓𝑛 = 2ⁿ − 1. (6.11)
3. prove:
Now we prove (6.11) holds for all 𝑛 ∈ ℕ. It is natural to prove this by induction.
Inductive Basis:
If 𝑛 = 1, then we know from the initial condition in (6.10) that 𝑓1 = 1, and for
𝑛 = 1 the formula (6.11) gives 𝑓1 = 2¹ − 1 = 2 − 1 = 1, so the formula in (6.11) is
correct for 𝑛 = 1.
Inductive step:
Let 𝑘 ≥ 1.
Assume that (6.11) holds for 𝑛 = 𝑘, i.e., that 𝑓𝑘 = 2ᵏ − 1. (This is the Inductive
Hypothesis.)
Now consider 𝑓𝑘+1 . The recursive rule from (6.10) gives
𝑓𝑘+1 = 2𝑓𝑘 + 1 = 2(2ᵏ − 1) + 1 = 2ᵏ⁺¹ − 2 + 1 = 2ᵏ⁺¹ − 1.
So we have established that (6.11) holds for 𝑛 = 𝑘 + 1 too. This completes the
Inductive Step.
Conclusion:
Therefore, by Mathematical Induction, (6.11) holds for all 𝑛 ∈ ℕ.
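The explore step, and a sanity check of the conjecture, are easy to automate. The
check below compares the recurrence (6.10) with the formula (6.11) for many values
of 𝑛 (exploration only, not a substitute for the proof):

    f = 1                       # f_1 from (6.10)
    for n in range(1, 20):
        assert f == 2 ** n - 1  # the formula (6.11)
        f = 2 * f + 1           # the recursive rule from (6.10)
    print("formula matches the recurrence for n = 1, ..., 19")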
6.7 T H E F i B O N A C C i S E Q U E N C E
Now we consider another recursive definition, this time one where each term depends
on the two previous terms:
𝑓1 = 1, 𝑓2 = 1, 𝑓𝑛 = 𝑓𝑛−1 + 𝑓𝑛−2 . (6.12)
This is the famous Fibonacci sequence, whose first terms are
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, … .
It is difficult to determine the behaviour of the Fibonacci sequence just from its
recursive definition. So, again, we take the explore-conjecture-prove approach. Our
exploration gives the terms shown above, and you can explore further using, say, a
computer.
6.7.1 U P P E R B O U N D S
The maximum ratio between any two consecutive terms is 𝑓3 /𝑓2 = 2/1 = 2, and our
explorations suggest that it is likely to stay well below that. So we can conjecture the
upper bound
𝑓𝑛 ≤ 2ⁿ . (6.13)
We now prove this by induction on 𝑛.
Inductive Basis:
• When 𝑛 = 1, we have 𝑓1 = 1, and our upper bound is 2¹ = 2, so 𝑓1 ≤ 2¹.
• We need another base case too, for 𝑛 = 2. The reason for this will become
evident shortly. When 𝑛 = 2, we have 𝑓2 = 1, and our upper bound is 2² = 4,
so we have 𝑓2 ≤ 2² (which is an even looser upper bound than for 𝑛 = 1).
Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≤ 2ˡ for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 + 1 ≥ 3, so we can apply the recursive
rule (6.12):
𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 ≤ 2ᵏ + 2ᵏ⁻¹ = 2ᵏ⁻¹ (2 + 1) ≤ 2ᵏ⁻¹ ⋅ 2² = 2ᵏ⁺¹ .
So the upper bound (6.13) holds for 𝑛 = 𝑘 + 1 too.
Comment: We can now see why we needed the Inductive Basis to cover
𝑛 = 2 as well as 𝑛 = 1: the recursive rule only applies for 𝑛 ≥ 3, so smaller
cases need to be dealt with separately.
Conclusion:
Therefore, by Mathematical Induction, our upper bound (6.13) holds for all 𝑛.
There are several points worth noting about this upper bound and its proof.
Firstly, we can make the upper bound tighter by putting 𝑛 − 1 in the exponent
instead of 𝑛:
𝑓𝑛 ≤ 2ⁿ⁻¹ . (6.14)
The proof would go through again with very little change; in fact, the main changes
needed are to the Inductive Basis. Now, 𝑓1 = 1 ≤ 2¹⁻¹ and 𝑓2 = 1 ≤ 2²⁻¹ , so this tighter
upper bound still holds when 𝑛 = 1 and 𝑛 = 2. The rest of the proof, which only uses
the recursive rule, is unchanged except that, when applying the Inductive Hypothesis,
we insert the new upper bounds instead of the old ones.
Secondly, another way to try to refine the upper bound is to put a constant factor
in front. Let’s try
𝑓𝑛 ≤ 𝛼 ⋅ 2ⁿ . (6.15)
What is the best factor 𝛼 to use? We need the Inductive Basis to still work, so we want
1 = 𝑓1 ≤ 𝛼 ⋅ 2¹ and 1 = 𝑓2 ≤ 𝛼 ⋅ 2² . (6.16)
These tell us that 𝛼 ≥ 1/2, so putting 𝛼 = 1/2 should work. This gives an upper bound
𝑓𝑛 ≤ (1/2) ⋅ 2ⁿ .
But the right-hand side here equals 2ⁿ⁻¹ , so this upper bound is really just a restatement
of (6.14).
This use of a constant factor should be kept in mind, as it is a common technique
for refining formulas for bounds for sequences given by recurrence relations.
Thirdly, the only algebraic property of 2ⁿ that we used is
2 + 1 ≤ 2² . (6.17)
This indicates that the Inductive Step would still work if, instead of using the upper
bound 2ⁿ , we used 𝑟ⁿ where 𝑟 is a smaller number than 2, provided we still have an
inequality like (6.17) except for 𝑟 rather than 2. So we could choose 𝑟 to satisfy
𝑟 + 1 ≤ 𝑟² . (6.18)
𝑓𝑛 ≤ 𝑟ⁿ . (6.20)
In presenting the new proof by induction, we give it in full, even though most of it is
unchanged. The parts that are changed are given in blue; this is really just replacing 2
by 𝑟 throughout. (We have also abbreviated some of the explanations.)
Inductive Basis:
Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≤ 𝑟ˡ for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 + 1 ≥ 3, so we can apply the recursive
rule (6.12):
𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 ≤ 𝑟ᵏ + 𝑟ᵏ⁻¹ = 𝑟ᵏ⁻¹ (𝑟 + 1) ≤ 𝑟ᵏ⁻¹ ⋅ 𝑟² = 𝑟ᵏ⁺¹ .
Conclusion:
Therefore, by Mathematical Induction, our upper bound (6.20) holds for all 𝑛.
We can actually do even better by choosing 𝑟 to satisfy (6.18) with equality, i.e., by
requiring it to satisfy
𝑟 + 1 = 𝑟² .
Rearranging, this is equivalent to
𝑟² − 𝑟 − 1 = 0. (6.21)
So 𝑟 should be a root of this quadratic equation, and the usual formula for these roots
gives
𝑟 = (1 ± √5)/2,
so the two roots are
𝑟1 = (1 + √5)/2 = 1.61803 … and 𝑟2 = (1 − √5)/2 = −0.61803 … . (6.22)
Either of these would enable the Inductive Step to work, but only 𝑟1ⁿ works as an upper
bound, since it works for the Inductive Basis too, whereas 𝑟2ⁿ does not work as an upper
bound for either 𝑛 = 1 or 𝑛 = 2.
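The bound 𝑓𝑛 ≤ 𝑟1ⁿ can likewise be explored numerically (exploration only; the
induction above is the proof):

    r1 = (1 + 5 ** 0.5) / 2      # the golden ratio, root of r**2 - r - 1 = 0
    a, b = 1, 1                  # f_1, f_2
    for n in range(1, 30):
        assert a <= r1 ** n      # the upper bound f_n <= r1**n
        a, b = b, a + b          # Fibonacci rule: step to the next term
    print("f_n <= r1**n holds for n = 1, ..., 29")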
6.7.2 L O W E R B O U N D S
The approach we have taken to upper bounds, in § 6.7.1, is readily adapted to lower
bounds, with a little adjustment.
The minimum ratio 𝑓𝑛+1 /𝑓𝑛 between consecutive terms is 𝑓2 /𝑓1 = 1, but using 1ⁿ as
a lower bound for 𝑓𝑛 is not very useful, even though it is clearly correct! So we look for
a higher ratio and see what we can do with it.
Exploration shows that, from the second term onwards, the ratio 𝑓𝑛+1 /𝑓𝑛 between
consecutive terms is ≥ 1.5. It is tempting to go from this ratio to a proposed lower
bound for 𝑓𝑛 of 1.5ⁿ . After all, we did something similar for upper bounds at the start
of the previous section (§ 6.7.1), when we observed that the ratio of consecutive terms
is always ≤ 2 and then found that an upper bound of 2ⁿ would work.
But bounds on ratios of consecutive terms do not always immediately yield bounds
on the sequence itself. In this case, 1.5ⁿ does not work as a lower bound, because it
fails for 𝑛 = 1 and 𝑛 = 2. We need to get a lower bound that works for those cases too.
Then, when we prove our lower bound later, we will be able to get the Inductive Basis
to work.
Scaling down by a constant factor fixes the base cases: we propose the lower bound
f_n ≥ (4/9) ⋅ 1.5^n. (6.23)
We now prove this by induction on 𝑛. Again, we give the proof in full, with the
new/changed parts in blue.
Inductive Basis:
• n = 1: we have f_1 = 1, and our lower bound is (4/9) ⋅ 1.5 = 2/3 ≤ 1, so the lower bound holds.
• n = 2: we have f_2 = 1, and our lower bound is (4/9) ⋅ 1.5^2 = (4/9) ⋅ (9/4) = 1, so the lower bound holds (with equality).
Inductive Step:
Let 𝑘 ≥ 2.
Assume that f_l ≥ (4/9) ⋅ 1.5^l for all l ≤ k. (This is the Inductive Hypothesis.)
Now consider f_{k+1}. Since k ≥ 2, we have k + 1 ≥ 3, so we can apply the recursive rule (6.12):
f_{k+1} = f_k + f_{k−1} ≥ (4/9)(1.5^k + 1.5^{k−1}) = (4/9) ⋅ 1.5^{k−1} ⋅ 2.5 ≥ (4/9) ⋅ 1.5^{k−1} ⋅ 1.5^2 = (4/9) ⋅ 1.5^{k+1},
since 1.5^2 = 2.25 ≤ 2.5.
Conclusion:
Therefore, by Mathematical Induction, our lower bound (6.23) holds for all 𝑛.
The Inductive Step would still work for any bound of the form
f_n ≥ α ⋅ r^n
provided r satisfies
r^k + r^{k−1} ≥ r^{k+1}.
Dividing each side by r^{k−1}, this is equivalent to
r + 1 ≥ r^2. (6.24)
If we had equality, we would have the same quadratic equation we had previously, in
(6.21), which has the two roots 𝑟1 and 𝑟2 given in (6.22). To ensure that the inequality
(6.24) is satisfied, we require that 𝑟 lies between those two roots:
r_2 ≤ r ≤ r_1.
We can let 𝑟 be equal to 𝑟1 or 𝑟2 , and still the inequality is satisfied, which means the
Inductive Step should still work, but we must still then choose 𝛼 so that the Inductive
Basis works too.
Suppose we use r_1 = (1 + √5)/2 = 1.61803… and let us determine α so that the proposed lower bound
f_n ≥ α ⋅ r_1^n
works. For the Inductive Basis, we require
1 ≥ α ⋅ r_1 and 1 ≥ α ⋅ r_1^2, i.e., α ≤ r_1^{−1} and α ≤ r_1^{−2}. (6.25)
Now the fact that r_1 > 1 implies that r_1^{−2} < r_1^{−1}, so in fact the second of the two inequalities (6.25) implies the first, so we only need to require that
α ≤ r_1^{−2}.
Now,
r_1^{−2} = ((1 + √5)/2)^{−2} = (3 − √5)/2 = 0.381966… .
Taking α = r_1^{−2} gives the proposed lower bound
f_n ≥ r_1^{−2} ⋅ r_1^n,
which is just
f_n ≥ r_1^{n−2}. (6.26)
We now give the proof by induction that this holds for all 𝑛 ∈ ℕ.
Inductive Basis:
• n = 1: f_1 = 1 ≥ r_1^{−1} = 0.618…, so the bound holds.
• n = 2: f_2 = 1 ≥ r_1^0 = 1, so the bound holds (with equality).
Inductive Step:
Let 𝑘 ≥ 2.
Assume that f_l ≥ r_1^{l−2} for all l ≤ k. (This is the Inductive Hypothesis.)
Now consider f_{k+1}. Since k ≥ 2, we have k + 1 ≥ 3, so we can apply the recursive rule (6.12):
f_{k+1} = f_k + f_{k−1} ≥ r_1^{k−2} + r_1^{k−3} = r_1^{k−3}(r_1 + 1) = r_1^{k−3} ⋅ r_1^2 = r_1^{(k+1)−2},
using r_1 + 1 = r_1^2.
Conclusion:
Therefore, by Mathematical Induction, our lower bound (6.26) holds for all 𝑛.
Putting our two bounds together, we have sandwiched f_n:
r_1^{n−2} ≤ f_n ≤ r_1^n.
Dividing through by r_1^n,
r_1^{−2} ≤ f_n / r_1^n ≤ 1.
This tells us that f_n always lies within a constant ratio of r_1^n. Although the exact ratio f_n / r_1^n varies, it is constrained to lie between lower and upper bounds that are constant, i.e., these bounds do not depend on n.
This tells us that the growth of f_n, as n increases, is very like the growth of r_1^n.
We have managed to “sandwich” 𝑓𝑛 quite tightly between lower and upper bounds that
grow in the same way as 𝑛 increases. Encouraged by this success, we try now to get the
two bounds to coincide, so that we obtain an exact formula for 𝑓𝑛 .
When we looked for upper and lower bounds of the form 𝑟 𝑛 , we used the quadratic
equation
r^2 − r − 1 = 0. (6.27)
The intuition behind this is that any r that satisfies this equation also satisfies
r^n = r^{n−1} + r^{n−2}, (6.28)
so that r^n actually satisfies the Fibonacci recurrence (where each term is the sum of the two previous terms), although dealing with the base case is a separate matter.
We took the largest root 𝑟1 = (1 + √5)/2, which is also the only positive root. For
upper bounds, we chose 𝑟 to be ≥ 𝑟1 , while for lower bounds, we chose 𝑟 to be ≤ 𝑟1 .
(For the lower bound, we also needed r ≥ r_2, but this constraint never became binding, as the values we considered were so close to r_1.) In each case, we observed that we
could actually take 𝑟 to be equal to 𝑟1 , thereby using the same value, 𝑟1 , in both upper
and lower bounds. The distinction between upper and lower bounds came down to our
choice of constant factors, 𝛼, and these were determined by the need for our bounds to
work on the base cases 𝑛 = 1 and 𝑛 = 2.
This suggests that an expression of the form
α_1 ⋅ r_1^n,
with suitable choice of constant factor 𝛼1 , might be a very good estimate for 𝑓𝑛 . But it
won’t be exact, so we need something else too.
We have neglected the smaller root, r_2, so far. But it does provide us with something else that satisfies the recurrence relation, because
r_2^n = r_2^{n−1} + r_2^{n−2},
by (6.28). So, if we add some multiple of r_2^n to any expression satisfying the Fibonacci recurrence, then the new expression will also satisfy the Fibonacci recurrence (although not necessarily the base cases).
Suppose, then, that our exact formula has contributions not just of the form r_1^n, using the larger root r_1 of (6.27), but also of the form r_2^n, using the smaller root r_2.
Give each of these contributions its own factor; call these factors 𝛼1 and 𝛼2 , respectively.
Then we seek a formula of the form
f_n = α_1 r_1^n + α_2 r_2^n.
We now have two constants to vary, namely 𝛼1 and 𝛼2 , in our quest for an exact
expression for 𝑓𝑛 . This seems like progress, because the two base cases give us two
conditions to satisfy. Previously, when we only looked for expressions of the form 𝛼1 𝑟1𝑛 ,
we had only one constant to play with, yet we had two base cases to attend to, for 𝑓1
and 𝑓2 . That was ok when we only wanted bounds (lower or upper); in that situation,
it’s fine if we only get inequality (rather than equality) in one of the base cases, as long
as the inequality goes in the right direction (and we were always able to achieve this).
But if we want an exact formula for 𝑓𝑛 , then we need the formula to work exactly for
both the base cases. That’s two equations (𝑓1 = 1, 𝑓2 = 1), so it’s a good idea to have
two variables to solve for. Let’s see what 𝛼1 and 𝛼2 can do for us.
What should 𝛼1 and 𝛼2 be? Consider the two base cases. When 𝑛 = 1, we have
𝑓1 = 1, so we need
α_1 r_1 + α_2 r_2 = 1.
When 𝑛 = 2, we have 𝑓2 = 1, so we need
α_1 r_1^2 + α_2 r_2^2 = 1.
Here we have two linear equations in the two unknowns α_1 and α_2. We then use our favourite technique for solving such systems. Once we have done so, we find that
α_1 = 1/√5, α_2 = −1/√5,
so our conjectured formula is
f_n = (1/√5) r_1^n − (1/√5) r_2^n. (6.29)
We now prove by induction that it works. Once again, the new or modified parts of the
proof are in blue.
Inductive Basis:
• n = 1: we have f_1 = 1, and our formula gives (1/√5) r_1 − (1/√5) r_2 = (1/√5)(r_1 − r_2) = (1/√5) ⋅ √5 = 1, so the formula works exactly.
• n = 2: we have f_2 = 1, and our formula gives (1/√5) r_1^2 − (1/√5) r_2^2 = (1/√5)(r_1 + r_2)(r_1 − r_2) = (1/√5) ⋅ 1 ⋅ √5 = 1, so the formula works exactly again.
Inductive Step:
Let 𝑘 ≥ 2.
Assume that f_l = (1/√5) r_1^l − (1/√5) r_2^l for all l ≤ k. (This is the Inductive Hypothesis.)
Now consider f_{k+1}. Since k ≥ 2, we have k + 1 ≥ 3, so we can apply the recursive rule (6.12):
f_{k+1} = f_k + f_{k−1} = (1/√5)(r_1^k + r_1^{k−1}) − (1/√5)(r_2^k + r_2^{k−1}) = (1/√5) r_1^{k+1} − (1/√5) r_2^{k+1},
using (6.28) for each of r_1 and r_2.
Conclusion:
Therefore, by Mathematical Induction, our formula (6.29) holds for all n. Written out in full, it reads
f_n = (1/√5) ((1 + √5)/2)^n − (1/√5) ((1 − √5)/2)^n. (6.30)
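Before moving on, it is worth checking the formula numerically. The following Python sketch (ours, for illustration; floating-point rounding is harmless here for moderate n) compares the recurrence against the exact formula:

    from math import sqrt

    r1 = (1 + sqrt(5)) / 2
    r2 = (1 - sqrt(5)) / 2

    def fib_recurrence(n):
        """f_n from f_1 = f_2 = 1 and f_n = f_{n-1} + f_{n-2}."""
        a, b = 1, 1
        for _ in range(n - 1):
            a, b = b, a + b
        return a

    def fib_formula(n):
        """f_n from the exact formula, rounded to the nearest integer."""
        return round((r1 ** n - r2 ** n) / sqrt(5))

    assert all(fib_recurrence(n) == fib_formula(n) for n in range(1, 40))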
We now have an exact formula for the 𝑛-th term of the Fibonacci sequence. The
formula itself is certainly not obvious. In fact, if you only know the standard recursive
definition and then this formula is presented to you “out of the blue”, then it would
probably seem mysterious and complicated, and you would wonder how anyone came
up with it. The appearance of the irrational numbers
(1 + √5)/2 and (1 − √5)/2
would probably seem strange, especially as all the terms in the sequence are positive
integers.
Hopefully this section has removed some of the mystery, and equipped you with the
skills to derive formulas for some other recursively-defined sequences. You can see now
that those two strange irrational numbers are just the two roots of the equation
r^2 − r − 1 = 0, (6.31)
which we used in (6.21) and (6.27). And this equation comes directly from the recurrence relation defining the sequence: a simple rearrangement of the recurrence f_n = f_{n−1} + f_{n−2} gives
f_n − f_{n−1} − f_{n−2} = 0. (6.32)
Compare (6.31) and (6.32). Although they are different kinds of equations (the
former is a quadratic equation in a single real variable, the latter is a linear equation for
any three successive terms in a sequence), they also have a common pattern. In each
case, the coefficients on the left-hand side are the same: 1, −1, −1, and the right-hand
side is 0. In each case, the three terms may be viewed as being “counted” or “indexed” by
three consecutive decreasing integers: for (6.31), the exponents are 2,1,0, while for (6.32),
the subscripts are 𝑛, 𝑛 − 1, 𝑛 − 2. We say that (6.31) is the characteristic equation
for the recurrence relation (6.32).
This is one instance of a general method for solving recurrences: first convert the
recurrence to its characteristic equation, then find the roots of that equation, and then
try to express the 𝑛-th term as a sum, with suitable coefficients, of 𝑛-th powers of the
roots of the characteristic equation. It does not always work quite like this; in particular,
there are some complicating details to attend to when some root of the characteristic
equation occurs more than once. But this link, between the recurrence and the roots
of the characteristic equation, lies at the heart of a general method that can be used to
solve any linear recurrence relation. We will not give the general method in full here; it
is covered in more advanced courses on discrete mathematics.
An important aspect of the expression for the n-th Fibonacci number, (6.30), is the dominant role played by
((1 + √5)/2)^n.
Of the two roots (1 ± √5)/2 of (6.31), this one is the greater in size, so its 𝑛-th power
will grow much larger than the 𝑛-th power of the other root. Furthermore, the size
of that other root is < 1, so its 𝑛-th power rapidly approaches 0, so it really provides
only a small adjustment to the number calculated. So Fibonacci numbers will be well
approximated, for large 𝑛, by
f_n ≈ (1/√5) ⋅ ((1 + √5)/2)^n.
This quantity is never an integer, but it does make the nature of the growth of 𝑓𝑛 clear.
It could be said that (f_n : n ∈ ℕ) is approximately geometric for large n; the approximation also shows that the ratio between successive terms tends to (1 + √5)/2 as n → ∞.
These observations align with our discussion in § 6.7.3, where we observed that the
sequence grows like 𝑟1𝑛 , where 𝑟1 = (1 + √5)/2.
This limiting ratio of the Fibonacci numbers, (1 + √5)/2, is the famous quantity known as the golden ratio, often denoted by φ or τ. In decimal form, it is
1.6180339887 … .
It is rich in mathematical properties and pops up all over the place, not only in computer
science but also in nature and in art. It has an important geometric interpretation: if
you take a rectangle whose side lengths are in this ratio (i.e., with the golden ratio as its
aspect ratio), and you trim it by one straight cut to create a square based on the shorter
side of the rectangle, then the smaller rectangle you trim off has the same proportions
as the one you started with. Such a rectangle is called a golden rectangle.
6.8 LIMITS OF INFINITE SEQUENCES

Given a number sequence, we can ask, how does it behave in the long run? What can
we say about how it grows (or declines) as 𝑛 gets very large? We have considered this
already for the sequence of reciprocals of positive integers (§ 6.5) and the Fibonacci
sequence (§ 6.7).
Sequences are very diverse, and various kinds of behaviour are possible. Some sequences increase without bound, such as (n)_{n=1}^∞ or (n^2)_{n=1}^∞ or (log n)_{n=1}^∞. Others decrease without bound, such as (−n)_{n=1}^∞ or (−n^2)_{n=1}^∞ or (−log n)_{n=1}^∞. Others jump around; these may be bounded both above and below, like ((−1)^n)_{n=1}^∞, which starts −1, 1, −1, 1, …, or unbounded above and below, like ((−1)^n n)_{n=1}^∞, which starts −1, 2, −3, 4, …, or bounded above and unbounded below, or unbounded above and bounded below.
Some sequences eventually “settle down” in such a way that, beyond a certain point,
the terms look like approximations to some specific number, with these approximations
getting arbitrarily close. Consider the examples in Table 6.1.

Table 6.1: Examples of sequences that settle down towards a limit.
  ((−1)^n / n)_{n=1}^∞             −1, 1/2, −1/3, 1/4, −1/5, …        approaches 0 arbitrarily closely
  (1 − 1/n)_{n=1}^∞                0, 1/2, 2/3, 3/4, 4/5, …           approaches 1 arbitrarily closely
  (1 − 1/2^n)_{n=1}^∞              1/2, 3/4, 7/8, 15/16, 31/32, …     approaches 1 arbitrarily closely
  ((−1)^n − (−1)^n / n)_{n=1}^∞    0, 1/2, −2/3, 3/4, −4/5, 5/6, …    odd terms approach −1, even terms approach 1
Intuitively, the limit of a sequence is a number which the sequence’s terms get closer
and closer to, forever, and the terms get arbitrarily close, meaning that however close
you want them to be, they eventually get that close or closer, and stay that close or
closer from some point onwards.
Informally, the limit of a sequence (𝑎𝑛 )∞ 𝑛=1 is a number ℓ such that, however close
to ℓ you want to get, there is a position in the sequence such that all terms beyond that
position are within that distance of ℓ. You can set your required distance from ℓ, which
we call 𝜀, to be as small as you like (except it can’t be 0), and there is always some
position 𝑁 such that all terms from that position onwards are within that distance 𝜀
of ℓ.
Formally, the sequence (a_n)_{n=1}^∞ has limit ℓ, and we write
lim_{n→∞} a_n = ℓ,
if for all ε > 0 there exists N ∈ ℕ such that for all n ≥ N we have
|a_n − ℓ| < ε.
It might help to study carefully how the various parts of this formal definition align with
the intuitive description we gave earlier.
For example, suppose 𝑎𝑛 = 1/𝑛. (See § 6.5 and, later, § 6.15 for discussion of
this important sequence.) We claim that its limit, as 𝑛 → ∞, is 0. This makes sense
intuitively, as these numbers 1/𝑛 get smaller and smaller, getting as close to 0 as you like.
Let us see how this notion aligns with the definition given in the previous paragraph.
Pick any positive real number 𝜀, as our measure of how close to 0 we want to get. How
far along the sequence do we have to go, to get and stay that close, or closer? This is
the role of 𝑁 , and this depends on 𝜀. In this case, we can pick 𝑁 to be any positive
integer greater than 1/ε:
N > 1/ε, N ∈ ℕ. (6.33)
Then, for any n ≥ N, we have n > 1/ε, and taking reciprocals (which reverses the inequality) gives
|a_n − 0| = 1/n < ε.
The last inequality here says that the distance between the term a_n = 1/n and 0 is < ε. So we have shown that
lim_{n→∞} 1/n = 0. (6.34)
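To see the definition in action computationally, here is a small Python sketch (ours; the helper name is hypothetical) that, given ε, picks an N as in (6.33) and spot-checks the terms beyond it. Exact fractions avoid floating-point edge cases:

    from fractions import Fraction

    def position_for(eps):
        """Any integer N > 1/eps works, as in (6.33)."""
        return int(1 / eps) + 1

    eps = Fraction(1, 1000)
    N = position_for(eps)                       # N = 1001
    assert all(Fraction(1, n) < eps for n in range(N, N + 5000))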
For another example, suppose 𝑎𝑛 = 𝑛/(𝑛 +1). What is its limit as 𝑛 → ∞? Consider
the first few terms:
1/2, 2/3, 3/4, 4/5, … .
From these, it looks like the terms approach 1 from below. In general, the numerator
and denominator of 𝑛/(𝑛 + 1) are very similar, differing only by 1, and this difference
should matter less and less as 𝑛 gets larger and larger. So, intuitively, we might expect
the limit as 𝑛 → ∞ to be 1. We now show that this is indeed the case.
Let 𝜀 > 0. We want to make 𝑛 large enough to ensure that
| n/(n + 1) − 1 | < ε. (6.35)
On the left-hand side here, we are taking the absolute value of a negative quantity, since
𝑛/(𝑛 + 1) < 1. This observation helps us remove the absolute value function and then
we can simplify using algebra:
| n/(n + 1) − 1 | = 1 − n/(n + 1)
               = ((n + 1) − n)/(n + 1)
               = 1/(n + 1).
So (6.35) is equivalent to
1/(n + 1) < ε.
But this is equivalent to
n + 1 > 1/ε,
which in turn is equivalent to
n > (1/ε) − 1.
We can now write our limit proof. Given 𝜀, we define 𝑁 to be a positive integer >
(1/𝜀) − 1. Then, for any 𝑛 ≥ 𝑁 , we have 𝑛 > (1/𝜀) − 1 too. But, as we have just seen,
this is equivalent to
1/(n + 1) < ε,
which in turn is equivalent to
| n/(n + 1) − 1 | < ε.
Since this inequality holds for all 𝑛 ≥ 𝑁 , we have completed the proof that
lim_{n→∞} n/(n + 1) = 1.
For yet another example, suppose a_n = 1/2^n. Again, we claim the limit as n → ∞ is 0. Pick any ε > 0. We need 1/2^n < ε, which is equivalent to 2^n > 1/ε. Taking logarithms of each side, we obtain n > log_2(1/ε). So we can take any
positive integer N satisfying
N > log_2(1/ε).
Having chosen this 𝑁 , any 𝑛 ≥ 𝑁 satisfies 𝑛 > log2 (1/𝜀), and therefore satisfies 1/2𝑛 < 𝜀.
So
n ≥ N ⟹ | 1/2^n − 0 | < ε.
So we have
lim_{n→∞} 1/2^n = 0. (6.37)
Now let 𝑟 be any real constant in the range −1 < 𝑟 < 1, and suppose 𝑎𝑛 = 𝑟 𝑛 . Now
what is lim𝑛→∞ 𝑎𝑛 ? The previous example is the case 𝑟 = 1/2, and the argument given
there can be adapted to handle any 𝑟 in the range −1 < 𝑟 < 1.
The algebra is neater when 0 < 𝑟 < 1, so we do that first. Given 𝜀 > 0, choose
positive integer N > log_{1/r}(1/ε). Then, for any n ≥ N, we have
(1/r)^n > 1/ε.
Taking the reciprocal of each side, which entails reversing the inequality, gives
r^n < ε
for all n ≥ N. Since 0 < r^n, this shows that
lim_{n→∞} r^n = 0. (6.38)
The case 𝑟 = 0 is trivial, since then we have 𝑟 𝑛 = 0𝑛 = 0 for all 𝑛, so the limit is
clearly 0 because the value is always 0.
If −1 < 𝑟 < 0, then 0 < −𝑟 < 1, so we can apply our earlier observation (6.38), with
−𝑟 instead of 𝑟, to deduce that
lim (−𝑟)𝑛 = 0.
𝑛→∞
This sequence ((−r)^n)_{n=1}^∞ consists of positive terms that decrease towards 0, approaching 0 in the limit as n → ∞. It follows that the sequence of its negations, (−(−r)^n)_{n=1}^∞, consists entirely of negative terms that increase towards 0, with
lim_{n→∞} −(−r)^n = 0.
Now, for every n, the term r^n equals ±(−r)^n, so it is sandwiched between these two sequences:
−(−r)^n ≤ r^n ≤ (−r)^n.
It follows that the terms r^n have no choice but to also converge on 0 as n → ∞. They alternate between being positive and negative, but their sizes decrease towards 0 and approach it in the limit:
lim_{n→∞} r^n = 0.
In conclusion, we have
Theorem 25. If −1 < r < 1 then
lim_{n→∞} r^n = 0.
We will use this fact later, when we consider sums of infinite geometric sequences
(§ 6.14).
Another important limit, which takes a bit more effort to prove, is the limit of an
𝑛-th root (rather than an 𝑛-th power, as above). If 𝑟 > 0 then we use 𝑟 1/𝑛 to refer to
the sole positive 𝑛-th root of 𝑟.
Theorem 26. If r > 0 then
lim_{n→∞} r^{1/n} = 1.
The condition r > 0 is needed because, for r < 0, taking n-th roots can involve complex numbers: there may be no positive n-th root, or no real one at all.
Once we have established some basic limits like these, we can use them to derive
other limits using standard principles for combining limits. One of these principles is: if
lim_{n→∞} a_n = a and lim_{n→∞} b_n = b,
then the sequence (a_n + b_n : n ∈ ℕ), formed by adding the corresponding terms in those two sequences together, has limit a + b:
lim_{n→∞} (a_n + b_n) = a + b.
Similarly,
lim_{n→∞} (a_n − b_n) = a − b,
lim_{n→∞} (a_n b_n) = ab,
lim_{n→∞} (a_n / b_n) = a/b,
with the last one requiring that 𝑏𝑛 > 0 for all 𝑛 and also 𝑏 > 0.
For example, combining (6.34) and (6.37),
lim_{n→∞} (1/n + 1/2^n) = 0 + 0 = 0.
Some sequences increase without bound. They have no finite limit, and not even
any finite upper bound. In such cases, we can say that their limit, as 𝑛 → ∞, is ∞. For
example,
lim_{n→∞} n^2 = ∞.
If a sequence at some point goes below 0 and keeps decreasing without any finite lower
bound, then we can say that its limit, as n → ∞, is −∞. For example,
lim_{n→∞} (−n^2) = −∞.
We are often interested in sequences that grow without bound. This is the typical
situation for the running time of a nontrivial algorithm, as a function of some positive
integer measure 𝑛 of the input size. (For example, 𝑛 might be the number of bytes in
an input file, or the number of names in a list to be sorted, or the number of digits in
a number to be factorised.) In this kind of situation, just saying that the limit is ∞
doesn’t say much and isn’t very useful. For example, in different situations you may
have algorithms that take time n, or n^2, or √n, or log n, or n log n, or 2^n, or n!, or 2^{2^n}.
These are wildly different running times, but stating their limits does not reflect this:
lim_{n→∞} log n = ∞,
lim_{n→∞} √n = ∞,
lim_{n→∞} n = ∞,
lim_{n→∞} n log n = ∞,
lim_{n→∞} n^2 = ∞,
lim_{n→∞} 2^n = ∞,
lim_{n→∞} n! = ∞,
lim_{n→∞} 2^{2^n} = ∞.
To talk intelligently about the running time of an algorithm as 𝑛 grows, it’s not enough
to just say “it keeps growing without bound”, which is all we are saying here. We need
to be able to say something that indicates how it grows.
6.9 BIG-O NOTATION
Many sequences we encounter in computer science describe quantities that grow without
bound as 𝑛 increases. As we saw at the end of the previous section, just stating that
their limit is ∞ doesn’t really capture how they grow. In this section, we describe
standard notation used throughout computer science for the really important part of
the growth behaviour of a sequence.
Functions like 𝑛2 and log 𝑛 are simple and well-known, so that the growth of se-
quences based on them is well understood. But sequences in computer science are often
significantly more complicated. In particular, the exact running time of a program can be very complicated indeed; in fact, it is often not possible to write down an exact formula for it. But we still need to quantify how the costs of running the program grow as the input gets larger and larger.
Suppose we have a sequence (t_n : n ∈ ℕ) whose n-th term is given by
t_n = 100n + 10n^2 + 2^n + log n.
This might conceivably be the running time of a program that has four successive stages,
with these stages taking time 100𝑛, 10𝑛2 , 2𝑛 and log 𝑛, respectively (where 𝑛 is the input
size). This expression for 𝑡𝑛 has four summands, each contributing to the growth of this
sequence. You can use a spreadsheet or program to study how each summand grows as
𝑛 increases. You’ll find that, once 𝑛 ≥ 10, the summand 2𝑛 is greater than each of the
others. As 𝑛 grows beyond that, 2𝑛 rapidly dwarfs the others. For 𝑛 = 20, this summand
is > 1, 000, 000, but the other summands are each ≤ 4, 000.
So we can say that the growth of 𝑡𝑛 as 𝑛 → ∞ is dominated by the growth of 2𝑛 .
If this summand had been 3 ⋅ 2𝑛 instead of 2𝑛 , then it would be even more dominant.
But the constant factor 3 is relatively unimportant compared with 2𝑛 . In fact, regardless
of the constant, this would still have been the dominant summand. Even if the constant
factor had been 0.01, as in
t_n = 100n + 10n^2 + 0.01 ⋅ 2^n + log n,
that summand 0.01 ⋅ 2^n would eventually dominate the others. (Try n ≥ 20.)
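You can carry out this experiment with a few lines of Python (our sketch; we take log to be the base-2 logarithm for concreteness):

    from math import log2

    def summands(n):
        return {"100n": 100 * n, "10n^2": 10 * n * n,
                "2^n": 2 ** n, "log n": log2(n)}

    for n in (5, 10, 15, 20, 30):
        parts = summands(n)
        share = parts["2^n"] / sum(parts.values())
        print(n, parts, f"share of 2^n: {share:.3f}")
    # From n = 10 onwards 2^n is the largest summand; by n = 20 it dwarfs the rest.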
For sequences representing running times of programs, it might be infeasible to pin
down the exact running time. It might just be too complicated, or it might depend
on some information we don’t know in advance, like the exact way the various data
structures are laid out in memory. But we still need to be able to make well-founded
quantitative statements about how long programs take.
It is important to be able to make positive statements, like “this program takes at most 5n^2 steps on any input of size n” (the particular bound here is only illustrative). This identifies something the program can be guaranteed to do, and in particular it
guarantees that a certain quantity of a key resource — in this case, time — is sufficient
for the program to do its job, even if it might actually use less. Such statements are
useful when budgeting computational resources and when estimating how much more
time the program might take as input sizes grow.
Another issue is that we don’t want to be unduly distracted by behaviour for small
𝑛. It is common for some small inputs to need some kind of special treatment, which
means that the time taken to deal with them is not typical of behaviour on larger inputs.
Sequences often take some time to “settle down” into their eventual pattern of behaviour;
we have seen this already with the Fibonacci sequence (§ 6.7).
So, in summary, given a sequence of growing terms, we would like to make statements about its growth that
• capture the dominant contribution to the growth,
• are not distracted by constant factors, and
• are not distracted by the behaviour at small n.
Big-O notation is designed to deliver exactly this. For a sequence (t_n) and a function f, we write t_n = O(f(n)) to mean that there are constants c and N such that, for all n ≥ N, we have t_n ≤ c ⋅ f(n).
We observed earlier that each of the other summands is ≤ 2^n when n ≥ 10. So, for n ≥ 10, we have
t_n = 100n + 10n^2 + 2^n + log n ≤ 2^n + 2^n + 2^n + 2^n = 4 ⋅ 2^n.
Referring to the definition of big-O notation, we can take N = 10 and c = 4, because for all n ≥ 10, we have t_n ≤ 4 ⋅ 2^n. So we can write
t_n = O(2^n). (6.39)
We could equally well have written t_n = O(4 ⋅ 2^n), or even t_n = O(0.001 ⋅ 2^n).
These are mathematically correct, but poor use of big-O notation because they include constant factors. One key point about big-O notation is that it “swallows up” constant factors, so there’s no need to state such factors explicitly. In this case, observe that
2^n = O(0.001 ⋅ 2^n),
because 2^n ≤ 1000 ⋅ (0.001 ⋅ 2^n), so the definition of big-O is satisfied in this case with c = 1000. Since any constant factor can be included inside the big-O, i.e.,
t_n = O(b ⋅ 2^n)
for any constant 𝑏, it is simplest to just omit the constant factor (or use 𝑏 = 1) and focus
on the part that depends on 𝑛.
It is also correct to write
t_n = O(3^n),
since 2^n ≤ 3^n. This is unnecessarily loose, and in general we prefer tighter big-O bounds
since they give us stronger claims. But it is sometimes very hard to give very tight
upper bounds, so sometimes we have to be content with looser bounds (which is good
for big-O because it is so forgiving towards constant factors).
Although we can replace 2^n by 3^n in (6.39) and still have a true statement (albeit a weaker one), we cannot replace 2^n by, say, 1.9^n (or use any other base smaller than 2). This is because there is no constant c such that c ⋅ 1.9^n eventually gets (and stays) bigger than 2^n.
Coming up with big-O expressions for sequences is often a matter of picking the
dominant summand and dropping its constant factor. We saw this in our study of 𝑡𝑛
above, (6.39). Some other examples of this type:
(1/8)n^3 + (1/4)n^2 + (1/2)n + 1 = O(n^3),
3n^{1/3} + 2n^{1/2} = O(n^{1/2}),
log n + log(log n) + log(log(log n)) = O(log n),
100n^2 + 2^n = O(2^n),
5 + n^{−1} = O(1),
1/n^4 + 1/n^3 + 1/n^2 = O(1/n^2).
If the sequence term is a product of sums, then you can find the dominant term in each sum and then multiply them, simplifying as appropriate. For example,
(100n + 2^n)(4n^2 + 3n) = O(n^2 2^n).
6.10 SUMS AND SUMMATION
For number sequences, we are often interested in the sizes and behaviour of sums of
terms, as well as just the individual terms themselves. For example, if the 𝑛-th term 𝑠𝑛
of a sequence gives the energy consumed during the 𝑛-th second of some computation,
then we may also want to study the total energy consumed so far, at each time. This can
be done using the sum, 𝑆𝑛 = 𝑠1 + 𝑠2 + ⋯ + 𝑠𝑛 , of the first 𝑛 terms of the sequence. These
cumulative sums 𝑆𝑛 give us a new sequence, with each term 𝑆𝑛 of the new sequence
giving the total energy used during all the first 𝑛 seconds of the computation.
Let M be a set of indices for a sequence, so M = ℕ for an infinite sequence and M = [1, k]_ℕ for a finite sequence of k terms.
If (𝑠𝑛 )𝑛∈𝑀 is any number sequence, then its sequence of partial sums is the se-
quence (𝑆𝑛 )𝑛∈𝑀 where, for each 𝑛, the term 𝑆𝑛 is defined by
S_n = s_1 + ⋯ + s_n. (6.40)
Since we will be working with partial sums like this a lot now, it is time to introduce
summation notation. The expression
∑_{i=1}^{n} s_i
is an abbreviation for the sum on the right-hand side of (6.40). Let us study it closely.
• The ∑ is a large upper-case Greek sigma. It is called the summation sign and stands for sum.2
• At the base of the summation sign we see “i = 1”. On the left-hand side of this equation we have i, which is the index of summation or variable of summation. This will be varied as part of a specification of what things are to be added up. On the right of “i = 1” we have 1, which gives the initial value of i, that is, the very first value we give to i when we are specifying the things to be added up.
• At the top of the summation sign we see 𝑛, which is the very last value we give
to 𝑖 when forming our sum.
2 This notation was introduced by Leonhard Euler in 1755. The Greek letter sigma is the Greek equivalent
of the English ‘s’, the first letter of “sum”.
• But how is each of these values of 𝑖 to be used in forming the sum? This is
specified by the summand (i.e., the thing being added), which comes after the
summation sign (i.e., to its right), and in this case is 𝑠𝑖 . The summand includes
the variable of summation, in this case in its subscript.
• By substituting all numbers in the range of summation into the variable of sum-
mation, we obtain different values of the summand, and we add all these different
values up in order to obtain the whole sum.
In this case, the range of summation is {1, 2, … , 𝑛}, so the variable of summation 𝑖 in
the summand 𝑠𝑖 is given each of these values. This gives the summand values
s_1, s_2, …, s_n,
and adding all of these values together gives the sum S_n. In programming terms, the sum corresponds to the following loop (a runnable version appears after the notes below):
    sum := 0
    for each i from 1 to n:
        sum := sum + s_i
    S_n := sum
Note that this loop uses the same ingredients as the summation notation:
• an initial value 1 and a final value n,
• a variable of summation i,
• a summand s_i.
Some further remarks:
• What we have just given is really a short algorithm (or part of a program in some
programming language), so it does more than just define a sum mathematically;
it also specifies how to compute it. In particular, it specifies a particular order in
which the summands are added, and uses a name (sum) for the partial sums, and
initialises the partial sum at the start. By contrast, the summation notation does
not specify an order of addition. Even though it is natural to add the summands
in order of increasing 𝑖, there is no requirement to do it in that order, or any other
order. The summation specifies the mathematical result of doing the sum without
assuming anything about how it is computed. Since addition is associative and
commutative, the sum is the same regardless of the order of the additions.
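For comparison with the pseudocode above, here is the same computation as a runnable Python sketch (the function name and the choice to pass the summand as a function are ours):

    def partial_sum(s, n):
        """Compute S_n = s(1) + s(2) + ... + s(n), mirroring the loop above."""
        total = 0
        for i in range(1, n + 1):    # i takes each value in the range of summation
            total += s(i)
        return total

    print(partial_sum(lambda i: i, 10))      # 55, the sum 1 + 2 + ... + 10
    print(sum(i for i in range(1, 11)))      # Python's built-in sum: also 55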
The name used for the variable of summation is not important, as long as it is
used consistently. The variable used under the summation sign (the 𝑖 in “𝑖 = 1”) and
the variable of that same name in the summand (the subscript 𝑖 in 𝑠𝑖 ) are the same
variable, as you’d expect. If you want to change the variable name, say to 𝑗, then you
can, provided you change it everywhere: both under the summation sign, and in the
summand. So the sum
∑_{i=1}^{n} s_i
is exactly the same as ∑_{j=1}^{n} s_j. But if we write an expression such as
i + ∑_{i=1}^{n} s_i,
we really have two different variables called i: the local variable of summation used in the sum, and the i on the very left. In this case, we have not made the meaning of the first i clear, but even if we had, there is the possibility of confusion about what the variables
refer to, and this is to be avoided. There is no shortage of variable names: there are
26 letters of the English alphabet and 24 in the Greek alphabet, each with upper case
versions too!3 The same issue arises if we write
(∑_{i=1}^{n} s_i) + i. (6.41)
Here, the 𝑖 on the right is not the variable of summation. If for some reason we must
use 𝑖 on the right here, then it would be better to use a different variable of summation,
such as 𝑗, and this is easily done. This particular expression (6.41) also raises another
issue. The parentheses here make clear that the summand is only 𝑠𝑖 . But what if the
parentheses are omitted?
∑_{i=1}^{n} s_i + i
Is this the same as (6.41), or is it a different sum in which the summand is now 𝑠𝑖 + 𝑖?
As it is, the expression is ambiguous, so we should use parentheses to make it clear. This
was done in (6.41), imposing one interpretation, and if we wanted to impose the other
possible interpretation, then we could write
𝑛
⒧𝑠𝑖 + 𝑖⒭ .
𝑖=1
A finite arithmetic series is the sum S_n of the terms
a_1, a_2, a_3, …, a_n
of a finite arithmetic sequence. If the sequence has first term a and common difference d, its terms are
a, a + d, a + 2d, …, a + (n − 1)d.
There is a classic trick for finding S_n: write the sum out twice, the second time with the terms in reverse order:
S_n = a + (a + d) + ⋯ + (a + (n − 2)d) + (a + (n − 1)d), (6.42)
S_n = (a + (n − 1)d) + (a + (n − 2)d) + ⋯ + (a + d) + a.
Now we add these two equations. The sum of the left-hand sides is 2𝑆𝑛 . What is the sum
of the right-hand sides? We have arranged the terms on the right so that each column of
two terms adds to the same sum, namely 2𝑎 +(𝑛 −1)𝑑. For example, in the first column
on the right, the sum is 𝑎 + (𝑎 + (𝑛 − 1)𝑑) = 2𝑎 + (𝑛 − 1)𝑑; in the second column on the
right, the sum is (𝑎 + 𝑑) + (𝑎 + (𝑛 − 2)𝑑) = 2𝑎 + (𝑛 − 1)𝑑; and so on. We do this for each
of the 𝑛 columns on the right. Each of these 𝑛 column sums is 2𝑎 +(𝑛 −1)𝑑, so the sum
of the two right-hand sides of the two equations is 𝑛(2𝑎 + (𝑛 − 1)𝑑). Equating the sum
of the two left-hand sides and the sum of the two right-hand sides, we obtain
2S_n = n(2a + (n − 1)d).
Therefore
S_n = n (a + (n − 1)d/2) = na + (n(n − 1)/2) d. (6.43)
At this point, we should check the sanity of our answer.
• In the special case 𝑛 = 1, when the series just has the term 𝑎, then we should have
𝑆𝑛 = 𝑎, and that is indeed what our expression for 𝑆𝑛 in (6.43) gives.
• In the special case 𝑑 = 0, when the series is constant, with every term equal to 𝑎,
its sum should be 𝑆𝑛 = 𝑛𝑎, and our expression agrees with this too.
• We have already seen an important finite arithmetic series, namely the sum of the
first 𝑛 positive integers, in (3.5) and Theorem 19. In that case, the sum should be
S_n = n(n + 1)/2. If we put a = 1 and d = 1 in (6.43), then we obtain
S_n = n ⋅ 1 + (n(n − 1)/2) ⋅ 1 = n + n(n − 1)/2 = n(n + 1)/2,
which agrees.
You should get in the habit of doing checks like these — against very simple special
cases, and particular cases where you already know the answer — whenever you derive
a new expression for something you are trying to compute. Passing these checks doesn’t
prove that your expression is correct, but in practice it often detects errors and helps
you correct them.
In fact, the expression for the sum of the first 𝑛 positive integers (Theorem 19), while
only one particular case of a finite arithmetic series, can nonetheless be used to derive
the expression for the general case. Look again at (6.42), and consider the coefficients
of 𝑑 throughout the right-hand side. These coefficients are 0, 1, 2, …, 𝑛 − 1. So let us
rearrange the right-hand side of (6.42) and collect all the parts involving 𝑑 together.
S_n = a + (a + d) + (a + 2d) + ⋯ + (a + (n − 1)d)
    = (a + a + ⋯ + a) + (d + 2d + ⋯ + (n − 1)d)    (collecting the n copies of a together, then the multiples of d)
    = na + (1 + 2 + ⋯ + (n − 1)) d
    = na + (n(n − 1)/2) d    (by Theorem 19).
What happens to the expression for 𝑆𝑛 as the number of terms grows? The two parts
of the expression, 𝑛𝑎 and (𝑛(𝑛 − 1)/2)𝑑, grow at different rates; the first is linear in 𝑛,
while the second is quadratic in 𝑛. For large 𝑛, quadratic functions of 𝑛 grow faster
in size than linear functions of 𝑛. What happens for large 𝑛 depends on whether 𝑑 is
positive or negative. When 𝑑 > 0, the sum 𝑆𝑛 is eventually positive (even if 𝑎 < 0) and
becomes larger and larger, and is unbounded. When 𝑑 < 0, the sum 𝑆𝑛 is eventually
negative (even if 𝑎 > 0) and, although it grows in size, it grows in the negative direction,
going lower and lower, with no lower bound.
6.13 FINITE GEOMETRIC SERIES

A finite geometric series is a finite series obtained from a geometric sequence. If the
geometric sequence has 𝑛 terms, first term 𝑎, and common ratio 𝑟, then, as in (6.9), its
terms are
a, ar, ar^2, …, ar^{n−2}, ar^{n−1}.
The finite geometric series is then the sum
S_n = a + ar + ar^2 + ⋯ + ar^{n−2} + ar^{n−1}.
Here there is a different trick: multiply S_n by the common ratio r, so that each term turns into the next one along:
r S_n = ar + ar^2 + ar^3 + ⋯ + ar^{n−1} + ar^n.
The only difference with the earlier series is that we have now lost the first term, 𝑎, and
gained a new last term, 𝑎𝑟 𝑛 . Therefore
r S_n = S_n − a + ar^n.
Collecting the two S_n terms on one side,
r S_n − S_n = −a + ar^n,
so (r − 1) S_n = a(r^n − 1). Provided r ≠ 1, we can divide by r − 1 to obtain
S_n = a ⋅ (r^n − 1)/(r − 1). (6.44)
(If r = 1, all n terms equal a, so S_n = na.) Again we should check the sanity of our answer.
• If 𝑛 = 1, then the sequence has the single term 𝑎 so 𝑆1 = 𝑎, and our expression for
𝑆𝑛 agrees with this.
• If 𝑎 = 0 then all terms are 0 so 𝑆𝑛 = 0, and our expression agrees with this.
• Consider the identity
1 + 2 + 2^2 + 2^3 + ⋯ + 2^n = 2^{n+1} − 1.
The sum on the left is a finite geometric series with 𝑛 + 1 terms, first term 𝑎 = 1
and common ratio 𝑟 = 2. Our formula gives
S_{n+1} = 1 ⋅ (2^{n+1} − 1)/(2 − 1) = 2^{n+1} − 1,
which is correct.
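Again this kind of check can be automated; here is a Python sketch (ours) testing (6.44), including the special case r = 1:

    def geometric_sum_formula(a, r, n):
        """S_n = a (r^n - 1)/(r - 1) for r != 1, and S_n = na for r = 1."""
        if r == 1:
            return n * a
        return a * (r ** n - 1) // (r - 1)    # exact for integer a and r

    def geometric_sum_direct(a, r, n):
        return sum(a * r ** k for k in range(n))

    for a, r, n in [(1, 2, 10), (3, -2, 6), (5, 1, 4), (2, 3, 1)]:
        assert geometric_sum_formula(a, r, n) == geometric_sum_direct(a, r, n)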
6.14 INFINITE GEOMETRIC SERIES

Consider our expression in (6.44) for the sum S_n of a finite geometric series of n terms.
What happens to 𝑆𝑛 as 𝑛 → ∞?
If 𝑟 > 1 then 𝑟 𝑛 > 0 and 𝑟 𝑛 grows without bound, so the quotient inside the paren-
theses in (6.44) is also positive and grows without bound. The sign of 𝑆𝑛 is the same
as the sign of 𝑎. So, if 𝑎 > 0 then 𝑆𝑛 is positive and increases without bound, while if
𝑎 < 0 then 𝑆𝑛 is negative and decreases without bound (giving larger and larger negative
numbers with no lower bound).
If 𝑟 < −1 in (6.44) then 𝑟 𝑛 alternates in sign as 𝑛 increases, while increasing in size.
The same happens to the numerator of the quotient in (6.44). But the denominator 𝑟 −1
is always negative when 𝑟 < −1. For even 𝑛, we have 𝑟 𝑛 > 1 and 𝑟 𝑛 − 1 > 0, with 𝑟 𝑛
increasing without upper bound as 𝑛 increases, while for odd 𝑛, we have 𝑟 𝑛 < −1 and
𝑟 𝑛 − 1 < −2, with 𝑟 𝑛 decreasing without lower bound as 𝑛 increases. The sign of the
whole expression for each 𝑛 will also depend on the sign of 𝑎, but the general character
of 𝑆𝑛 as 𝑛 increases is clear: alternating between positive and negative numbers, with
each getting larger in size without bound, so 𝑆𝑛 has no upper or lower bound.
We have already seen that, if r = 1, then S_n = na. As n increases, this increases without upper bound if a > 0 and decreases without lower bound if a < 0. If r = −1, we have S_n = a − a + a − a + ⋯ + (−1)^{n−1} a, which is either a or 0 according as n is odd or even, respectively. Putting r = −1 in (6.44) gives
S_n = a ⋅ ((−1)^n − 1)/(−2).
When 𝑛 is even, this is 0, and when 𝑛 is odd, it is 𝑎, all as expected. So in this case,
the sequence has no limit, although it is now bounded both above and below.
It remains to consider what happens if −1 < 𝑟 < 1.
We saw in Theorem 25 that, in this case, 𝑟 𝑛 → 0 as 𝑛 → ∞. So, in our expression for
𝑆𝑛 in (6.44), the 𝑟 𝑛 in the numerator vanishes as 𝑛 → ∞. This means that, if −1 < 𝑟 < 1,
then
S_∞ = lim_{n→∞} S_n = a/(1 − r), (6.45)
where 𝑆∞ represents the sum of the infinite geometric series.
This expression is of fundamental importance and appears throughout computer
science, mathematics, other sciences, engineering, economics, finance, and indeed every
quantitative discipline.
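You can watch the convergence happen with a short Python sketch (ours): the partial sums S_n creep up towards a/(1 − r).

    def partial_sums(a, r, count):
        """Return S_1, S_2, ..., S_count for the geometric series."""
        sums, total, term = [], 0.0, a
        for _ in range(count):
            total += term
            sums.append(total)
            term *= r
        return sums

    a, r = 1.0, 0.5
    print(partial_sums(a, r, 12))   # 1.0, 1.5, 1.75, ..., approaching 2
    print(a / (1 - r))              # the limit from (6.45): 2.0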
6.15 THE HARMONIC SERIES

• There are many sequences that diminish to 0 and whose partial sums are bounded:
no matter how large 𝑛 is, the partial sum stays below some fixed upper bound, so
in fact the sum of all terms in the sequence is finite. For example, the sequences
(1/n^2 : n ∈ ℕ) and (1/2^n : n ∈ ℕ) both go to 0 as n → ∞, and both have finite sums
too. The finite-sum claim is not obvious for the former sequence, but we know the
latter sequence has a finite sum from § 6.14.
• On the other hand, if a sequence of positive numbers does not diminish to 0, then
it can be shown that its partial sums are unbounded. For example, the terms in
the sequence (1 + n^{−1} : n ∈ ℕ) tend to 1 as n → ∞, and their partial sums grow
without bound, since the 𝑛-th partial sum is ≥ 𝑛.
But the sequence of reciprocals (1/𝑛 ∶ 𝑛 ∈ ℕ) is not in either of these camps. Its terms
tend to 0, yet it has unbounded partial sums. In fact, it is one of the more rapidly
diminishing positive sequences with this property. (There are some sequences with this
property that diminish even more quickly than (1/𝑛 ∶ 𝑛 ∈ ℕ), but they do not diminish
much more quickly. Can you find some?)
We saw in Exercise 3.11 that the harmonic numbers behave very much like log 𝑛.
We proved that H_n ≥ log_e(n + 1), and mentioned in the solutions (as a suggested further exercise) that H_n ≤ (log_e n) + 1. It is known (although beyond the scope of these Course Notes to prove) that the difference between H_n and log_e n converges to a constant:
lim_{n→∞} (H_n − log_e n) = γ = 0.5772156649… , (6.46)
where γ is known as the Euler–Mascheroni constant. In fact the difference decreases as n increases, giving
H_n ≥ (log_e n) + γ,
giving a stronger lower bound than the one from Exercise 3.11. For an upper bound, it
is known that
H_n ≤ (log_e n) + γ + 1/(2n), (6.47)
although we do not prove this. We can see that the difference between these two bounds
diminishes to 0 as 𝑛 → ∞. For very large 𝑛, we have an approximation:
H_n ≈ (log_e n) + γ. (6.48)
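A quick numerical look (a Python sketch of ours; the value of γ is hard-coded) confirms how tight the sandwich between the lower bound above and (6.47) is:

    from math import log

    GAMMA = 0.5772156649015329          # the Euler-Mascheroni constant

    for n in (10, 100, 1000, 10_000):
        H = sum(1 / k for k in range(1, n + 1))
        lower = log(n) + GAMMA
        upper = log(n) + GAMMA + 1 / (2 * n)
        print(n, lower <= H <= upper, H - lower)   # True, with a gap below 1/(2n)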
6.16 EXERCISES
(d) the sequence whose 𝑛-th term is the sum of the first 𝑛 positive integers;
(e) the sequence whose 𝑛-th term is the sum of the reciprocals of the first 𝑛 positive
integers.
𝑎1 = 2, 𝑎𝑛 = 2𝑛 𝑎𝑛−1
𝑞1 = 3, 𝑞2 = 5, 𝑞𝑛 = 3𝑞𝑛−1 − 2𝑞𝑛−2 .
𝑔1 = 0, 𝑔𝑛 = 𝑔𝑛−1 + log 𝑛.
𝑔𝑛 ≤ 𝑛 log 𝑛 − 𝑛 + 𝐻𝑛 ,
where H_n = 1 + 1/2 + ⋯ + 1/n (see Exercise 3.11).
To do this, it will help to use the following inequality, which holds for all 𝑥 ≥ −1:
log(1 + 𝑥) ≤ 𝑥
The following technique can be helpful in relating log 𝑛 and log(𝑛 − 1):
log(n − 1) = log(n ⋅ (n − 1)/n) = log(n (1 − 1/n)) = log n + log(1 − 1/n).
6. Use the explore-conjecture-prove method to develop the best upper bound you
can for the 𝑛-th term 𝑟𝑛 of the sequence defined by the recurrence relation
𝑟1 = 1, 𝑟𝑛 = 3𝑟𝑛−1 − 1,
Adapt the approach of § 6.7.4 in order to develop a formula for 𝑙𝑛 and then prove by
induction that it works.
t_1 = 1, t_n = 3 − (1/2) t_{n−1}.
(a) Consider the following theorem, which is correct, and its “proof” by induction,
which is incorrect.
Find the errors in this incorrect proof.
The lines of the proof are numbered, so you can refer to them.
1. For each 𝑛 ∈ ℕ, let 𝑆(𝑛) be the statement that 𝑡𝑛 ≤ 5/2. We must prove that, for
every 𝑛 ∈ ℕ, the statement 𝑆(𝑛) is true.
3. Inductive step:
4. Assume that for some 𝑘 the statement 𝑆(𝑘) holds. (This is the Inductive Hypoth-
esis.)
11.
“□”
1 ≤ t_n ≤ 5/2.
So this is a case where it’s actually easier to prove a stronger statement.
(ii) Prove firstly that 𝑆(𝑛) holds for all even 𝑛, and then that it holds for all odd 𝑛.
φ_1 = 1, φ_n = 1 + 1/φ_{n−1}.
(a) Explore this sequence. Find the first 20 terms, using a spreadsheet or program.
(c) Roughly speaking, if 𝜑𝑛 ≈ 𝜑𝑛−1 , then we would expect that they are close to the
limit, and so the limit 𝜑 would be expected to satisfy
φ = 1 + 1/φ.
Solve this equation, in order to derive a conjectured exact value for the limit.
(d) Suppose you want to prove that, for all sufficiently large n, the n-th term is within 0.001
of this limit. How large is “sufficiently large”? In other words, determine 𝑁 so that,
for all 𝑛 ≥ 𝑁 ,
|𝜑𝑛 − 𝜑| < 0.001.
It’s fine to do this computationally.
• When studying how a quantity changes under a small addition, it can some-
times be useful to convert that additive change to a multiplicative change. For
example,
w + 0.001 = w (1 + 0.001/w).
• The following bound, which holds for 0 < x < 1/3, can also help:
1/(1 − x) < 1 + (3/2) x.
10. The Limit Game is a simple two-player game that can be played on any sequence.
We call the two players Lim and Una. Let (𝑎𝑛 )∞ 𝑛=1 be a sequence. Roughly speaking,
Lim’s aim is to approximate a limit using a term from the sequence, while Una tries to
ensure that the approximation is not as good as it should be. Lim and Una play the
game as follows, with Lim having the first turn, and the two players then taking turns
as follows.
1. Lim’s first turn. Lim chooses a number 𝑎, which is intended to specify a limit.
2. Una’s first turn. Una chooses a real number 𝜀 > 0, which is intended to specify
how closely 𝑎 must be approximated by terms in the sequence.
3. Lim’s second turn. Lim chooses a positive integer 𝑁 , which is intended to specify
some position in the sequence.
4. Una’s second turn. Una chooses an integer n ≥ N. If |a_n − a| < ε then Lim wins the game; otherwise Una wins.
For example, suppose the sequence is ((−1)^{n+1}/n : n ∈ ℕ). Here is one possible play of the Limit Game for this sequence.
1. Lim chooses a = 1/2.
2. Una chooses ε = 1/4.
3. Lim chooses N = 3.
4. Una chooses n = 5. Since
|a_n − a| = |a_5 − a| = |1/5 − 1/2| = 3/10 ≥ 1/4 = ε,
Una wins this play of the game.
(a) In this specific play of the game, could Una have possibly lost on her last move? In
other words, is there a different choice she could have made at that stage, which
would have lost her the game?
(b) Now let’s go back one move. So we just assume that Lim’s and Una’s first turns are
as above. Now consider Lim’s second turn. Could Lim have possibly chosen a value
of 𝑁 which would have guaranteed that he would win the game, no matter what
Una did next?
(c) Now let’s go back further. Assume that Lim has just made his first move, as above.
Consider Una’s options. Are all her options equally good? Is she certain to win, no
matter what 𝜀 she chooses? What advice would you give her about choosing 𝜀?
(d) Now let’s go back to the very first move. What should Lim choose, and why?
(e) Play the game, with a classmate or someone else you know. You can pick whatever
sequences you like.
(f) In general, what property does a sequence need to have in order for Lim to have a
winning strategy in the Limit Game?
(g) Propose a sequence for which Una has a winning strategy in the Limit Game, and
describe her winning strategy.
(h) In general, what property does a sequence need to have in order for Una to have a
winning strategy in the Limit Game?
11. The Big-O Game is another, shorter two-player game that can be played on
any sequence. For this game, the two players are Oh-Yes and Oh-No. Let (a_n)_{n=1}^∞ be a sequence and let f : ℕ → ℝ_0^+ be a function. The aim of Oh-Yes is to help establish that
𝑎𝑛 = 𝑂(𝑓(𝑛)),
while the aim of Oh-No is to frustrate Oh-Yes. They play the game as follows, with
Oh-Yes having the first turn, and each player having one turn as follows.
1. Oh-Yes’s turn. Oh-Yes chooses a constant c > 0 and a positive integer N.
2. Oh-No’s turn. Oh-No chooses an integer n ≥ N. If a_n ≤ c ⋅ f(n) then Oh-Yes wins the game; otherwise Oh-No wins.
For example, suppose the sequence is (n^2 : n ∈ ℕ) and the function is f(n) = n^{1.5}. Here is one possible play of the Big-O Game for this sequence.
1. Oh-Yes chooses c = 100 (together with some N).
2. Oh-No chooses n = 1 000 000. Then
a_n = a_{1 000 000} = 1 000 000^2 = (10^6)^2 = 10^{12} > 10^{11} = 10^2 ⋅ (10^6)^{1.5} = 100 ⋅ (1 000 000)^{1.5} = c ⋅ f(n),
so Oh-No wins this play of the game.
(a) In this play of the game, could Oh-No have possibly lost on their move?
(b) Now let’s go back to the first move. What could Oh-Yes have chosen in order to win
the game, and why?
(c) Play the game, with a classmate or someone else you know. You can pick whatever
sequences and functions you like.
(d) In general, what relationship must hold between the sequence and the function in order for Oh-Yes to have a winning strategy in the Big-O Game?
(e) Propose a sequence and function for which Oh-No has a winning strategy in the
Big-O Game, and describe their winning strategy.
(f) In general, how do the sequence and function have to be related in order for Oh-No
to have a winning strategy in the Big-O Game?
12. Give the simplest big-O expression you can for each of the following.
(e) 2^n + 2^{n−1} + ⋯ + 2 + 1
(f) (n!)^{1/n}
(g) n^{1/n}
13.
(a) How can the number
99…9 (n digits, all of them 9)
be interpreted as the sum of a finite geometric series? For that series, what are a, r and n? What does the formula for S_n give in this case? Does this make sense?
(b) What about the number
0.999999… ?
14.
(a) Using (6.47), show that
𝐻𝑛 = 𝑂(log 𝑛).
(b) Hence prove that
n/1 + n/2 + n/3 + ⋯ + n/(n − 1) + n/n = O(n log n).
15. Recall the sequence (𝑔𝑛 ∶ 𝑛 ∈ ℕ) from Exercise 5. Its 𝑛-th term has the closed-form
formula
g_n = ∑_{i=1}^{n} log i.
𝑔𝑛 ≤ 𝑛 log 𝑛 − 𝑛 + log 𝑛 + 𝑐
7 NUMBER THEORY

Numbers were arguably the first abstract objects on which people did computations.
They are so central to computation that, for centuries, the term “computation” was
assumed to refer to numerical computation. Although we now compute with many other
abstract objects too, numbers remain fundamental. They appear in most algorithms and
data structures in one way or another. They are essential to the design and analysis
of algorithms and data structures, even those that don’t themselves contain numbers.
Computational problems on numbers span the full range of difficulty, from very easy
to very difficult or even totally intractable. For some hard problems, their difficulty is
actually an asset, enabling them to be used to help keep information secure.
In this chapter we study the basics of number theory, both in a general way and as
a specific tool used in modern cryptography.
7.1𝛼 MULTIPLES AND DIVISORS
Let d and n be integers. If n = qd for some integer q, we say that d divides n, and write
d ∣ n.
In that case n is a multiple of d, and d is a divisor (or factor) of n; the set of all multiples of d is denoted by dℤ. For example,
n is even ⟺ 2 ∣ n.
You would have seen these ideas previously, except perhaps for the notation 𝑑ℤ and
𝑑 ∣ 𝑛, and would have seen it illustrated using a picture of the number line, with the
length representing 𝑛 being divided into 𝑞 equal segments each of length 𝑑:
[number line: 0, d, 2d, …, (q − 1)d, qd = n]
The picture also suggests a basic fact: if d divides both m and n, then it divides their sum:
d ∣ m ∧ d ∣ n ⟹ d ∣ (m + n). (7.1)
[number line: 0, d, 2d, …, m, m + d, …, m + n]
Similarly, d divides their difference:
d ∣ m ∧ d ∣ n ⟹ d ∣ (m − n). (7.2)
For some positive integers 𝑑, there are simple tests you can use to determine if a
number n is a multiple of d, using the decimal representation of n. For example, n is a multiple of 10 if and only if its last digit is 0, and n is a multiple of 2 if and only if its last digit is 0, 2, 4, 6 or 8. You don’t even need to look at the rest of n; you only need to look at its rightmost (i.e., least significant) digit. Similarly, n is a multiple of 5 if and only if its last digit is 0 or 5.
There are other divisibility tests that require looking at all digits but which still save
time over actually doing the division. Some of these are considered in Exercise 10.
Divisibility tests along these lines are not restricted to the decimal number system. For binary numbers, for example, n is even if and only if its last bit is 0; other divisibility tests for binary numbers are possible too (Exercise 10).
7.2 PRIMES

A prime is an integer ≥ 2 whose only positive divisors are 1 and itself; an integer ≥ 2 that is not prime is composite. The primes begin
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, … .
Theorem 27. Every positive integer except 1 is a product of primes.
Proof. We prove, by induction on n, that every positive integer n ≥ 2 can be expressed as a product of primes.
Inductive basis:
If 𝑛 = 2 then 𝑛 is itself prime so it is a product of one prime, namely 2. So the claim
holds in this case.
Inductive Step:
Let 𝑛 > 2.
Assume that every integer k with 2 ≤ k ≤ n is a product of primes. (This is our Inductive Hypothesis.)
Now consider 𝑛 + 1. We consider two cases, according to whether 𝑛 + 1 is prime or
composite.
Case 1: If 𝑛 + 1 is prime, then it is already a product of primes (with just one prime
in the product, namely 𝑛 + 1 itself).
Case 2: If 𝑛 + 1 is composite, then it has a factor 𝑎 which is neither 1 nor 𝑛 + 1.
Therefore 𝑏 = (𝑛 + 1)/𝑎 is also a positive integer, so it is also an integer factor of 𝑛 + 1,
so 𝑛 +1 = 𝑎𝑏 is the product of these two positive integers. Furthermore, 𝑏 is also neither
1 nor 𝑛 + 1 (because if 𝑏 = 1 then 𝑎 = 𝑛 + 1, and if 𝑏 = 𝑛 + 1 then 𝑎 = 1, so either way we
get a contradiction with what we already know about a). So 2 ≤ a ≤ n and 2 ≤ b ≤ n, and the Inductive Hypothesis tells us that each of a and b is a product of primes. Multiplying these two products together expresses n + 1 = ab as a product of primes. □
So far, we have not ruled out the possibility that some positive integers can be
expressed as a product of primes in more than one way. But, in fact, that is impossible.
Every positive integer can be expressed uniquely as a product of primes. We prove this
later, in Theorem 33 in § 7.9.
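The proof of Theorem 27 is effectively a recursive algorithm: find any nontrivial factor, and recurse on the two parts. Here is a direct Python rendering (our sketch; it uses trial division to find a factor, which the proof itself does not specify):

    def prime_factors(n):
        """Express n >= 2 as a list of primes whose product is n."""
        for a in range(2, int(n ** 0.5) + 1):
            if n % a == 0:               # a is a nontrivial factor, b = n // a
                return prime_factors(a) + prime_factors(n // a)
        return [n]                       # no nontrivial factor: n is prime

    print(prime_factors(360))            # [2, 2, 2, 3, 3, 5]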
The theory of prime numbers has been developed over thousands of years. It has
become one of the deepest and richest fields of mathematics. For most of its history,
it was regarded as being among the “purest” areas of pure mathematics, fascinating
and beautiful but very far removed from practical applications. The great Cambridge
mathematician G.H. Hardy, a major contributor to the theory of prime numbers in the
early 20th century, delighted in how “useless” his work was.1 Yet today prime numbers
are widely used in computer science and are central to information security. Whenever
you do a secure electronic transaction, its security is probably based on the theory of
prime numbers.
7.3 REMAINDERS AND THE MOD OPERATION
If 𝑛 is a multiple of 𝑑, then we can divide 𝑛 by 𝑑 exactly, with nothing left over; the
remainder is 0. In that case, the quotient
q = n/d
is an integer, and we have
𝑛 = 𝑞𝑑.
If, however, 𝑛 is not a multiple of 𝑑, then dividing 𝑛 by 𝑑 does not give an integer.
In that case, there is a nonzero remainder, which is the part of 𝑛 that is left over after
subtracting from 𝑛 the highest multiple of 𝑑 we can, without going over.
Formally, the remainder after division of n by d is the unique integer r that satisfies
n = qd + r, 0 ≤ r ≤ d − 1, (7.3)
for some integer q.
[number line: 0, d, 2d, …, (q − 1)d, qd, (q + 1)d, with n sitting at distance r beyond qd]
This remainder is denoted by
n mod d,
and we read it as “n mod d” or “n modulo d”. Note that, just like other arithmetic
operations +, −, ×, /, this operation uses infix notation, with “mod” placed in between
its arguments.
We have, for example:
6 mod 4 = 2
7 mod 4 = 3
6 mod 2 = 0
7 mod 2 = 1
6 mod 5 = 1
144 mod 100 = 44
174 mod 48 = 30
−6 mod 5 = 4,
since the largest multiple of 5 which is ≤ −6 is −2 × 5 = −10 (with 𝑞 = −2), which gives
𝑟 = 4, since −10 + 4 = −6.
Although we allow negative 𝑛, we always require 𝑑 to be positive.
Programming languages have their own conventions for operations like mod. In
many languages, % is used for applying mod to positive integers, so that, for example,
6 % 4 returns 2. But the meaning of % when one or both arguments is negative can
vary across different languages or even different implementations of the same language.
So it cannot be assumed to always mean the same thing as the mathematical remainder
operation “mod”.
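In Python, for instance, % does agree with the mathematical mod whenever the divisor is positive, even for negative n — a fact worth verifying rather than assuming in whatever language you use:

    print(6 % 4)          # 2
    print(-6 % 5)         # 4, matching the example above
    q, r = divmod(-6, 5)
    print(q, r)           # -2 4, so that n = q*d + r with 0 <= r <= d - 1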
If we calculate n/d exactly, it has integer part q and fractional part r/d ∈ [0, 1):
n/d = q + r/d = q + (n mod d)/d.
7.4 PARITY
Remainders modulo 2 are particularly simple and also particularly important. For any
𝑥 ∈ ℤ, its remainder mod 2 indicates whether 𝑥 is even or odd:
x mod 2 = 1 if x is odd, and x mod 2 = 0 if x is even. (7.4)
In fact, we can regard it as the indicator function of the set of odd numbers.
The remainder 𝑥 mod 2 is called the parity of 𝑥. The term “parity” is also used
more descriptively, to mean the identification of whether an integer is even or odd.
The parity of the sum, difference or product of two integers is completely determined by the parity of those two integers. The precise behaviour of arithmetic operations with respect to parity is given by the following tables:
   +    even  odd           ×    even  odd
  even  even  odd          even  even  even
  odd   odd   even         odd   even  odd
For example, the sum (or difference) of two odd numbers is even, while their product is odd. The table for subtraction has the same entries as for addition. For division, when the quotient of two integers is an integer and the divisor is odd, the quotient has the same parity as the dividend, in accordance with the multiplication table; division by an even number (parity 0) does not determine the parity of the quotient, much as division by zero is undefined.
Let us rewrite those tables with parities represented by 0 or 1, as in (7.4).
   +   0  1            ×   0  1
   0   0  1            0   0  0
   1   1  0            1   0  1
Addition here is the same as ordinary addition except that 1 + 1 is not 2; instead, it is
2 mod 2, i.e., 0. Multiplication is exactly the same as ordinary integer multiplication
restricted to {0, 1}.
Let us lay these tables out differently, with headings along the top and each quan-
tity having its own column, to help make links with some concepts we have studied
previously.
  x  y  (x + y) mod 2        x  y  xy mod 2
  0  0        0              0  0     0
  0  1        1              0  1     0
  1  0        1              1  0     0
  1  1        0              1  1     1
Does either of these tables look familiar? See if you can find tables in earlier chapters that
have the same patterns of entries, even if the entries themselves are different.
It is worth developing the skill of recognising when two structures are the same
except that things have been renamed. This skill is used throughout computer science
and mathematics, and has been formalised as the concept of isomorphism. We will
define and discuss isomorphism in one specific context later.
Now compare the left table above (for mod 2 addition) with the table on p. 129
(§ 4.11) for exclusive-or, ⊕. Compare the right table (for mod 2 multiplication) with
the table on p. 124 (§ 4.6) for conjunction, ∧.
We see, then, that the logical operations ⊕ and ∧, using truth values False and True,
may be treated as just addition and multiplication, respectively, modulo 2, using 0 and
1 to represent False and True respectively. This resolves a question we raised on p. 122
in § 4.1𝛼 . It enables us to relate logic to arithmetic, indeed it underpins the use of logic
to perform arithmetic in computers.
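A two-line Python check of this correspondence (our sketch), using Python's bitwise operators ^ (exclusive-or) and & (and) on the bits 0 and 1:

    for x in (0, 1):
        for y in (0, 1):
            assert (x + y) % 2 == x ^ y    # mod-2 addition is exclusive-or
            assert (x * y) % 2 == x & y    # mod-2 multiplication is conjunction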
7.5 THE GREATEST COMMON DIVISOR
The greatest common divisor (gcd) of two integers 𝑎 and 𝑏, written gcd(𝑎, 𝑏), is the
greatest integer 𝑑 such that 𝑑 ∣ 𝑎 and 𝑑 ∣ 𝑏.2 This is well defined provided 𝑎 and 𝑏 are
not both 0.3
If 𝑏 = 1 then clearly we have gcd(𝑎, 1) = 1.
More generally, if 𝑏 ∣ 𝑎 then gcd(𝑎, 𝑏) = 𝑏.
If 𝑝 is prime, then gcd(𝑎, 𝑝) = 1 unless 𝑎 is a multiple of 𝑝, say 𝑎 = 𝑘𝑝, in which case
we have gcd(𝑘𝑝, 𝑝) = 𝑝.
What if neither 𝑎 nor 𝑏 is prime and neither is a multiple of the other?
Consider 4 and 6. Neither is a multiple of the other, and each is a multiple of 2.
Also, no number > 2 is a divisor of either of them. So gcd(4, 6) = gcd(6, 4) = 2.
Now can we work out the gcd, in general, in a systematic way?
The key observation is the following.
Theorem 28. For any integers a, b that are not both 0,
gcd(a, b) = gcd(b, a − b).
Proof. For any integers x, y, let CD(x, y) be the set of their common divisors:
CD(x, y) = {d ∈ ℤ : d ∣ x ∧ d ∣ y}.
2 This is also called the highest common factor (hcf), but that term is less common these days. The gcd of a and b is often just denoted by (a, b), but this seems to be unnecessary overloading of standard ordered pair notation, so we use gcd(a, b).
3 If 𝑎 = 𝑏 = 0, then every integer is a divisor of 𝑎 and 𝑏, so they have no greatest common divisor.
We show that CD(𝑎, 𝑏) = CD(𝑏, 𝑎−𝑏), from which the Theorem will follow, since identical
sets of integers have identical greatest elements (provided they do indeed have a greatest
element, which they do in this case).
First, we show CD(𝑎, 𝑏) ⊆ CD(𝑏, 𝑎 − 𝑏). Suppose 𝑑 ∈ CD(𝑎, 𝑏). Then 𝑑 ∣ 𝑎 and 𝑑 ∣ 𝑏.
By (7.2), this implies that 𝑑 ∣ (𝑎−𝑏). So 𝑑 ∣ 𝑏 and 𝑑 ∣ (𝑎−𝑏). So 𝑑 ∈ CD(𝑏, 𝑎−𝑏). So every
element of CD(𝑎, 𝑏) is also an element of CD(𝑏, 𝑎 −𝑏). Therefore CD(𝑎, 𝑏) ⊆ CD(𝑏, 𝑎 −𝑏).
Second, we show CD(𝑏, 𝑎 − 𝑏) ⊆ CD(𝑎, 𝑏). Suppose 𝑑 ∈ CD(𝑏, 𝑎 − 𝑏). Then 𝑑 ∣ 𝑎 − 𝑏
and 𝑑 ∣ 𝑏. By (7.1), this implies that 𝑑 ∣ ((𝑎 − 𝑏) + 𝑏). But (𝑎 − 𝑏) + 𝑏 = 𝑎. So 𝑑 ∣ 𝑎 and
𝑑 ∣ 𝑏. So 𝑑 ∈ CD(𝑎, 𝑏). So every element of CD(𝑏, 𝑎 − 𝑏) is also an element of CD(𝑎, 𝑏).
Therefore CD(𝑏, 𝑎 − 𝑏) ⊆ CD(𝑎, 𝑏).
We established the subset relation in both directions. So, in fact,
CD(a, b) = CD(b, a − b).
The two sets are nonempty, since every integer has 1 as a divisor, so 1 belongs to
each set. They also have an upper bound, since divisors are bounded above by the sizes
of the numbers they are divisors of, i.e., 𝑑 ∣ 𝑛 ⟹ 𝑑 ≤ |𝑛|. The two sets therefore each
have a greatest element.4
Since these two sets are identical, and since they each do have a greatest element, we
deduce that their greatest elements are identical. Therefore gcd(𝑎, 𝑏) = gcd(𝑏, 𝑎 −𝑏).
It does not matter which way round we write a and b in gcd(a, b), because
gcd(a, b) = gcd(b, a).
We will mostly adopt the habit of putting the larger number first, because that makes
the description of some of our algorithms neater. But the value of the gcd does not care
about the order of its arguments.
We illustrate the use of Theorem 28 in computing the gcd. For example,
gcd(20, 12) = gcd(12, 8) = gcd(8, 4) = 4, since 4 ∣ 8.
[Figure: a number line from 0 to 20, showing 𝑑 = gcd(𝑚, 𝑛) = 4 together with 𝑚 = 12 and 𝑛 = 20.]
7.6 THE EUCLIDEAN ALGORITHM

If 𝑎, 𝑏 ∈ ℕ and 𝑎 > 𝑏 then Theorem 28 expresses gcd(𝑎, 𝑏) in terms of the gcd of a smaller
pair of positive integers. This can be used in a recursive algorithm for computing gcd.
We also need a base case, but this can be provided by some of the special cases we gave
just after defining the gcd on p. 239. So we have the following algorithm.
1. Input: 𝑎, 𝑏 ∈ ℕ, not both 0.
2. If 𝑎 < 𝑏 then swap them:
   new 𝑎 ∶= old 𝑏,
   new 𝑏 ∶= old 𝑎.
3. If 𝑏 ∣ 𝑎 then Output 𝑏 and stop.
4. Replace 𝑎 by 𝑎 − 𝑏 (i.e., new 𝑎 ∶= old 𝑎 − 𝑏).
5. Go back to Step 2.
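Here is one way the subtraction-based algorithm might be rendered in Python; this sketch is ours, with the degenerate case 𝑏 = 0 handled explicitly:

def gcd_subtraction(a, b):
    # Repeatedly apply Theorem 28: gcd(a, b) = gcd(b, a - b).
    while True:
        if a < b:            # Step 2: swap so that a >= b
            a, b = b, a
        if b == 0:           # only reachable if one input is 0
            return a
        if a % b == 0:       # Step 3: if b | a, output b
            return b
        a = a - b            # Step 4: replace a by a - b

print(gcd_subtraction(12, 20))   # 4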
Trying it on gcd(12, 20) should give the calculation at the end of the previous section.
Then try a larger pair. Here is the calculation for gcd(48, 174) (glossing over some of the swaps):
gcd(174, 48) = gcd(126, 48) = gcd(78, 48) = gcd(48, 30) = gcd(30, 18) = gcd(18, 12) = gcd(12, 6) = 6, since 6 ∣ 12.
Consider the first few steps in this process. We repeatedly subtracted 48 from 174 until
the amount left over was < 48. But this is precisely the process of dividing by 48 a whole
number of times and determining the remainder. In this case, after three subtractions
of 48, we found that the remainder is 30. We can speed the algorithm up by doing this
remainder calculation directly instead of by repeated subtraction. We now revise the
algorithm to do this; the modified parts are Steps 4 and 5.
1. Input: 𝑎, 𝑏 ∈ ℕ, not both 0.
2. If 𝑎 < 𝑏 then swap them:
   new 𝑎 ∶= old 𝑏,
   new 𝑏 ∶= old 𝑎.
3. If 𝑏 ∣ 𝑎 then Output 𝑏 and stop.
4. 𝑞 ∶= ⌊𝑎/𝑏⌋.
5. Output gcd(𝑏, 𝑎 − 𝑞𝑏), computed by running this algorithm again on the pair (𝑏, 𝑎 − 𝑞𝑏).
Note that 𝑎 − 𝑞𝑏 = 𝑎 mod 𝑏,
and 𝑞 plays no role except to work out 𝑎 mod 𝑏. So, in the last step, we could write the
output as gcd(𝑏, 𝑎 mod 𝑏). But we have made the role of 𝑞 explicit because it will play
a role later on, in an extension of this algorithm.
Here is the new computation for (48, 174):
gcd(174, 48) = gcd(48, 30) = gcd(30, 18) = gcd(18, 12) = gcd(12, 6) = 6, since 6 ∣ 12.
The Euclidean Algorithm is over two thousand years old and has been described as
one of the oldest written mathematical algorithms.
When you meet a new algorithm, you should think about how many steps it takes, as
a function of the input. Play around with some examples and see if you can conjecture
some kind of bound on the number of steps required by the Euclidean Algorithm.
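One way to experiment is to instrument the division-based algorithm with a step counter. A Python sketch (the helper gcd_steps is ours, purely for exploration):

def gcd_steps(a, b):
    # Division-based Euclidean Algorithm, counting the divisions used.
    steps = 0
    if a < b:
        a, b = b, a
    while b != 0 and a % b != 0:
        a, b = b, a % b      # gcd(a, b) = gcd(b, a mod b)
        steps += 1
    return (a if b == 0 else b), steps

for pair in [(20, 12), (174, 48), (55, 34)]:
    print(pair, gcd_steps(*pair))   # gcd and the number of division steps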
7.7 THE GCD AND INTEGER LINEAR COMBINATIONS
If 𝑑 ∣ 𝑚 and 𝑑 ∣ 𝑛 then, by (7.1) and (7.2), 𝑑 ∣ (𝑥𝑚 + 𝑦𝑛) for all 𝑥, 𝑦 ∈ ℤ. Conversely, any 𝑑 that divides every combination 𝑥𝑚 + 𝑦𝑛 divides 𝑚 (take 𝑥 = 1, 𝑦 = 0) and 𝑛 (take 𝑥 = 0, 𝑦 = 1). So the set of common divisors of 𝑚 and 𝑛 is the same as the set of common divisors of all combinations 𝑥𝑚 + 𝑦𝑛 with 𝑥, 𝑦 ∈ ℤ:
{𝑑 ∶ 𝑑 ∣ 𝑚 and 𝑑 ∣ 𝑛} = {𝑑 ∶ 𝑑 ∣ (𝑥𝑚 + 𝑦𝑛) for all 𝑥, 𝑦 ∈ ℤ}.
Since these two sets are identical, so are their largest elements. The largest element of
the left set is just gcd(𝑚, 𝑛), while the largest element of the right set is the gcd of all
integers 𝑥𝑚 + 𝑦𝑛 with 𝑥, 𝑦 ∈ ℤ. So these two gcds — one being just the gcd of two
numbers, the other being the gcd of the infinite set 𝑚ℤ + 𝑛ℤ — are equal:
gcd(𝑚, 𝑛) = gcd(𝑚ℤ + 𝑛ℤ).
Let us see how this pans out for an example. Consider the case 𝑚 = 12, 𝑛 = 20. At
the end of § 7.5, on p. 240, we worked out that gcd(12, 20) = 4 and pictured it on the
number line. In this case, what does the set of all integer linear combinations of 12 and
20 look like? For a start, we have
1 ⋅ 12 + 0 ⋅ 20 = 12,
0 ⋅ 12 + 1 ⋅ 20 = 20.
1 ⋅ 12 + 1 ⋅ 20 = 32,
−1 ⋅ 12 + 1 ⋅ 20 = 8,
2 ⋅ 12 − 1 ⋅ 20 = 4,
−2 ⋅ 12 + 1 ⋅ 20 = −4,
−5 ⋅ 12 + 3 ⋅ 20 = 0,
4 ⋅ 12 − 2 ⋅ 20 = 8,
−3 ⋅ 12 + 1 ⋅ 20 = −16.
In fact, in this case, the set 12ℤ + 20ℤ of all integer linear combinations of 12 and 20 is
the set of all multiples of 4 (including both positive and negative multiples, and zero).
Symbolically, we can write
12ℤ + 20ℤ = 4ℤ.
We illustrate this on the number line.
[Figure: a number line with 0, 4, 8, 12, 16, 20 labelled by the combinations −5⋅12+3⋅20, 2⋅12−1⋅20, −1⋅12+1⋅20, 1⋅12+0⋅20, 3⋅12−1⋅20 and 0⋅12+1⋅20, respectively.]
Note that the same number can be written as an integer linear combination of 12 and 20 in more than one way; for example,
−1 ⋅ 12 + 1 ⋅ 20 = 4 ⋅ 12 − 2 ⋅ 20 = 8.
A particularly useful combination is the one that gives 0:
−5 ⋅ 12 + 3 ⋅ 20 = 0. (7.5)
We can take any equation expressing a multiple of 4 (like 8) as an integer linear combi-
nation of 12 and 20, such as
−1 ⋅ 12 + 1 ⋅ 20 = 8, (7.6)
and then add our equation for 0, namely (7.5), to get another integer linear combinations
for 8:
equation (7.6): −1 ⋅ 12 + 1 ⋅ 20 = 8
plus equation (7.5): −5 ⋅ 12 + 3 ⋅ 20 = 0
equals: −6 ⋅ 12 + 4 ⋅ 20 = 8
We can do this as many times as we like, to generate an infinite family of equations
expressing 8 as an integer linear combination of 12 and 20:
−1 ⋅ 12 + 1 ⋅ 20 = 8,
4 ⋅ 12 − 2 ⋅ 20 = 8,
9 ⋅ 12 − 5 ⋅ 20 = 8,
14 ⋅ 12 − 8 ⋅ 20 = 8,
⋮ ⋮ ⋮
−6 ⋅ 12 + 4 ⋅ 20 = 8,
−11 ⋅ 12 + 7 ⋅ 20 = 8,
⋮ ⋮ ⋮
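A quick Python sketch (ours) generates the start of this family and confirms each equation:

# Start from -1*12 + 1*20 = 8 and repeatedly add equation (7.5),
# namely -5*12 + 3*20 = 0, to produce new representations of 8.
x, y = -1, 1
for k in range(4):
    assert x * 12 + y * 20 == 8
    print(x, "* 12 +", y, "* 20 = 8")
    x, y = x - 5, y + 3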
The phenomena we have observed here are general, and not just specific to 12 and
20. Let 𝑚, 𝑛 be any positive integers, and consider the set 𝑚ℤ + 𝑛ℤ of all integers of
the form 𝑥𝑚 + 𝑦𝑛 for 𝑥, 𝑦 ∈ ℤ.
Firstly, 0 ∈ 𝑚ℤ + 𝑛ℤ. This can be seen by putting 𝑥 = 𝑛 and 𝑦 = −𝑚 so that
𝑥𝑚 + 𝑦𝑛 = 𝑛𝑚 − 𝑚𝑛 = 0.
Secondly, the difference of any two members of 𝑚ℤ + 𝑛ℤ is also a member, since
(𝑥1 𝑚 + 𝑦1 𝑛) − (𝑥2 𝑚 + 𝑦2 𝑛) = (𝑥1 − 𝑥2 )𝑚 + (𝑦1 − 𝑦2 )𝑛.
The coefficients 𝑥1 − 𝑥2 and 𝑦1 − 𝑦2 are both integers too, so the expression on the right,
(𝑥1 − 𝑥2 )𝑚 + (𝑦1 − 𝑦2 )𝑛, is of the same form, and is also a member of 𝑚ℤ + 𝑛ℤ.
Let Δ be the smallest positive difference between members of the set:
Δ ∶= min{𝑢 − 𝑣 ∶ 𝑢, 𝑣 ∈ 𝑚ℤ + 𝑛ℤ, 𝑢 > 𝑣}.
Since 𝑚ℤ + 𝑛ℤ is closed under differences, Δ itself belongs to 𝑚ℤ + 𝑛ℤ.
Theorem 29. 𝑚ℤ + 𝑛ℤ = Δℤ.
Proof. This is an assertion of equality of two sets. So we divide the proof into two parts:
first, we show that 𝑚ℤ + 𝑛ℤ ⊇ Δℤ, and then we show that 𝑚ℤ + 𝑛ℤ ⊆ Δℤ.
(⊇)
To prove this superset relationship, we show that the right set, Δℤ, is a subset of
the left set, 𝑚ℤ + 𝑛ℤ. To do this, we take a general member of Δℤ and show that it
also belongs to 𝑚ℤ + 𝑛ℤ.
Let 𝑘Δ be any multiple of Δ, where 𝑘 ∈ ℤ.
Since Δ is an integer linear combination of 𝑚 and 𝑛 (as explained above), there exist
𝑥0 , 𝑦0 ∈ ℤ such that
Δ = 𝑥0 𝑚 + 𝑦0 𝑛.
Multiplying each side by 𝑘, we have
𝑘Δ = 𝑘𝑥0 𝑚 + 𝑘𝑦0 𝑛.
Since 𝑘𝑥0 and 𝑘𝑦0 are integers, this establishes that 𝑘Δ, too, is an integer linear combi-
nation of 𝑚 and 𝑛.
(⊆)
Now we take a general member of 𝑚ℤ + 𝑛ℤ and show that it also belongs to Δℤ.
Let 𝑥𝑚 + 𝑦𝑛 be an integer linear combination of 𝑚 and 𝑛.
Consider what happens when we divide 𝑥𝑚 + 𝑦𝑛 by Δ. The quotient 𝑞 and remainder 𝑟 are
𝑞 = ⌊(𝑥𝑚 + 𝑦𝑛)/Δ⌋,
𝑟 = (𝑥𝑚 + 𝑦𝑛) mod Δ.
If 𝑟 ≠ 0, i.e., if 𝑥𝑚 + 𝑦𝑛 is not a multiple of Δ, then
𝑞Δ < 𝑥𝑚 + 𝑦𝑛 < 𝑞Δ + Δ.
Then 𝑞Δ and 𝑥𝑚 + 𝑦𝑛 differ by < Δ, yet they both belong to 𝑚ℤ + 𝑛ℤ. (We saw
earlier that Δ ∈ 𝑚ℤ + 𝑛ℤ and that therefore any integer multiple of it is also in
𝑚ℤ + 𝑛ℤ.) So we have a contradiction, because Δ is the smallest possible difference
between two members of 𝑚ℤ + 𝑛ℤ. So this case, where 𝑥𝑚 + 𝑦𝑛 is not a multiple
of Δ, cannot arise. Therefore 𝑟 = 0, so 𝑥𝑚 + 𝑦𝑛 = 𝑞Δ ∈ Δℤ, as required.
Theorem 30. For any positive integers 𝑚 and 𝑛, gcd(𝑚, 𝑛) = Δ, the smallest positive member of 𝑚ℤ + 𝑛ℤ.
7.8 THE EXTENDED EUCLIDEAN ALGORITHM

As we will soon see, it is very important to be able to determine, for given 𝑚 and 𝑛,
two integers 𝑥, 𝑦 such that
gcd(𝑚, 𝑛) = 𝑥𝑚 + 𝑦𝑛. (7.7)
To help see how to do this, consider that, at the outset, we have 𝑚 = 1 ⋅ 𝑚 + 0 ⋅ 𝑛 and
𝑛 = 0 ⋅ 𝑚 + 1 ⋅ 𝑛. So, at the very beginning, we already have simple equations expressing
𝑚 and 𝑛 as integer linear combinations of 𝑚 and 𝑛. So if we can maintain equations
of this type throughout the Euclidean algorithm, then hopefully we can end up with an
appropriate equation (7.7). At this point, see if you can repeat one of our earlier gcd
calculations — gcd(12, 20), at the end of § 7.5 on p. 240, or gcd(48, 174) at the end of
§ 7.6 on p. 243 — while keeping track of equations of this type as you go.
To compute 𝑥 and 𝑦 such that
gcd(𝑚, 𝑛) = 𝑥𝑚 + 𝑦𝑛,
it turns out to be enough to extend the Euclidean Algorithm so that it keeps track of
some extra information.
1. Input: 𝑚, 𝑛 ∈ ℕ, not both 0.
2. If 𝑚 < 𝑛 then swap them:
   new 𝑚 ∶= old 𝑛,
   new 𝑛 ∶= old 𝑚.
3. Initialise triples:
   (𝑎, 𝑥, 𝑦) ∶= (𝑚, 1, 0),
   (𝑏, 𝑧, 𝑤) ∶= (𝑛, 0, 1).
4. If 𝑏 = 0 then Output the triple (𝑎, 𝑥, 𝑦) and stop.
5. 𝑞 ∶= ⌊𝑎/𝑏⌋.
6. Update the triples:
   new (𝑎, 𝑥, 𝑦) ∶= old (𝑏, 𝑧, 𝑤),
   new (𝑏, 𝑧, 𝑤) ∶= (𝑎, 𝑥, 𝑦) − 𝑞 ⋅ (𝑏, 𝑧, 𝑤),
   where the right-hand sides here use the old values of (𝑎, 𝑥, 𝑦) and (𝑏, 𝑧, 𝑤).
7. Go back to Step 4.
In Step 6, we work out (𝑎, 𝑥, 𝑦) − 𝑞 ⋅ (𝑏, 𝑧, 𝑤) using vector algebra. In this case, this
means that we first multiply each member of (𝑏, 𝑧, 𝑤) by 𝑞 and then subtract each
member of the resulting triple from the corresponding member of (𝑎, 𝑥, 𝑦). The result
is (𝑎 − 𝑞𝑏, 𝑥 − 𝑞𝑧, 𝑦 − 𝑞𝑤).
At each step of the Extended Euclidean Algorithm, the triples (𝑎, 𝑥, 𝑦) and (𝑏, 𝑧, 𝑤)
satisfy 𝑎 = 𝑥𝑚 + 𝑦𝑛 and 𝑏 = 𝑧𝑚 + 𝑤𝑛. It is easy to see that this holds for the initial
triples (𝑚, 1, 0) and (𝑛, 0, 1). It is not difficult to show that the property is preserved
by the updating in Step 6. It therefore holds for all triples used in the algorithm. It is
intended that, when the algorithm stops, 𝑎 = gcd(𝑚, 𝑛), so the final 𝑎-triple will give us
all the information we seek.
The final 𝑏-triple will have 𝑏 = 0, so it does not give the gcd; we already have that
from the final 𝑎-triple. But the final 𝑏-triple should still be calculated, when doing this
manually, as a check. It is usually easy to check that the final 𝑏-triple satisfies
0 = 𝑧𝑚 + 𝑤𝑛.
If this does not hold, then the earlier steps should be re-checked to identify the mistake.
Although there is more to do in the Extended Euclidean Algorithm than in the
original Euclidean Algorithm, the extra work is really just bookkeeping. We keep track,
in the triples, of exactly how the first member of each triple can be made up as an integer
linear combination of 𝑚 and 𝑛. But the decisions that we make, in the EEA, and the
calculations that we do with the first member of each triple, are exactly the same in the
two algorithms. So the EEA is really just the ordinary Euclidean Algorithm with extra
accounting tasks.
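As a sketch of how this bookkeeping might look in code (our own rendering of the steps above, not official unit pseudocode):

def extended_gcd(m, n):
    # Maintain triples (a, x, y) and (b, z, w) satisfying
    # a = x*m + y*n and b = z*m + w*n throughout.
    a, x, y = m, 1, 0
    b, z, w = n, 0, 1
    while b != 0:
        q = a // b
        (a, x, y), (b, z, w) = (b, z, w), (a - q * b, x - q * z, y - q * w)
    return a, x, y           # gcd(m, n) together with x, y

g, x, y = extended_gcd(40, 27)
print(g, x, y)               # 1 -2 3, matching the table below
assert x * 40 + y * 27 == g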
Here is an example of using the Extended Euclidean Algorithm to compute gcd(27, 40)
and also express it as an integer linear combination of its arguments. As we work out the
triples, we write them underneath each other, forming a table with three columns. The
left column has the numbers driving the calculation, the middle column has the value
of 𝑥, and the right column has the value of 𝑦. Each row (𝑡, 𝑥, 𝑦) satisfies 𝑡 = 𝑎𝑥 + 𝑏𝑦, where 𝑎 = 40 and 𝑏 = 27 are the two input numbers.
At each step, we do the appropriate integer division of the left numbers in the previous
two rows to work out which multiple of the previous row has to be subtracted from the
row above it.5
40 1 0
27 0 1
13 1 −1 (take previous row from the one above it)
1 −2 3 (take twice previous row from the one above it, since ⌊27/13⌋ = 2)
0 27 −40 (take 13× previous row from the one above it)
We could have stopped this calculation as soon as we obtained a row starting with 1,
since no gcd can be < 1. We continued the calculation one row further as a check, since
if our calculations are correct, we end up with a row consisting of 0 followed by the two
numbers 𝑥, 𝑦 such that 𝑎𝑥 + 𝑏𝑦 = 0, and this is usually easily checked. In the special
case when the gcd is 1 (as here), the row starting with 0 contains the two numbers we
started with, but with one of those negated. Note also the zig-zag pattern of negative
numbers going down the last two columns. Checking that these patterns are followed is
a handy way to pick up errors in manual calculations.
In this case, the calculation is correct and we find that
gcd(27, 40) = 1,
and the algorithm also expresses this as an integer linear combination (from the row
starting with 1):
1 = −2 ⋅ 40 + 3 ⋅ 27.
5 If you’ve done row operations in matrices, then you’ll recognise that this is a similar process.
7.9 COPRIMALITY

Integers 𝑚 and 𝑛 are said to be coprime if gcd(𝑚, 𝑛) = 1, i.e., if they have no common divisor greater than 1. For example, 21 and 25 are coprime, even though neither of them is prime.
Theorem 31. Integers 𝑚 and 𝑛 are coprime if and only if there exist 𝑥, 𝑦 ∈ ℤ such that
𝑥𝑚 + 𝑦𝑛 = 1.
Proof. Let 𝑚, 𝑛 ∈ ℤ. By definition, they are coprime if and only if their gcd is 1. By
Theorem 30, this in turn is true if and only if 1 is the smallest positive member of
𝑚ℤ + 𝑛ℤ. But there is no smaller positive integer than 1, so 1 is the smallest positive
member of 𝑚ℤ + 𝑛ℤ if and only if 1 ∈ 𝑚ℤ + 𝑛ℤ, which is another way of saying that
there exist 𝑥, 𝑦 ∈ ℤ such that 𝑥𝑚 + 𝑦𝑛 = 1.
For example, consider 21 and 25, which are coprime, as we noted above. So there
must exist 𝑥, 𝑦 ∈ ℤ such that 𝑥 ⋅ 21 + 𝑦 ⋅ 25 = 1. We could use 𝑥 = 6 and 𝑦 = −5, since
6 ⋅ 21 − 5 ⋅ 25 = 126 − 125 = 1.
Theorem 32. Let 𝑝 be a prime and let 𝑎 and 𝑏 be integers. Then
𝑝 ∣ 𝑎𝑏 ⟹ 𝑝 ∣ 𝑎 ∨ 𝑝 ∣ 𝑏.
Proof. Let 𝑝, 𝑎 and 𝑏 be as in the statement of the theorem. Assume that 𝑝 ∣ 𝑎𝑏.
If 𝑝 ∣ 𝑎 then we are done already. So, suppose that 𝑝 ∤ 𝑎. Since 𝑝 is prime, this means
that 𝑝 and 𝑎 must be coprime (since a prime is coprime to every positive integer except
its own multiples).
Since 𝑝 and 𝑎 are coprime, there must exist 𝑥, 𝑦 ∈ ℤ such that
𝑥𝑝 + 𝑦𝑎 = 1.
Multiplying each side of this equation by 𝑏 gives
𝑥𝑝𝑏 + 𝑦𝑎𝑏 = 𝑏.
Consider the two summands on the left. The first summand, 𝑥𝑝𝑏, is clearly a multiple
of 𝑝 because it has 𝑝 as a factor. The second summand, 𝑦𝑎𝑏, is a multiple of 𝑎𝑏, but by
our assumption, 𝑝 ∣ 𝑎𝑏, so 𝑎𝑏 is also a multiple of 𝑝, so in fact the second summand is a
multiple of 𝑝 too. So, both summands on the left are multiples of 𝑝. So their sum is a
multiple of 𝑝 too. So the equation shows that 𝑏 equals a multiple of 𝑝, i.e., 𝑝 ∣ 𝑏, as required.
For example, the fact that 3 ∣ (8×9) implies that 3 ∣ 8 or 3 ∣ 9; in this case, it happens
that 3 ∣ 9 (but 3 ∤ 8).
The theorem won’t work, in general, if 𝑝 is not prime, though. For example, we
know that 6 ∣ (8 × 9), because 8 × 9 = 72 and we know that 6 ∣ 72 (because 6 × 12 = 72).
But we do not have 6 ∣ 8 or 6 ∣ 9; in fact, 6 ∤ 8 and 6 ∤ 9.
We are now in a position to keep a promise made on p. 236, after proving Theorem 27,
where we showed that every integer can be expressed as a product of primes. There, we
stated that we would later show that this product of primes is always unique. We now
prove this.
Theorem 33. Every positive integer can be written as a product of primes in only one way (apart from the order of the factors).
Proof. We know from Theorem 27 that every positive integer can be written as a product
of primes. It remains to show that this can be done in only one way.
Assume, by way of contradiction, that there exists a positive integer 𝑛 which can be
written in two different ways as products of primes.
Among all positive integers 𝑛 that can be written as a product of primes in two
different ways, let’s choose the smallest.
Let the primes that appear in one or both of these two products be 𝑝1 , 𝑝2 , … , 𝑝𝑘 .
Then we may suppose that the two different ways of writing 𝑛 as a product of primes
are
𝑛 = 𝑝1^𝑒1 𝑝2^𝑒2 ⋯ 𝑝𝑘^𝑒𝑘, (7.8)
𝑛 = 𝑝1^𝑓1 𝑝2^𝑓2 ⋯ 𝑝𝑘^𝑓𝑘, (7.9)
where 𝑒𝑖 ∈ ℕ0 and 𝑓𝑖 ∈ ℕ0 for each 𝑖 ∈ {1, 2, … , 𝑘}. We may assume that, for each 𝑖, the
two exponents 𝑒𝑖 and 𝑓𝑖 are not both 0, since if they were then 𝑝𝑖^𝑒𝑖 = 𝑝𝑖^𝑓𝑖 = 1, so the prime
𝑝𝑖 appears in neither of the two products, so we shouldn’t have included it in the list of
primes appearing in the products.
Since the two products are different, there must be at least one 𝑖 ∈ {1, 2, … , 𝑘} such
that 𝑒𝑖 ≠ 𝑓𝑖 .
Now, suppose one of the primes, say 𝑝𝑗 , appears with positive exponents in each
product: 𝑒𝑗 > 0 and 𝑓𝑗 > 0. (We may have 𝑗 = 𝑖 or 𝑗 ≠ 𝑖; that doesn’t matter.) We
will use this prime 𝑝𝑗 to obtain, from 𝑛, a smaller number that can be expressed as a
product of primes in two different ways, contradicting our earlier assumption that 𝑛 is
the smallest number of this type.
Define
𝑔𝑗 ∶= min{𝑒𝑗 , 𝑓𝑗 },
and note that 𝑔𝑗 > 0 too (since 𝑒𝑗 , 𝑓𝑗 are both > 0).
We can divide each side of (7.8) by 𝑝𝑗^𝑔𝑗 to get an expression for 𝑛/𝑝𝑗^𝑔𝑗 as a product
of all the primes in the same list. Similarly, we can divide each side of (7.9) by 𝑝𝑗^𝑔𝑗 to
get another expression for 𝑛/𝑝𝑗^𝑔𝑗 as a product using the same primes.
• If 𝑒𝑗 = 𝑓𝑗 (= 𝑔𝑗 ), then the prime 𝑝𝑗 disappears from both products, but the prime 𝑝𝑖 with 𝑒𝑖 ≠ 𝑓𝑖 still has different exponents in the two products.
• Now, if 𝑒𝑗 ≠ 𝑓𝑗 , then the prime 𝑝𝑗 still has different exponents in the two products
(but the two exponents of 𝑝𝑗 have each been reduced by 𝑔𝑗 ).
So, either way, the two expressions for 𝑛/𝑝𝑗^𝑔𝑗 each have some prime 𝑝𝑖 with different
exponents in the two expressions. So we have two different expressions for 𝑛/𝑝𝑗^𝑔𝑗 as
products of primes. Also, 𝑔𝑗 > 0 implies 𝑝𝑗^𝑔𝑗 > 1, which in turn implies 𝑛/𝑝𝑗^𝑔𝑗 < 𝑛.
The upshot of this is that we now have a smaller positive integer that can be written
as a product of primes in two different ways. This contradicts our choice of 𝑛 as the
smallest positive integer with this property.
So our assumption, that one of the primes appears with a positive exponent in both
the products in (7.8) and (7.9), is wrong. This, together with the fact that we earlier
ruled out a prime having exponent 0 in both products, implies that every prime in our
list appears with a positive exponent in one product and zero exponent in the other
product. In other words, the two products use entirely separate sets of primes.
Let 𝑝 be any prime appearing (with positive exponent) in the first expression for 𝑛,
in (7.8). Let us write 𝑞1 , 𝑞2 , … , 𝑞𝑙 for the primes that appear (with positive exponent) in
the second product, and let their exponents be ℎ1 , ℎ2 , … , ℎ𝑙 , respectively, so
𝑛 = 𝑞1^ℎ1 𝑞2^ℎ2 ⋯ 𝑞𝑙^ℎ𝑙.
We know now, from the reasoning in the previous paragraph, that 𝑝 ∉ {𝑞1 , 𝑞2 , … , 𝑞𝑙 },
(because every prime appearing in the first product, in (7.8), does not appear at all in
the second product, in (7.9)).
But, since 𝑝 appears in the first product, we know that 𝑝 ∣ 𝑛. By Theorem 32, this
implies that 𝑝 divides at least one of 𝑞1 , 𝑞2 , … , 𝑞𝑙 . But one prime cannot divide another
prime unless they are the same prime. So 𝑝 must equal one of 𝑞1 , 𝑞2 , … , 𝑞𝑙 . This is a
contradiction (see the end of the previous paragraph).
So our initial assumption, that there exists a positive integer that can be written as
a product of primes in two different ways, was wrong. Therefore every positive integer
can be written as a product of primes in only one way.
7.10 MODULAR ARITHMETIC

We discussed remainders and the mod operation in § 7.3. There are many situations
where we are mainly interested in remainders, and all our calculations are done with
them. For example:
• The seven days of the week can be regarded as names for the remainders, modulo
7, of the number of days since some reference date in the past.
• The hours in 24-hour clock time are remainders, modulo 24, of the number of
hours since some reference time.
• When you press a toggle switch (which turns it on if it was off, and turns it off if
it was already on), then the state of the switch (using 0 for “off” and 1 for “on”)
is the remainder, modulo 2, of the number of times the switch has been pressed
since it was last known to be off.
• If you are playing Monopoly, then (ignoring special situations that cause sudden
jumps to other parts of the board, such as going to Jail or certain Chance and
Community Chest cards) your position on the board (which is a square circuit of
40 positions) is the remainder modulo 40 of the sum of your dice throws so far.
• If you are playing a game where you move around a rectangular display with
“wrap-around”, then whenever you change your coordinates by some amount, the
new coordinates are obtained from the old ones by taking remainders modulo the
appropriate dimensions of the display.
• The last digit of a nonnegative decimal number is its remainder modulo 10. The
last bit of a binary number is its remainder modulo 2.
• Sometimes, calculations with very large numbers can be checked using calculations
with some appropriate remainders of those numbers. Remainders are smaller, and
in general do not contain all the information that the original numbers contain,
so calculations with them cannot tell you everything about the larger numbers,
and they won’t detect every possible error. But, used well, they can detect some
common errors at modest cost.
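Several of these examples translate directly into code; in Python the % operator computes the mod operation. A few one-line sketches (the specific numbers are just made-up illustrations):

presses = 7
print(presses % 2)        # toggle switch state after 7 presses: 1 (on)

print((23 + 5) % 24)      # 5 hours after 23:00 is 04:00

x, width = 58, 60         # wrap-around movement on a 60-wide display
print((x + 7) % width)    # 5

print(2360679774 % 10)    # last decimal digit: 4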
Let 𝑛 be a positive integer. Two integers 𝑎 and 𝑏 are said to be congruent modulo
𝑛, or congruent mod 𝑛 for short, if they differ by a multiple of 𝑛:
∃𝑘 ∈ ℤ 𝑎 − 𝑏 = 𝑘𝑛.
In this case we write 𝑎 ≡ 𝑏 (mod 𝑛). Congruence can be characterised in terms of remainders.
Theorem 34. 𝑎 ≡ 𝑏 (mod 𝑛) if and only if 𝑎 and 𝑏 have the same remainder on division by 𝑛, i.e., 𝑎 mod 𝑛 = 𝑏 mod 𝑛.
Proof (sketch). Write
𝑎 = 𝑞𝑎 𝑛 + 𝑟𝑎 , 𝑞𝑎 ∈ ℤ, 0 ≤ 𝑟𝑎 ≤ 𝑛 − 1;
𝑏 = 𝑞𝑏 𝑛 + 𝑟𝑏 , 𝑞𝑏 ∈ ℤ, 0 ≤ 𝑟𝑏 ≤ 𝑛 − 1.
Subtracting gives
𝑎 − 𝑏 = (𝑞𝑎 − 𝑞𝑏 )𝑛 + (𝑟𝑎 − 𝑟𝑏 ).
So 𝑎 − 𝑏 is a multiple of 𝑛 if and only if 𝑟𝑎 − 𝑟𝑏 is a multiple of 𝑛. Since −(𝑛 − 1) ≤ 𝑟𝑎 − 𝑟𝑏 ≤ 𝑛 − 1, the only available multiple is 0, so this happens exactly when 𝑟𝑎 = 𝑟𝑏 .
Note that the use of “mod” here is different from (but related to) the way we used it in § 7.3.
There, it was a binary operation, with 𝑎 mod 𝑛 giving the remainder of 𝑎 after division
by 𝑛. Now, used in parentheses after an equation, it is no longer a binary operation;
rather, it signifies that ≡ means congruence mod 𝑛 (for the 𝑛 specified after “mod”).
But the two uses of “mod” are closely related, since Theorem 34 tells us that
𝑎 ≡ 𝑏 (mod 𝑛) if and only if 𝑎 mod 𝑛 = 𝑏 mod 𝑛.
Every integer 𝑎 belongs to a family of all those integers that differ from 𝑎 by a
multiple of 𝑛, or in other words, all those integers that have the same remainder as 𝑎
after division by 𝑛. This family may be denoted by [𝑎], provided the modulus 𝑛 is clear
from the context. It may also be denoted by 𝑎 + 𝑛ℤ, meaning the set of all numbers
that can be obtained from 𝑎 by adding an integer multiple of 𝑛. So
[𝑎] = 𝑎 + 𝑛ℤ = {𝑎 + 𝑘𝑛 ∶ 𝑘 ∈ ℤ} = {…, 𝑎 − 3𝑛, 𝑎 − 2𝑛, 𝑎 − 𝑛, 𝑎, 𝑎 + 𝑛, 𝑎 + 2𝑛, 𝑎 + 3𝑛, …}.
It is useful to describe these families using our knowledge of relations from Chapter 2.
Theorem 35. For any 𝑛 ∈ ℕ, congruence modulo 𝑛 is an equivalence relation on ℤ: it is reflexive, symmetric and transitive.
Proof.
Reflexive:
For any 𝑎 ∈ ℤ, we have
𝑎 − 𝑎 = 0 = 0𝑛,
so
𝑎 ≡ 𝑎 (mod 𝑛).
Symmetric:
Suppose 𝑎, 𝑏 ∈ ℤ are congruent modulo 𝑛:
𝑎 ≡ 𝑏 (mod 𝑛).
Then there is some 𝑘 ∈ ℤ such that
𝑎 − 𝑏 = 𝑘𝑛.
Negating each side gives 𝑏 − 𝑎 = (−𝑘)𝑛, with −𝑘 ∈ ℤ, so
𝑏 ≡ 𝑎 (mod 𝑛).
Transitive:
Suppose 𝑎, 𝑏, 𝑐 ∈ ℤ satisfy
𝑎 ≡ 𝑏 (mod 𝑛),
𝑏 ≡ 𝑐 (mod 𝑛).
Then there exist 𝑘, 𝑙 ∈ ℤ such that
𝑎 − 𝑏 = 𝑘𝑛,
𝑏 − 𝑐 = 𝑙𝑛.
Adding these two equations gives
(𝑎 − 𝑏) + (𝑏 − 𝑐) = 𝑘𝑛 + 𝑙𝑛.
Simplifying gives
𝑎 − 𝑐 = (𝑘 + 𝑙)𝑛.
Since 𝑘 + 𝑙 ∈ ℤ, this means that
𝑎 ≡ 𝑐 (mod 𝑛).
So congruence modulo 𝑛 is indeed an equivalence relation.
The equivalence classes are exactly the families [𝑎] described above. We can add two classes by adding together all pairs of elements, one from each class:
[𝑎] + [𝑏] = {𝑎 + 𝑘𝑛 ∶ 𝑘 ∈ ℤ} + {𝑏 + 𝑙𝑛 ∶ 𝑙 ∈ ℤ}
= {𝑎 + 𝑘𝑛 + 𝑏 + 𝑙𝑛 ∶ 𝑘, 𝑙 ∈ ℤ}
= {𝑎 + 𝑏 + (𝑘 + 𝑙)𝑛 ∶ 𝑘, 𝑙 ∈ ℤ}
= {𝑎 + 𝑏 + ℎ𝑛 ∶ ℎ ∈ ℤ}, (since 𝑘 + 𝑙 ranges over all integers)
= [𝑎 + 𝑏].
We see, in fact, that when adding two equivalence classes [𝑎] and [𝑏] modulo 𝑛, we don’t
actually need to work out all possible sums; instead, we can take just one representative
from each class, say 𝑎 and 𝑏, and just add those two together (doing just one addition,
instead of infinitely many), and the resulting equivalence class is just the class [𝑎 + 𝑏]
that the sum 𝑎 + 𝑏 belongs to.
Similarly, it can be shown that
{𝑥𝑦 ∶ 𝑥 ∈ [𝑎], 𝑦 ∈ [𝑏]} ⊆ [𝑎𝑏].
This is enough, for our purposes, since it means that, if we take any representatives
𝑥 ∈ [𝑎] and 𝑦 ∈ [𝑏], then their product 𝑥𝑦 belongs to [𝑎𝑏], so 𝑥𝑦 ≡ 𝑎𝑏 (mod 𝑛).
The question of division is more complex, partly because the integers are not closed
under division. We return to this later.
Our observations so far show that, when doing arithmetic mod 𝑛, we can use any
representative of each number’s equivalence class (at least for addition, subtraction and
multiplication). It is often most convenient to work with the representatives 0, 1, … , 𝑛−1,
since they seem to be the simplest possible representatives and they are the possible
values of remainders modulo 𝑛.
We write ℤ𝑛 for the set {0, 1, … , 𝑛 − 1} endowed with addition, subtraction and
multiplication but with all operations modified so that, whenever a number outside this
set is produced, its remainder modulo 𝑛 is used instead.
For example, consider ℤ4 , which uses the set {0,1,2,3}. To add 2 and 3, we start by
doing so in the usual way, obtaining 5. But this is not in the set, so we replace it by its
remainder modulo 4, which is 1. So,
2+3 = 1 in ℤ4 . (7.10)
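In Python, % already reduces results into {0, 1, …, 𝑛 − 1} (even for negative intermediate values), so (7.10) and its companions for subtraction and multiplication can be checked directly:

n = 4
print((2 + 3) % n)   # 1, as in (7.10)
print((2 - 3) % n)   # 3: Python's % returns a remainder in {0, ..., n-1}
print((2 * 3) % n)   # 2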
The main message of this section is that, if we want to do a calculation with integers
involving only addition, subtraction and multiplication, and if we only want a remainder
at the end, then we can use remainders throughout. This simplifies the calculations a
lot.
For example, suppose you want to calculate
(2360679774 − (7320508 + 41421356) × 109) mod 10. (7.12)
Then, instead of working this out using ordinary arithmetic with these quite large
integers and then finding the remainder at the end, we can take remainders as we go.
First, we take the remainders mod 10 of each number in the expression (7.12):
2360679774 mod 10 = 4,
7320508 mod 10 = 8,
41421356 mod 10 = 6,
109 mod 10 = 9.
Replacing each number in the expression (7.12) by its remainder mod 10 gives the
expression
(4 − (8 + 6) × 9) mod 10.
The first calculation is 8 + 6 = 14, but we take remainders as we go, so we actually
calculate
(8 + 6) mod 10 = 14 mod 10 = 4.
We keep doing these reductions. The full calculation is
(4 − (8 + 6) × 9) mod 10
= (4 − (4 × 9)) mod 10 (since (8 + 6) mod 10 = 4)
= (4 − 6) mod 10 (since (4 × 9) mod 10 = 36 mod 10 = 6)
= (−2) mod 10
= 8.
We used remainders mod 10 in this example so that the remainders were clear throughout,
using the fact that the remainder mod 10 is always just the final decimal digit. But the
same principle holds for other moduli. For example,
(84×32) mod 9 = ((84 mod 9)×(32 mod 9)) mod 9 = (3×5) mod 9 = 15 mod 9 = 6,
but if the product had mistakenly been calculated as 2678, the check would expose the error, since
2678 mod 9 = 5,
which differs from 6.
This check is one-sided in the sense that a failure of the check (as in this example)
indicates an error in the calculation, but the converse does not hold: not all errors in
integer calculations will be detected by this check. Furthermore, if the check detects an
error, it does not tell you how to fix it.
Our choice of 9 as the modulus here was not random! The use of calculations mod 9
to check integer calculations is known as “casting out nines” and has a long history. We
study this method further in Exercise 11.
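A Python sketch of this kind of check (the helper name check_mod9 is ours):

def check_mod9(a, b, claimed_product):
    # One-sided check: disagreement mod 9 proves the claim wrong;
    # agreement does not prove it right.
    return (a % 9) * (b % 9) % 9 == claimed_product % 9

print(check_mod9(84, 32, 2678))   # False: the check exposes the error
print(check_mod9(84, 32, 2688))   # True: 84 * 32 = 2688 passes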
Although modular arithmetic works very smoothly for addition, subtraction and mul-
tiplication, there are some complications when doing division and when using exponents.
We consider division in the next section (§ 7.11) and exponents in § 7.14.
7.11 MODULAR INVERSES

We omitted division from our list of operations that you can do in ℤ𝑛 . Division is
the opposite of multiplication. In ordinary arithmetic, finding 𝑥/𝑦 is the same as finding
the product 𝑥𝑦^−1. Here 𝑦^−1 is the multiplicative inverse of 𝑦, which is defined by the
equation
𝑦𝑦^−1 = 1.
When dealing with ordinary real numbers, we call 𝑦^−1 the reciprocal of 𝑦, and it exists
for any nonzero real number.
Now consider inverses in ℤ𝑛 . Suppose firstly that 𝑛 = 7: ℤ7 = {0, 1, 2, 3, 4, 5, 6}. What
is the inverse of, for example, 2? Since 2 × 4 = 8 ≡ 1 (mod 7), we have 2^−1 = 4 in ℤ7 . It
can also be seen that: 1^−1 = 1, 3^−1 = 5, 4^−1 = 2, 5^−1 = 3, 6^−1 = 6. In fact, everything in
ℤ7 except 0 has an inverse.
Things are not always so convenient.
Consider ℤ6 = {0, 1, 2, 3, 4, 5}. Which of these elements have inverses? Zero, of course,
never does, and 1 is always its own inverse. What are the inverses for 2, 3, 4, 5? Suppose
we try to find the inverse of 2. We want a number 𝑧 = 2^−1 such that 2𝑧 ≡ 1 (mod 6)
— in other words, such that 2𝑧 is one plus a multiple of 6, i.e., one of 1, 7, 13, 19, …
But 2𝑧 is even, and all of these numbers are odd, so this is impossible. So 2 has no
inverse in ℤ6 . Neither does 3 or 4. It can be seen that 5 is its own inverse, since
5 × 5 = 25 ≡ 1 (mod 6).
In fact, we can characterise those members of ℤ𝑛 that have inverses.
Theorem 36. A positive integer 𝑥 ∈ ℤ𝑛 has an inverse in ℤ𝑛 if and only if 𝑥 and 𝑛 are
coprime.
Proof. Let 𝑥 ∈ ℤ𝑛 .
If 𝑥 has an inverse 𝑥^−1, then 𝑥𝑥^−1 = 1 in ℤ𝑛 . Therefore
∃𝑘 ∈ ℤ 𝑥𝑥^−1 + 𝑘𝑛 = 1.
So 1 is an integer linear combination of 𝑥 and 𝑛, and by Theorem 31 this means that 𝑥 and 𝑛 are coprime.
Conversely, suppose 𝑥 and 𝑛 are coprime. By Theorem 31, there exist 𝑦, 𝑧 ∈ ℤ such that
𝑦𝑥 + 𝑧𝑛 = 1.
Write 𝑦 = 𝑙𝑛 + 𝑦′, where 𝑙 ∈ ℤ and 𝑦′ = 𝑦 mod 𝑛 ∈ ℤ𝑛 . Substituting this for 𝑦 gives
(𝑙𝑛 + 𝑦′)𝑥 + 𝑧𝑛 = 1,
which means
𝑦′𝑥 + (𝑙𝑥 + 𝑧)𝑛 = 1.
Since both 𝑥 and 𝑦′ are in ℤ𝑛 , this means that
𝑦′𝑥 = 1 in ℤ𝑛 .
So 𝑦′ is the inverse of 𝑥 in ℤ𝑛 .
Theorem 36 shows one reason for the importance of coprimality: it determines which
elements of ℤ𝑛 have inverses. The Euclidean Algorithm can be used to determine whether
or not a member 𝑥 of ℤ𝑛 is coprime to 𝑛, and therefore whether or not it has an inverse.
Furthermore, if we want to determine the inverse 𝑥−1 of 𝑥, then applying the Extended
Euclidean Algorithm to 𝑥 and 𝑛 gives 𝑦, 𝑧 ∈ ℤ such that
𝑦𝑥 + 𝑧𝑛 = 1.
Then 𝑦𝑥 ≡ 1 (mod 𝑛), so 𝑥^−1 = 𝑦 mod 𝑛.
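Here is a sketch of the inverse computation in Python, running the Extended Euclidean Algorithm on the pair (𝑥, 𝑛) and keeping only the coefficient of 𝑥; the name mod_inverse is ours. (From Python 3.8, the built-in pow(x, -1, n) does the same job.)

def mod_inverse(x, n):
    a, y0 = x, 1
    b, y1 = n, 0
    # Invariant: a ≡ y0*x (mod n) and b ≡ y1*x (mod n).
    while b != 0:
        q = a // b
        (a, y0), (b, y1) = (b, y1), (a - q * b, y0 - q * y1)
    if a != 1:
        raise ValueError("x and n are not coprime: no inverse exists")
    return y0 % n

print(mod_inverse(2, 7))     # 4, since 2 * 4 = 8 ≡ 1 (mod 7)
print(mod_inverse(86, 99))   # 38, since 86 * 38 = 3268 = 33 * 99 + 1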
The number of positive integers less than 𝑛 that are coprime to 𝑛 is denoted by 𝜑(𝑛),
and this function 𝜑 is called the Euler totient function. Putting it another way:
𝜑(𝑛) = |{𝑥 ∈ {1, 2, …, 𝑛 − 1} ∶ 𝑥 and 𝑛 are coprime}|.
We write ℤ∗𝑛 for the set of members of ℤ𝑛 that have inverses in ℤ𝑛 .
From the definitions of the Euler totient function and ℤ∗𝑛 , together with Theorem 36,
we have
𝜑(𝑛) = |ℤ∗𝑛 |.
The Euler totient function of a prime number 𝑝 is given by
𝜑(𝑝) = 𝑝 − 1. (7.13)
This is because every positive integer less than a prime must be coprime to it. So the
maximum possible value of 𝜑(𝑛) is 𝑛 − 1, and this is attained if and only if 𝑛 is
prime.
We can extend (7.13) to prime powers.
Theorem 37. For any prime 𝑝 and any 𝑚 ∈ ℕ,
𝜑(𝑝^𝑚) = 𝑝^(𝑚−1) (𝑝 − 1). (7.14)
To be able to compute 𝜑(𝑛) when 𝑛 is not a prime power, we need something more.
Fortunately, we have: if 𝑚 and 𝑛 are coprime, then
𝜑(𝑚𝑛) = 𝜑(𝑚) 𝜑(𝑛). (7.15)
Here are the first few values of 𝜑:
𝑛: 2 3 4 5 6 7 8 9 10 11 12 13
𝜑(𝑛): 1 2 2 4 2 6 4 6 4 10 4 12
In fact, if you have a method for factorising any integer, then you can use it to
compute the Euler totient function of any integer. Given a positive integer 𝑛, you first
factorise it into a product of prime powers, say 𝑝1^𝑚1 𝑝2^𝑚2 ⋯ 𝑝𝑘^𝑚𝑘, where 𝑝1 , 𝑝2 , … , 𝑝𝑘 are
distinct primes. Powers 𝑝𝑖^𝑚𝑖 and 𝑝𝑗^𝑚𝑗 of distinct primes 𝑝𝑖 , 𝑝𝑗 (𝑖 ≠ 𝑗) are coprime, so
𝜑(𝑛) = 𝜑(𝑝1^𝑚1 𝑝2^𝑚2 ⋯ 𝑝𝑘^𝑚𝑘) = 𝜑(𝑝1^𝑚1) 𝜑(𝑝2^𝑚2) ⋯ 𝜑(𝑝𝑘^𝑚𝑘),
by (7.15). Then we can use (7.14) to work out each 𝜑(𝑝𝑖^𝑚𝑖). This gives
𝜑(𝑛) = (𝑝1^(𝑚1−1) (𝑝1 − 1)) (𝑝2^(𝑚2−1) (𝑝2 − 1)) ⋯ (𝑝𝑘^(𝑚𝑘−1) (𝑝𝑘 − 1)).
This can be rewritten as
𝜑(𝑛) = 𝑝1^𝑚1 (1 − 1/𝑝1) 𝑝2^𝑚2 (1 − 1/𝑝2) ⋯ 𝑝𝑘^𝑚𝑘 (1 − 1/𝑝𝑘)
= 𝑝1^𝑚1 𝑝2^𝑚2 ⋯ 𝑝𝑘^𝑚𝑘 (1 − 1/𝑝1) (1 − 1/𝑝2) ⋯ (1 − 1/𝑝𝑘)
= 𝑛 (1 − 1/𝑝1) (1 − 1/𝑝2) ⋯ (1 − 1/𝑝𝑘).
It is in general difficult to compute 𝜑(𝑛) from scratch (i.e., if the prime factorisation
of 𝑛 is unknown).
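For modest 𝑛, though, trial division is good enough, and the product formula then gives 𝜑(𝑛). A Python sketch (ours):

def totient(n):
    # phi(n) = n * product over prime factors p of (1 - 1/p).
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            result -= result // p    # multiply result by (1 - 1/p)
        p += 1
    if n > 1:                        # leftover prime factor
        result -= result // n
    return result

print([totient(n) for n in range(2, 14)])
# [1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4, 12], matching the table above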
7.13 FAST EXPONENTIATION
Suppose we want to compute a power 𝑎^𝑚 for some 𝑚 ∈ ℕ by repeated multiplication.
On the face of it, this means doing 𝑚 − 1 multiplications, which if 𝑚 is large means
a lot of work. Fortunately, there are much more efficient methods. If we express the
exponent as a sum of powers of 2 (which is given by its binary representation), then we
can compute 𝑎^𝑚 in a way that does not use too many multiplications. For example, to
calculate 3^36, observe that 36 = 2^5 + 2^2, so
3^36 = 3^(2^5 + 2^2) = 3^(2^5) × 3^(2^2) = ((((3^2)^2)^2)^2)^2 × (3^2)^2.
Careful accounting should show that the number of multiplications required to compute
3^36 is 6. These are: one multiplication for 3^2 = 3 × 3, four more for the successive
squarings 3^4, 3^8, 3^16 and 3^32, and a final one for the product 3^36 = 3^32 × 3^4.
This compares well with the naïve method, which requires 35 multiplications.
The idea indicated here can be used as the basis of an algorithm for exponentiation.
Such an algorithm can compute 𝑎 𝑚 with at most 2⌊log2 𝑚⌋ multiplications, which can
be shown to be at most twice the number of bits in the binary representation of 𝑚.
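One possible square-and-multiply sketch in Python, instrumented to count multiplications (with the same counting convention as the hand count above, where installing the first factor is not counted as a multiplication):

def fast_power(a, m):
    result = 1
    square = a            # holds a^1, a^2, a^4, a^8, ... in turn
    mults = 0
    while m > 0:
        if m % 2 == 1:    # current binary digit of m is 1
            if result == 1:
                result = square
            else:
                result *= square
                mults += 1
        m //= 2
        if m > 0:
            square *= square
            mults += 1
    return result, mults

value, mults = fast_power(3, 36)
assert value == 3 ** 36
print(mults)              # 6, as counted above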
7.14 MODULAR EXPONENTIATION
When computing a power 𝑎^𝑚 mod 𝑛, we can combine fast exponentiation with the idea
of taking remainders as we go (§ 7.10): after every multiplication, we reduce mod 𝑛, so
the numbers never grow large. For example, to compute 3^36 mod 25, using 36 = 2^5 + 2^2:
3^2 mod 25: 3^2 = 3 × 3 = 9; 9 mod 25 = 9.
3^4 mod 25: 9^2 = 81; 81 mod 25 = 6.
3^8 mod 25: 6^2 = 36; 36 mod 25 = 11.
3^16 mod 25: 11^2 = 121; 121 mod 25 = 21.
3^32 mod 25: 21^2 = 441; 441 mod 25 = 16.
3^36 mod 25: 3^32 × 3^4 ≡ 16 × 6 = 96 (mod 25); 96 mod 25 = 21.
So 3^36 mod 25 = 21, and no intermediate number ever exceeded 441.
There is a further saving available, based on the following theorem.
Theorem 38. For any 𝑛 ∈ ℕ and any 𝑥 ∈ ℤ∗𝑛 ,
𝑥^𝜑(𝑛) ≡ 1 (mod 𝑛).
Proof. List the elements of ℤ∗𝑛 as
𝑦1 , 𝑦2 , … , 𝑦𝜑(𝑛) . (7.18)
Multiplying each of these by 𝑥, within ℤ∗𝑛 , gives the list
𝑥𝑦1 , 𝑥𝑦2 , … , 𝑥𝑦𝜑(𝑛) , (7.19)
which contains exactly the same elements as (7.18), in a possibly different order, since multiplication by the invertible element 𝑥 just permutes ℤ∗𝑛 .
Now let’s form the products, in ℤ∗𝑛 , of the elements in each list, and also rearrange
the second product by collecting all the 𝑥 factors together:
𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) and (𝑥𝑦1 )(𝑥𝑦2 ) ⋯ (𝑥𝑦𝜑(𝑛) ) = 𝑥^𝜑(𝑛) 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) .
Since the two products come from the same elements being multiplied, they must be
equal. Therefore
𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) ≡ 𝑥^𝜑(𝑛) 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) (mod 𝑛). (7.20)
Now, since all the elements 𝑦1 , 𝑦2 , … , 𝑦𝜑(𝑛) are elements of ℤ∗𝑛 , their product 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛)
must be in ℤ∗𝑛 too. Therefore 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) has an inverse. Multiplying each side of (7.20)
by this inverse, we obtain
𝑥^𝜑(𝑛) ≡ 1 (mod 𝑛),
which completes the proof.6
Theorem 38 means that, when working mod 𝑛 with a base 𝑥 ∈ ℤ∗𝑛 , exponents can be reduced mod 𝜑(𝑛): 𝑥^𝑚 ≡ 𝑥^(𝑚 mod 𝜑(𝑛)) (mod 𝑛). For our running example, 𝜑(25) = 20, so
3^36 mod 25 = 3^(36 mod 𝜑(25)) mod 25 = 3^(36 mod 20) mod 25 = 3^16 mod 25.
6 This theorem can be understood more deeply using group theory, which is beyond the scope of these
Course Notes, but we give an outline of this connection here. It can be shown that ℤ∗𝑛 is a group under
multiplication. This means that it is closed (if two numbers 𝑎, 𝑏 are coprime to 𝑛, then their product 𝑎𝑏 will
be coprime to 𝑛 as well), the multiplication operation is associative, there is a multiplicative identity (in
this case, 1), and every element has a multiplicative inverse (Theorem 36). It is a basic theorem of group
theory that, if 𝑔 is a member of the group and 𝑘 is the size of the group, then 𝑔^𝑘 equals the multiplicative
identity of the group. Theorem 38 is simply the application of these results to the group ℤ∗𝑛 .
We now have a much simpler problem, with an exponent of 16 instead of 36. A simpler
exponent will typically mean fewer multiplications are needed. In this case, we have
3^36 mod 25 = 3^16 mod 25 = 3^(2^4) mod 25 = (((3^2)^2)^2)^2 mod 25.
The calculation is:
3^2 mod 25: 3^2 = 3 × 3 = 9; 9 mod 25 = 9.
3^4 mod 25: 9^2 = 81; 81 mod 25 = 6.
3^8 mod 25: 6^2 = 36; 36 mod 25 = 11.
3^16 mod 25: 11^2 = 121; 121 mod 25 = 21.
So 3^36 mod 25 = 21, as before.
This can be compared with the longer calculation on p. 263 in § 7.14. The saving
becomes much greater for higher exponents.
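Python's built-in three-argument pow performs modular exponentiation efficiently, so both routes can be confirmed:

print(pow(3, 36, 25))   # 21
print(pow(3, 16, 25))   # 21: same answer after reducing the exponent mod 20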
7.15 PRIMITIVE ROOTS

Let 𝑥 be a member of ℤ∗𝑛 , so that, by Theorem 38, 𝑥^𝜑(𝑛) ≡ 1 (mod 𝑛). If, in addition,
𝑥^𝑘 ≢ 1 (mod 𝑛) for all 𝑘 < 𝜑(𝑛),
then we say that 𝑥 is a primitive root of 𝑛. Such an 𝑥 has the property that its powers
𝑥, 𝑥^2, 𝑥^3, … give all the elements of ℤ∗𝑛 , in some order, reaching 1 at 𝑥^𝜑(𝑛). In this sense,
𝑥 generates ℤ∗𝑛 . If 𝑥 is not a primitive root, then its powers will go through some
proper subset of ℤ∗𝑛 .
We emphasise that, among the members of ℤ∗𝑛 , it is the primitive roots that have
the greatest possible range of values of their powers. This will be important later on.
Example:
Suppose 𝑛 = 7. Then 𝜑(7) = 6, by (7.21), since 7 is prime, and ℤ∗7 = {1, 2, 3, 4, 5, 6}.
Consider the successive powers of 3 in ℤ∗7 :
𝑘: 1 2 3 4 5 6
3^𝑘 mod 7: 3 2 6 4 5 1
We see from this that 3^𝜑(7) = 3^6 ≡ 1 (mod 7), and that 3^𝑘 ≢ 1 (mod 7) for all 𝑘 < 6.
So the values of 3^𝑘 mod 7, for 𝑘 = 1, …, 6, are all the elements of ℤ∗7 . So 3 is a primitive
root of 7.
On the other hand, 2 is not a primitive root of 7. Consider its powers:
𝑘: 1 2 3
2^𝑘 mod 7: 2 4 1
So 2^3 ≡ 1 (mod 7), with the exponent 3 < 𝜑(7), which means the definition of prim-
itive root is not satisfied. Taking further powers, with 𝑘 = 4, 5, 6, …, will just give the
same numbers 2, 4, 1, …; we will not get anything new.
Not all positive integers have primitive roots. For example, there are no primi-
tive roots of 8. Let us consider all the candidates. Firstly, observe that 𝜑(8) = 4 and
ℤ∗8 = {1, 3, 5, 7}. The following table shows that no member of ℤ∗8 can be a primitive root.
𝑘: 1 2
1^𝑘 mod 8: 1 1
3^𝑘 mod 8: 3 1
5^𝑘 mod 8: 5 1
7^𝑘 mod 8: 7 1
Note that it does not help to look at powers of some number 𝑥 which is in ℤ8 but
not in ℤ∗8 . For example, try 𝑥 = 2. Its powers in ℤ8 are 2, 4, 0, 0, 0, …; it never reaches
1, and in fact once it reaches 0 it is stuck there. This is typical of what happens when
taking successive powers of a member of ℤ𝑛 ∖ ℤ∗𝑛 . So a primitive root of 𝑛, if one exists,
must in fact be a member of ℤ∗𝑛 .
Numbers that have primitive roots have been characterised.
Theorem 39. The numbers which have primitive roots are 1, 2, 4, and those of the
form 𝑝^𝑘 or 2𝑝^𝑘, where 𝑝 is an odd prime and 𝑘 ∈ ℕ. □
The number 8 is not covered by this list of possibilities. (It is a power of the even
prime, 2, rather than an odd one.) So it cannot have a primitive root.
Theorem 40. If 𝑛 has a primitive root, then it has 𝜑(𝜑(𝑛)) of them. □
This result can be proved by elementary methods, and insight into why it is true
can be gained by playing with small examples.
Consider our earlier example of 𝑝 = 7. We saw that 3 is a primitive root of 7, and
found all its powers 3^𝑖 for 𝑖 ≤ 6, with 3^6 ≡ 1 (mod 7). We then saw that 2 is not a
primitive root of 7, because its powers reach 1 too soon: 2^3 ≡ 1 (mod 7).
Now let’s use the representation of 2 as a power of the primitive root 3, and look
more closely at why 2 fails to be a primitive root.
268 N U M B E R T H E O RY
We have 2 ≡ 3^2 (mod 7), and the exponent here, also 2, is a divisor of 6, with
2 × 3 = 6, so
2^3 ≡ (3^2)^3 = 3^(2×3) = 3^6 ≡ 1 (mod 7).
This illustrates that we can use the exponents to work out all the primitive roots. The
exponent 2 is a factor of 6, and as we have just seen, this stops 3^2 mod 7 = 2 from being
another primitive root. The exponent 3 is also a factor of 6, so 3^3 mod 7 = 6 won't be a
primitive root either:
(3^3)^2 = 3^(3×2) = 3^6 ≡ 1 (mod 7), so 6^2 ≡ 1 (mod 7).
But it's not just about being a factor of 6. Consider exponent 4, which gives 3^4 mod 7 =
4. This is not a factor of 6, but it does have a factor of 2 in common with 6. Because of
this, it won't give us a primitive root either:
(3^4)^3 = 3^(4×3) = 3^12 = (3^6)^2 ≡ 1 (mod 7), so 4^3 ≡ 1 (mod 7), with 3 < 𝜑(7).
So, if the exponent is not coprime to 6, then it cannot yield a primitive root. This
prevents 3^2, 3^3 and 3^4 from being primitive roots of 7. So none of 2, 6, 4 are primitive
roots of 7.
On the other hand, if the exponent is coprime to 6, it will yield a primitive root
when our first primitive root 3 is raised to that exponent. This means that exponents 1
and 5 yield primitive roots. So the two primitive roots of 7 are 3^1 = 3 and 3^5 mod 7 = 5.
So the number of primitive roots of 7 is indeed
𝜑(𝜑(7)) = 𝜑(6) = 2.
This illustrates Theorem 40, and also that, once we have one primitive root 𝑎 of 𝑛, the
others have the form 𝑎^𝑘 where 𝑘 is coprime to 𝜑(𝑛).
There is, at present, no fast algorithm for finding a primitive root of a number.
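For small 𝑛, though, exhaustive search is entirely practical. A brute-force Python sketch (ours; it takes 𝜑(𝑛) as a second argument):

def primitive_roots(n, phi_n):
    # x is a primitive root of n if its powers first reach 1 (mod n)
    # at exponent phi(n) exactly.
    roots = []
    for x in range(1, n):
        power, k = x % n, 1
        while power != 1 and k <= phi_n:
            power = power * x % n
            k += 1
        if k == phi_n:
            roots.append(x)
    return roots

print(primitive_roots(7, 6))   # [3, 5]
print(primitive_roots(8, 4))   # []: 8 has no primitive roots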
7.16 ONE-WAY FUNCTIONS
A one-way function is a function that is:
1. “easy” to compute;
2. “hard” to invert.
This can be made precise, but doing so is beyond the scope of this unit.
One-way functions are believed to exist, and there are a number of functions that
are regarded as one-way functions. We will see some examples shortly. However, no
function has been rigorously proved to be one-way in the precise formal sense of that
term.
7.17 MODULAR EXPONENTIATION WITH FIXED BASE
The function modular exponentiation with fixed base performs the mapping
𝑥 ↦ 𝑎^𝑥 mod 𝑛.
We have seen that this can be computed efficiently, using the techniques of § 7.14. Here,
we will focus on using exponents 𝑥 < 𝜑(𝑛), since we saw in § 7.14 that every other
possible exponent can be reduced to such an 𝑥.
Although this function is easy to compute, it seems to be much harder to invert.
For the inverse, we are given 𝑦 < 𝑛 and must find 𝑥 such that 𝑎^𝑥 ≡ 𝑦 (mod 𝑛). If
we want to make the inverse to be as hard as possible, we should ensure that there are
as many potential values of 𝑥 as possible, so that exhaustively searching through all
possible 𝑥, to find the one which satisfies 𝑎^𝑥 ≡ 𝑦 (mod 𝑛), takes as long as possible. To
this end, it is desirable to choose 𝑎 to be a primitive root of 𝑛, because such numbers
give the greatest range of possible values of their powers (as we observed early in § 7.15).
The inverse problem is then the following.
Discrete Logarithm:
Fix: 𝑛 ∈ ℕ, primitive root 𝑎 of 𝑛.
Input: 𝑦 ∈ ℤ∗𝑛
Output: 𝑥 ≤ 𝜑(𝑛) such that 𝑎^𝑥 ≡ 𝑦 (mod 𝑛).
By analogy with ordinary logarithms, this 𝑥 may be written as 𝑥 = log𝑎 𝑦.
The current belief is that Discrete Logarithm has no fast algorithm. In fact, it seems
to be of about the same difficulty as factorising integers. The best known algorithms for
each take, very roughly, similar amounts of computation time.
Furthermore, Discrete Log is believed to be almost always hard.
Modular exponentiation with a fixed primitive root as the base is believed to be a
one-way function, although this has not been proved, and proving it would solve a major
open problem in computer science.
Recall that 𝜑(𝑛) is maximised when 𝑛 is prime. So, for the best possible candidate
one-way function, we should choose a large prime 𝑝 and then choose 𝑎 to be a primitive
root of 𝑝. This is exactly what we do when using this candidate one-way function to
help distribute cryptographic keys securely. This is described in the next section.
7.18 DIFFIE-HELLMAN KEY AGREEMENT SCHEME

Suppose we have a large number of users who wish to be able to communicate with
each other. Suppose that any pair of them who communicate want secrecy, so that
no-one else (including others in our large set of users) can read their messages. With
traditional cryptosystems (including the type discussed in § 2.10), they will need to
agree in advance on a shared secret key. This requires each pair of users to have their
own secure communications channel. Not only is this expensive, but it requires time
to arrange. This severely limits the ability of the users to communicate spontaneously,
without having planned ahead of time to do so. It is clear that the demands of modern
electronic communications and commerce are quite at odds with these limitations.
A remarkable solution to this problem was proposed by Diffie and Hellman in 1976,
in a paper that marked the beginning of a revolution in cryptography.7 Their method
uses the one-way function we have just met: modular exponentiation with fixed base,
where the base is a primitive root of a large prime. It works as follows.
Firstly, we fix a large prime 𝑝 and a primitive root 𝑎 of 𝑝. These numbers are public,
in that they are disseminated, without any encryption, to all users of the system, and
system-wide, in that all users use the same values of 𝑝 and 𝑎. Each user generates their
own private random number 𝑥 ∈ {1, … , 𝑝 −1}, which is regarded as a member of ℤ∗𝑝 , and
from it generates 𝑦 = 𝑎^𝑥 mod 𝑝, which is made public. (Note that we are refraining from
calling these numbers keys since, strictly speaking, they are not used directly to encrypt
or decrypt messages.)
Observe that anyone who wants to determine some user’s private number 𝑥 is faced
with the problem of inverting a candidate one-way function. They see 𝑝, 𝑎 and 𝑦 =
𝑎^𝑥 mod 𝑝, and from this must determine 𝑥. This is exactly the Discrete Log problem,
i.e., the inverse of modular exponentiation with fixed base 𝑎. Since this appears to be
a difficult problem, provided 𝑝 is large, we will assume that the private number of each
user is secure.
Suppose now that two users, Alice and Bob, want to communicate. Suppose that
Alice’s private and public numbers are 𝑥𝐴 and 𝑦𝐴 , respectively, while Bob’s are 𝑥𝐵 and
𝑦𝐵 . Each of them knows the system-wide constants 𝑝 and 𝑎, and each can read the
other’s public number. The exact means by which this is done is an implementation
detail that does not concern us here. Perhaps they send the public numbers to each
other (unencrypted), or perhaps the public numbers of all users are collected together
in some central, publicly available file.
Now, when Alice reads Bob’s public number 𝑦𝐵 , she calculates her key 𝑘𝐴𝐵 by raising
Bob’s public number to the power of her own private number:
𝑘𝐴𝐵 ∶= 𝑦𝐵^𝑥𝐴 mod 𝑝.
Note that only she can do this calculation, since only she knows 𝑥𝐴 .
Similarly, Bob reads Alice’s public number 𝑦𝐴 , and then raises it to the power of his
own private number 𝑥𝐵 , obtaining his key 𝑘𝐵𝐴 :
𝑘𝐵𝐴 ∶= 𝑦𝐴^𝑥𝐵 mod 𝑝.
7 W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory
IT-22 (1976) 644–654.
Note that only he can do this, because no-one else knows 𝑥𝐵 . Note also that Alice and
Bob can do these computations independently of each other (provided each has made
their public number available).
Although Alice and Bob have each done a computation that only they could do,
the keys they each compute turn out, remarkably, to be exactly the same:
𝑘𝐴𝐵 = 𝑦𝐵^𝑥𝐴 = (𝑎^𝑥𝐵)^𝑥𝐴 = 𝑎^(𝑥𝐵 𝑥𝐴) = 𝑎^(𝑥𝐴 𝑥𝐵) = (𝑎^𝑥𝐴)^𝑥𝐵 = 𝑦𝐴^𝑥𝐵 = 𝑘𝐵𝐴 , all in ℤ∗𝑝 ,
with the exponentiations being done mod 𝑝 using the methods of § 7.14.
The outcome of this process is that Alice and Bob have arrived at the same key,
without that key itself having been sent anywhere. No secure channel is needed for key
transmission. Neither party alone could determine the key they agree on, as it depends
on the private numbers of both of them.
The problem facing a cryptanalyst is the following:
Diffie-Hellman problem:
Given: 𝑝, 𝑎, 𝑎^𝑥𝐴 mod 𝑝, 𝑎^𝑥𝐵 mod 𝑝;
Find: 𝑎^(𝑥𝐴 𝑥𝐵) mod 𝑝.
This is believed to be about as difficult as Discrete Log, although we only know that
it cannot be significantly harder than Discrete Log:
Theorem 41. Any algorithm for Discrete Log can be used to construct an algorithm
for the Diffie-Hellman problem. A fast algorithm for Discrete Log yields a fast algorithm
for the Diffie-Hellman problem.
Proof. Suppose we have an algorithm for Discrete Log. The inputs to the Diffie-
Hellman problem are 𝑝, 𝑎, 𝑎^𝑥𝐴 and 𝑎^𝑥𝐵, the last three being members of ℤ∗𝑝 . If we give
𝑝, 𝑎 and (say) 𝑎^𝑥𝐴 to the Discrete Log algorithm, then it will return 𝑥𝐴 . We can then
compute 𝑘𝐴𝐵 = 𝑎^(𝑥𝐴 𝑥𝐵) mod 𝑝, just as Alice did above:
𝑘𝐴𝐵 = (𝑎^𝑥𝐵)^𝑥𝐴 = 𝑎^(𝑥𝐴 𝑥𝐵),
with exponentiation mod 𝑝. This gives the desired output for the Diffie-Hellman prob-
lem.
If the algorithm for Discrete Log is fast, then this procedure for the Diffie-Hellman
problem, which uses it, will be fast too, since there is not much extra work involved,
and fast exponentiation techniques are used. □
Open problem:
Does the algorithmic relationship between Discrete Log and the Diffie-Hellman
problem go the other way too? In other words, is there an efficient way to
transform algorithms for the Diffie-Hellman problem into algorithms for Discrete Log?
It is widely believed that this is the case, but it has not yet been proved.
Example:
We use small numbers in this example, so the calculations can be done manually
to help understand how it works, but of course you need very large numbers in real
applications.
For our public global parameters, we use 𝑝 = 11 and 𝑎 = 2. In Exercise 17 you will
show that 2 is a primitive root of 11. We will take that as given, for now.
Suppose Alice chooses private number 𝑥𝐴 = 3 and Bob chooses private number 𝑥𝐵 = 6.
Then they compute their public numbers 𝑦𝐴 and 𝑦𝐵 as follows.
Alice: 𝑦𝐴 = 𝑎^𝑥𝐴 mod 𝑝 = 2^3 mod 11 = 8.
Bob: 𝑦𝐵 = 𝑎^𝑥𝐵 mod 𝑝 = 2^6 mod 11 = 9.
So Alice sends Bob her public number 𝑦𝐴 = 8 and Bob sends Alice his public number
𝑦𝐵 = 9. Then Alice raises Bob’s public number 𝑦𝐵 to the power of her own private number
𝑥𝐴 :
𝑘𝐴𝐵 = 𝑦𝐵^𝑥𝐴 = 9^3 ≡ 3 (mod 11).
Meanwhile, Bob raises Alice’s public number 𝑦𝐴 to the power of his own private number
𝑥𝐵 :
𝑘𝐵𝐴 = 𝑦𝐴^𝑥𝐵 = 8^6 ≡ 3 (mod 11).
We see that 𝑦𝐵^𝑥𝐴 ≡ 𝑦𝐴^𝑥𝐵 (mod 11), which is what we expect, since both are equal to
𝑎^(𝑥𝐴 𝑥𝐵) = 2^(3×6) = 2^18 ≡ 3 (mod 11).
This last calculation, using both 𝑥𝐴 and 𝑥𝐵 , uses the private numbers of both Alice
and Bob. So this particular calculation cannot be done by either of them individually,
provided they are each able to keep their own private number secret. But we have seen
that they can still each work out the final number 𝑘𝐴𝐵 = 3, even though they do not
know the other’s private number.
The final number, 𝑘𝐴𝐵 = 3 in this case, is then ready for use as a key in a proper
cryptosystem.
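The whole example can be replayed in a few lines of Python (a toy sketch only — real applications need a very large prime 𝑝):

p, a = 11, 2              # public, system-wide parameters
xA, xB = 3, 6             # Alice's and Bob's private numbers

yA = pow(a, xA, p)        # Alice's public number: 8
yB = pow(a, xB, p)        # Bob's public number: 9

kAB = pow(yB, xA, p)      # Alice computes 9^3 mod 11 = 3
kBA = pow(yA, xB, p)      # Bob computes 8^6 mod 11 = 3
assert kAB == kBA == pow(a, xA * xB, p) == 3
print(kAB)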
7.19 EXERCISES
1.
(a) Write the statement that integer 𝑑 is a divisor of integer 𝑛 as a predicate logic
expression, using the integer multiplication function and without using the divisibility
predicate ∣ .
(b) Write the statement that integer 𝑛 is prime as a predicate logic expression. You
may use the divisibility relation ∣ and the inequality relation ≤.
(c) Write the statement that integer 𝑑 is the greatest common divisor of integers 𝑚
and 𝑛 as a predicate logic expression, using ∣ and ≤.
2. A positive integer 𝑛 is perfect if it equals the sum of its own proper divisors.
(a) Show by direct calculation that the first two perfect numbers are 6 and 28.
(b) The third perfect number is 496. Verify that 496 is perfect.
Over 50 perfect numbers are known, but it is not yet known if there are infinitely
many. All known perfect numbers are even. It is a longstanding open problem to
determine if any odd perfect numbers exist.
3. If you know the day of the week on which a given date falls in a given year, you
can work out the day of the week of the same date in the next year.
Suppose the days of the week are represented by members of ℤ7 , with Sunday rep-
resented by 0, Monday by 1, and so on. Let 𝑑 ∈ ℤ7 be the day of the week of a given
date this year.
(a) Give an expression in terms of 𝑑, using the mod operation, for the day of the week
for the same date next year.
(b) How would you modify the expression if you knew there was a leap day (29 Febru-
ary) between that date in one year and the same date in the next year?
(c) Now let 𝑑 be the day of the week of 29 February in some leap year. Assuming the
next leap year is four years later, give an expression in terms of 𝑑 for the weekday of
the next 29 February, again using the mod operation.
• You are not penalised for guessing numbers that are not prime (although you don’t
get any information from non-prime guesses about the unknown prime number).
• You have a maximum of six prime guesses. So you must choose your guesses care-
fully, taking into account what you learn from previous guesses.
Think about what would make a good first guess in Primel, i.e., one which is likely to
give as much information as possible about the unknown prime number.
(a) Are there any prohibitions on certain digits in certain positions, in prime numbers
in general? If so, what does this mean for the digits you might want in the five-digit
prime number you use for your first guess?
(b) The density of prime numbers — i.e., the proportion of numbers in a given interval
that are prime — decreases as numbers get higher, although the pattern is a bit
irregular and unpredictable. This effect is present even among five digit primes. So,
for example, there are more with first digit 1 than with first digit 5.
With this in mind, what five digits would be best to use in your first guess? Which
five-digit prime number do you recommend, as the first guess?
(c) When playing the game, try asking ChatGPT for help with choosing your next guess.
For example, if you know that the unknown prime does not have 1, 2 or 3, then
ask ChatGPT to suggest a five-digit prime number that does not contain 1, 2 or 3.
When it responds, check if the number it gives is prime or not, using an authoritative
program or a table that includes all five-digit primes (e.g., https://t5k.org/lists/
7. Prove that the square of an odd number is an odd number and the square of an
even number is an even number.
8. Restate the definitions of the Caesar slide encryption and decryption functions
(Exercise 15) using the mod operation.
10. The digital sum of a positive integer 𝑛, written in standard decimal notation,
is the sum of its digits. Let’s denote it by ds(𝑛).
For example,
ds(1984) = 1 + 9 + 8 + 4 = 22.
(a) Explain why 𝑛 and ds(𝑛) always have the same remainder after division by 3.
This can be used repeatedly to compute 𝑛 mod 3: just keep computing the digital
sum of the digital sum of the digital sum … until you get just a single digit, and
then you can determine the remainder manually.
(b) A similar method works for remainders modulo 9. Explain why, briefly.
(c) The alternating digital sum of 𝑛 is obtained by alternately adding and subtract-
ing its digits, starting with addition at the right-hand end and moving to the left.
We’ll denote it by ads(𝑛).
For example,
ads(1984) = −1 + 9 − 8 + 4 = 4.
Explain why 𝑛 and ads(𝑛) have the same remainder modulo 11.
(d) Devise a technique of similar type for working out 𝑛 mod 3 from the bits of the
binary representation of 𝑛.
11. For each of the following equations, apply the method from Exercise 10(b) to
work out, by hand, the remainder mod 9 of each side. In each case, comment on what
comparing these two remainders tells you.
(a) 92 × 31 = 2847
(b) 92 × 31 = 2852
(c) 92 × 31 = 2861
12. Use the Euclidean algorithm to find the greatest common divisor of the following
pair of consecutive Fibonacci numbers: 34 and 55.
What do you notice about the other numbers found at each step of the algorithm?
Prove, by induction on 𝑛, that for all 𝑛 ≥ 3 the Euclidean algorithm uses 𝑛 − 2
subtractions to compute the gcd of the consecutive Fibonacci numbers 𝑓𝑛 and 𝑓𝑛+1 ,
and that every intermediate number found during the computation is also a Fibonacci
number.
13. Use the Extended Euclidean Algorithm to show that 86 and 99 are coprime, and
to express 1 as an integer linear combination of them, and to find the inverse of 86 in ℤ99 .
14.
(a) Find 𝜑(𝑛) for all 𝑛 up to 20.
15. The modular multiplication cryptosystem works as follows. The message and
cypher spaces are each the set of all strings over the 26-letter English alphabet. Each
letter is treated as a member of ℤ26 , with a,b,…,z being 0,1,…,25, respectively. The key
is 𝑘 ∈ ℤ26 . A message 𝑚 is encrypted using the key 𝑘 to produce cyphertext 𝑐 as follows,
where 𝑚𝑖 is the 𝑖-th letter of the message and 𝑐𝑖 is the 𝑖-th letter of the cyphertext:
𝑐𝑖 = 𝑘 𝑚𝑖 mod 26.
(a) This definition of keyspace and encryption is not quite correct as it stands. Not
all keys work properly. What restriction must be placed on members of ℤ26 so that
they can work properly as keys for modular multiplication? Specify the keyspace using
appropriate mathematical notation.
(c) How would modular multiplication work in general, for an arbitrary alphabet size 𝑛?
Define its keyspace, encryption function and decryption function.
16. Compute
17.
(a) If a positive integer 𝑎 is a primitive root of 11, what is the least 𝑘 such that 𝑎^𝑘 ≡ 1
(mod 11)?
8 The base here is the year in which the Gregorian calendar was introduced in Britain. The exponent is
⌊1070 𝜋⌋. But you do not need this information to do the computation.
(b) Show that 2 is a primitive root of 11 by working out its powers 2^𝑖 for as far as
necessary. You can do this by repeatedly multiplying by 2, taking remainders mod 11
as you go.
(c) Study the exponents 𝑖 for each 2^𝑖 in your list of powers from (b). Which of these
exponents is coprime to 𝜑(11)?
(d) Using (c), list all the primitive roots of 11.
18. Alice and Bob are using the Diffie-Hellman scheme to agree on a key. Their public
global parameters are prime 𝑝 = 11 and primitive root 𝑎 = 7. Their public numbers are
𝑦𝐴 = 2, 𝑦𝐵 = 8.
Play the role of the cryptanalyst: find their private numbers and the shared key they
each compute.
19. You have been happily using the Diffie-Hellman scheme, with modulus 𝑝 = 17
and base 𝑎 = 7, and these parameters have met your (rather limited) security needs.
However your General Manager has decided that “bigger is better” and that, from now
on, you and your co-workers will be using 𝑝 = 18 (with appropriate choice of 𝑎).
(b) What are the security implications of the change? Try to be reasonably precise,
e.g. by roughly estimating the percentage increase/decrease in the time a cryptanalyst
would have to spend on some sort of exhaustive attack.
20. Explain how the security of the Diffie-Hellman scheme will be affected if user A
selects private key 𝑥𝐴 = 𝑝 − 1 (where 𝑝 is the prime modulus used).
21. Devise a variant of the Diffie-Hellman scheme to enable three people A, B and C
to arrive at a common secret key, subject to the following conditions:
• they initially have no secret information in common;
• the common key itself is not sent anywhere by anyone;
• it is hard for an eavesdropper, or anyone only knowing public information, to
determine the common key;
• it is impossible for any one or two of A, B, C to find this common key without the
cooperation of the other(s) (i.e., if the other tells them nothing and has no public
information).
8 COUNTING & COMBINATORICS
8.1𝛼 COUNTING BY ADDITION

Suppose we have a list of 𝑟 numbers and a list of 𝑐 numbers. In total, the two lists
contain 𝑟 + 𝑐 numbers. If each number requires 8 bytes, then the two lists together
require 8(𝑟 + 𝑐) bytes.
Suppose now we want to add up all the numbers in both lists. We can do this using
two loops, one after the other (so the first loop finishes before the second one starts).
sum = 0
for each 𝑖 from 1 to 𝑟
    sum := sum + (𝑖-th number in the first list)
for each 𝑗 from 1 to 𝑐
    sum := sum + (𝑗-th number in the second list)
1 In terms of its contents, a file is just a string of symbols. The term “file” refers to the way it is stored and
accessed within the computer; the term itself says nothing about its internal structure.
This computation requires one addition for each 𝑖 and one addition for each 𝑗. Therefore
we have 𝑟 + 𝑐 additions altogether.
In general, if you are to choose one item from two disjoint sets of options, then the
number of choices available to you is the number of options in the first set plus the
number of options in the second set.
We met this principle in § 1.12, when we saw that the size of a disjoint union of sets
is the sum of the sizes of the sets, see (1.14).
8.2𝛼 C O U N T i N G B Y M U LT i P L i C AT i O N
Suppose you want to store a table of numbers in memory. If the table has 𝑟 rows and
𝑐 columns, then you must store 𝑟 × 𝑐 numbers. If each number requires, say, 8 bytes of
storage, then the table needs 𝑟 × 𝑐 × 8 bytes, or 8𝑟𝑐 bytes for short.
Suppose now that you want to add up all the numbers in the table. We again use
two loops, but this time they are nested rather than separate.
sum = 0
for each 𝑖 from 1 to 𝑟
for each 𝑗 from 1 to 𝑐
sum := sum + (entry in the 𝑖-th row and 𝑗-th column in the table)
This computation requires one addition for each pair (𝑖, 𝑗), and therefore 𝑟𝑐 additions
altogether.
In each case here — whether we are determining storage requirements or counting
additions — we are interested in the number of pairs (𝑖, 𝑗) where 𝑖 ∈ {1, 2, … , 𝑟} and
𝑗 ∈ {1, 2, … , 𝑐}. There are 𝑟 choices for the first member of a pair, and 𝑐 choices for
the second member of a pair. Crucially, these choices are independent, meaning that
the particular choice of the first member of the pair has no effect on the number of
options there are for the second member of the pair, and vice versa. In such cases,
the independence of the choices means that the numbers of the separate choices are
multiplied.
We have seen this principle before, in § 1.14: see (1.19). The size of a Cartesian
product is just the product of the sizes of the sets being combined. In the above examples,
the pairs we are counting are precisely the members of the Cartesian product
{1, 2, … , 𝑟} × {1, 2, … , 𝑐},
where the first set has 𝑟 members and the second set has 𝑐 members, so the number of
them is 𝑟𝑐.
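In Python, itertools.product enumerates exactly these pairs, so the count can be confirmed directly:

from itertools import product

r, c = 3, 4
pairs = list(product(range(1, r + 1), range(1, c + 1)))
print(len(pairs), r * c)   # both 12: |{1,...,r} × {1,...,c}| = r*c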
8.3 INCLUSION-EXCLUSION
In § 8.1𝛼 , we considered the size of a disjoint union of sets. Determining the size of a
union of sets requires more care when the sets are not disjoint.
If 𝐴 ∩ 𝐵 ≠ ∅ then |𝐴 ∪ 𝐵| is no longer given by |𝐴| + |𝐵|, because |𝐴| + |𝐵| double-
counts everything in the intersection 𝐴 ∩ 𝐵. So we have to subtract |𝐴 ∩ 𝐵| once, so that
its members are counted just once instead of twice:
|𝐴 ∪ 𝐵| = |𝐴| + |𝐵| − |𝐴 ∩ 𝐵|. (8.1)
Example
How many entries in the Monash library catalogue (https://www.monash.edu/library)
contain at least one of “Babbage” and “Lovelace”?
Let 𝐵 be the set of entries containing “Babbage”, and let 𝐿 be the set of entries
containing “Lovelace”. By doing a Basic Search just for “Babbage” (no need for quotes;
we only use them here to identify the exact search term used), you may find the number
of entries |𝐵|, and a Basic Search for “Lovelace” similarly gives |𝐿|.
If you enter both terms “Babbage” and “Lovelace” in the search field (separated by a
space, and without quotes), you get the number of items containing both terms:

|𝐵 ∩ 𝐿| = 3,263.

You can now find the number of items containing at least one of these terms, by
substituting these three numbers into (8.1).
In the Monash library catalogue, you can actually check this answer using an Advanced
Search, which enables you to combine searches using any of AND, OR and NOT.2 But
there are many other search tools for which union is either unavailable or significantly
harder than intersection. This makes sense, since searching for two alternative terms
involves two searches of the entire database (once for each term), whereas searching for
2 But hang on, NOT is a unary operation, not a binary operation! Try using the NOT operation to combine
two catalogue searches, which could be using the two searches we have used here or others you are interested
in, and determine which set operation they mean by NOT.
joint occurrences of two terms (i.e., where both appear in the same item) can be done
by doing just one search of the entire database followed by a second search restricted
to the items found in the first search (with this second search usually being much more
efficient than the first search, since by then there is much less data to sift through).
Now consider the size |𝐴 ∪ 𝐵 ∪ 𝐶| of the union of three sets, 𝐴, 𝐵 and 𝐶. Again, just
adding the sizes of these three sets, obtaining |𝐴| + |𝐵| + |𝐶|, overcounts elements that
belong to more than one set.
We can try to compensate by subtracting |𝐴 ∩ 𝐵|, |𝐴 ∩ 𝐶| and |𝐵 ∩ 𝐶|, i.e., the sizes of
the pairwise intersections. This corrects the overcounting of the elements that belong
to exactly two of the three sets. But it over-corrects the overcounting of the elements
that belong to all three sets! This is because any element that belongs to all three of 𝐴,
𝐵 and 𝐶 also belongs to all three of the pairwise intersections 𝐴 ∩ 𝐵, 𝐴 ∩ 𝐶 and 𝐵 ∩ 𝐶.
The net result of this is that everything is now correctly counted except the elements of
𝐴 ∩ 𝐵 ∩ 𝐶, which are not counted at all! So we adjust by adding the size of that triple
intersection, |𝐴 ∩ 𝐵 ∩ 𝐶|, and then everything is counted exactly once as required. The
upshot of this is

|𝐴 ∪ 𝐵 ∪ 𝐶| = |𝐴| + |𝐵| + |𝐶| − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶| − |𝐵 ∩ 𝐶| + |𝐴 ∩ 𝐵 ∩ 𝐶|. (8.2)
Example
Now let’s determine the number of Monash library catalogue entries containing at
least one of “Babbage”, “Lovelace” and “Turing”. Let the sets 𝐵 and 𝐿 be as before, and
let 𝑇 be the set of entries containing “Turing”. We found |𝐵|, |𝐿| and |𝐵 ∩ 𝐿| earlier, and
we will use them again shortly. Further queries with Basic Search tell us the values of
|𝑇|, |𝐵 ∩ 𝑇|, |𝐿 ∩ 𝑇| and |𝐵 ∩ 𝐿 ∩ 𝑇|. Substituting all these values into (8.2) then gives
the number of entries containing at least one of these three terms.
Notice, in (8.2), how the signs alternate according to the number of intersecting
sets: we add the sizes of single sets, subtract the sizes of pairwise intersections, and add
the size of the triple intersection.
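Equation (8.2) is easy to check computationally. Here is a small Python check on three example sets of our own choosing:

A = set(range(0, 60, 2))   # multiples of 2 below 60
B = set(range(0, 60, 3))   # multiples of 3 below 60
C = set(range(0, 60, 5))   # multiples of 5 below 60

# Right-hand side of (8.2): alternating sum of intersection sizes.
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))

assert rhs == len(A | B | C)   # both equal the size of the union
print(rhs)                     # 44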
This alternation persists in expressions for the sizes of the unions of arbitrary num-
bers of sets in terms of the sizes of all possible intersections of them. Suppose the sets
are 𝐴1 , 𝐴2 , … , 𝐴𝑛 . Then the general expression has the form
Theorem 42. For all 𝑛,

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 sets). (8.4)

Proof. We proceed by Mathematical Induction on 𝑛.
Base case:
When 𝑛 = 1, there is only one set, 𝐴1 , and the left and right sides of (8.4) are
both |𝐴1 |. So the equation holds in this case.
Inductive step:
Let 𝑛 ≥ 1. Assume that (8.4) holds for any 𝑛 sets. (This is our Inductive Hypothesis.)
Now consider the size of the union of 𝑛 + 1 sets, i.e., |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |.
We first need to relate this somehow to the size of the union just of 𝑛 sets. To this
end, we can start by relating the union of 𝑛 + 1 sets to the union of the first 𝑛 sets:
𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 = (𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛) ∪ 𝐴𝑛+1.
We can view this as a union of two sets, namely 𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 and 𝐴𝑛+1 . And we
already know how to find the size of the union of two sets.
|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |
= |(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 ) ∪ 𝐴𝑛+1 |
= |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | + |𝐴𝑛+1 | − |(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 ) ∩ 𝐴𝑛+1 |
(by (8.1))
= |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | + |𝐴𝑛+1 | − |(𝐴1 ∩ 𝐴𝑛+1 ) ∪ (𝐴2 ∩ 𝐴𝑛+1 ) ∪ ⋯ ∪ (𝐴𝑛 ∩ 𝐴𝑛+1 )|
(by the distributive law). (8.5)
This is progress: instead of the size of a union of 𝑛 + 1 sets, we have two occurrences of
a size of a union of 𝑛 sets, and we can use the Inductive Hypothesis on each of these.
Note also that the first of these occurrences is added, while the second is subtracted.
Continuing from (8.5), we have
|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1|

= ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛)

+ |𝐴𝑛+1|

− ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 ∩ 𝐴𝑛+1, 𝐴2 ∩ 𝐴𝑛+1, …, 𝐴𝑛 ∩ 𝐴𝑛+1) (8.6)

(by applying the Inductive Hypothesis to the first and third terms in (8.5)).
We now consider the first and third terms in (8.6). It turns out that they are closely
related, indeed complementary in a sense. To see this, we consider them each in turn.
The first term in (8.6) uses, inside the summation over 𝑘, the sum of the sizes of
all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 . It will be convenient to describe this as
the sum of the sizes of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 excluding 𝐴𝑛+1 . This is a bit more
long-winded, but it does reflect the context that we are now considering 𝑛 + 1 sets, not
just 𝑛 sets. So the first term in (8.6) may be written
∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 excluding 𝐴𝑛+1).
Now consider the third term in (8.6). What can we say about the intersection of 𝑘
of the sets 𝐴1 ∩ 𝐴𝑛+1, 𝐴2 ∩ 𝐴𝑛+1, …, 𝐴𝑛 ∩ 𝐴𝑛+1? The intersection of any two of these sets
is really an intersection of three sets (including 𝐴𝑛+1):

(𝐴𝑖 ∩ 𝐴𝑛+1) ∩ (𝐴𝑗 ∩ 𝐴𝑛+1) = 𝐴𝑖 ∩ 𝐴𝑗 ∩ 𝐴𝑛+1.

More generally, the intersection of 𝑘 of these sets is an intersection of 𝑘 + 1 of the sets
𝐴1, 𝐴2, …, 𝐴𝑛+1, one of which is 𝐴𝑛+1.
We can take the minus sign inside the sum, so that the whole sum is now added (instead of
subtracted) but the coefficient of the sum of sizes is −(−1)^{𝑘+1}, which may be rewritten
as (−1)^{𝑘+2}. Then it equals
∑_{𝑘=1}^{𝑛} (−1)^{𝑘+2} ⋅ (sum of sizes of all intersections of 𝑘 + 1 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 including 𝐴𝑛+1).
Here, the number of sets being intersected is 𝑘 + 1, and this is also the exponent of −1
in the coefficient. The range of values that 𝑘 +1 can take, in this sum, is from 2 to 𝑛 +1
(since 𝑘 ranges from 1 to 𝑛). Writing our sum in terms of 𝑘 + 1 rather than 𝑘 gives
∑_{𝑘+1=2}^{𝑛+1} (−1)^{(𝑘+1)+1} ⋅ (sum of sizes of all intersections of 𝑘 + 1 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 including 𝐴𝑛+1).
This sum covers all possible numbers of sets, 𝑘 + 1, except for the case of just a single
set (𝑘 + 1 = 1).
To enable easier comparison with the first term in (8.6), it will help if we consistently
use 𝑘 for the number of sets being intersected. So we will replace 𝑘 + 1 by 𝑘 throughout
the last sum in the previous paragraph. Again, the sum goes from 2 to 𝑛 + 1 instead of
from 1 to 𝑛. So the expression becomes
∑_{𝑘=2}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 including 𝐴𝑛+1).
Let us now plug the expressions we have derived back into (8.6). We have

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1|

= ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 excluding 𝐴𝑛+1)

+ |𝐴𝑛+1|

+ ∑_{𝑘=2}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1 including 𝐴𝑛+1). (8.7)
At first glance, it may look like the first and third terms together cover all possible
intersections of any number of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 , since the first term deals with
those intersections that don’t use 𝐴𝑛+1 and the third term deals with those that do use
𝐴𝑛+1 . But there are some differences.
• The first term covers 𝑘 = 1, whereas the third term does not. Because 𝐴𝑛+1 is
excluded in the first term, it only gives |𝐴1 | + |𝐴2 | + ⋯ + |𝐴𝑛 | and does not include
the size of the last set. This is ok, though, because the size of the last set, |𝐴𝑛+1 |,
has its own term, namely the second term (which we have hardly mentioned, but
now it plays its small part). So the 𝑘 = 1 contribution from the first term, together with
the second term, accounts for the sum of the sizes of all single sets, and we don’t
need any contribution from the third term for these, which is just as well.
• The third term covers 𝑘 = 𝑛 + 1, whereas the first term does not. But the only
way we can take 𝑛 + 1 sets from the list 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 is to take all of them, and
this means that we must necessarily include 𝐴𝑛+1 . So this is entirely taken care of
by the 𝑘 = 𝑛 + 1 contribution from the third term; the first term does not include
𝑘 = 𝑛 + 1 so it does not interfere.
Combining all three terms, we therefore obtain

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1|

= ∑_{𝑘=1}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1, regardless of whether they include or exclude 𝐴𝑛+1)

= ∑_{𝑘=1}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1, 𝐴2, …, 𝐴𝑛, 𝐴𝑛+1).

This is exactly (8.4) with 𝑛 + 1 sets in place of 𝑛.
Conclusion:
Therefore, by Mathematical Induction, (8.4) holds for all 𝑛 ∈ ℕ.
Theorem 43. For all 𝑛,

|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all unions of 𝑘 sets). (8.8)
8.4 INCLUSION-EXCLUSION: DERANGEMENTS

Let 𝐴 be a set of size 𝑛. A bijection on 𝐴 with no fixed point (i.e., a bijection 𝑓 ∶ 𝐴 → 𝐴
with 𝑓(𝑥) ≠ 𝑥 for all 𝑥 ∈ 𝐴) is called a derangement.3 How many derangements of 𝐴 are
there? If we merely wanted functions 𝑓 ∶ 𝐴 → 𝐴 with no fixed point, the count would be
easy: each of the 𝑛 elements of 𝐴 has 𝑛 − 1 allowed images, and these choices may be
made independently, so the number of such functions is

(𝑛 − 1)^𝑛.

But most of these functions are not bijections. Fixed-point-free bijections do arise in
practice. For example:
• In the standard Enigma cypher machine used by the Nazi regime’s army during the
Second World War, at each position in the message the function sending plaintext
letters to cyphertext letters (in the same alphabet) was a fixed-point-free bijection
on the alphabet.
In this situation, we cannot just count the options available for each 𝑥 ∈ 𝐴 and then
multiply them as if they are independent. The problem with that is that, if 𝑓(𝑥) = 𝑦
and 𝑤 ≠ 𝑥 then we must have 𝑓(𝑤) ≠ 𝑦, else 𝑓 is not a bijection. So the choices we make
for each 𝑥 ∈ 𝐴 interfere with each other. So we need another approach. This is where
inclusion-exclusion will prove useful.
We start by taking a complementary view of the problem. This is a general problem-
solving strategy. It won’t work in every situation, but it is worth keeping in mind.
We have

# bijections with no fixed point = (total # bijections) − (# bijections with at least one fixed point).
We already know the total number of bijections from 𝐴 to 𝐴: this is just 𝑛!, as we saw
in § 2.12. Therefore
# bijections with no fixed point = 𝑛! − # bijections with at least one fixed point.
(8.9)
So, to count fixed-point-free bijections, we first count bijections with at least one fixed
point.
For convenience, denote the elements of 𝐴 by 𝑎1 , 𝑎2 , … , 𝑎𝑛 , so that
𝐴 = {𝑎1 , 𝑎2 , … , 𝑎𝑛 }.
For each 𝑖 ∈ {1, 2, … , 𝑛}, let 𝐴𝑖 be the set of all bijections on 𝐴 that fix 𝑎𝑖 . So the set of
all bijections that fix at least one element of 𝐴 is
𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛.
3 Since 𝐴 is finite, any injection 𝑓 ∶ 𝐴 → 𝐴 is also a surjection, and vice versa. (See the end of § 2.7, on p. 50.)
So, if we count fixed-point-free injections (or surjections) from a finite set to itself, then we will really be
counting fixed-point-free bijections anyway.
So the number of bijections with at least one fixed point is the size of this union,

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛|.
This is a job for the Inclusion-Exclusion principle, in the form of Theorem 42. To apply
that theorem, we will need to work out, for each 𝑘, the sum of the sizes of all intersections
of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 .
Consider, then, what an intersection of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 looks like.
Let’s start with 𝑘 = 1. Then we have just a single set, say 𝐴𝑖 . This is the set of all
bijections on 𝐴 that fix 𝑎𝑖 . How many such bijections are there? If 𝑓 is one of these
bijections, then its value on 𝑎𝑖 is determined by the fact that it fixes 𝑎𝑖 , so for 𝑓(𝑎𝑖 ) we
have no choice: it must equal 𝑎𝑖 . Then none of the other elements of 𝐴 can be mapped
to 𝑎𝑖 , because then 𝑓 would not be a bijection. So 𝑓 maps the elements of 𝐴 ∖ {𝑎𝑖 } to
𝐴 ∖ {𝑎𝑖 }, and in fact it must be a bijection on that set, otherwise it can’t be a bijection
on 𝐴. So the number of bijections on 𝐴 that fix 𝑎𝑖 is just the number of bijections on
𝐴 ∖ {𝑎𝑖}, and there are (𝑛 − 1)! of these, since |𝐴 ∖ {𝑎𝑖}| = 𝑛 − 1. Hence

|𝐴𝑖| = (𝑛 − 1)!. (8.10)

There are 𝑛 of these sets, and all have the same size, so the sum of the sizes of all the
sets 𝐴𝑖 is given by

𝑛 ⋅ (𝑛 − 1)! = 𝑛!.
Now consider what happens in general, for arbitrary 𝑘. Consider the intersection of
the first 𝑘 sets,
𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑘.
This contains those bijections on 𝐴 that fix 𝑎1 and also fix 𝑎2 and also 𝑎3 and so on up
to 𝑎𝑘 . So, we want to count bijections 𝑓 ∶ 𝐴 → 𝐴 that fix every 𝑎𝑖 with 1 ≤ 𝑖 ≤ 𝑘. The
values of such a bijection 𝑓 on 𝑎1, 𝑎2, …, 𝑎𝑘 are completely determined by the requirement
that it fixes those elements. So it remains to consider what 𝑓 does on the other elements
of 𝐴, namely 𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 . Now 𝑓 cannot map any of these elements to the fixed
points 𝑎1 , 𝑎2 , … , 𝑎𝑘 , else it would not be a bijection, because each of those elements is
already mapped to by itself. So 𝑓 has to map the set {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 } into itself.
Furthermore, it has to map this set onto itself too, else it isn’t a surjection. So, in fact,
the restriction of 𝑓 to {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 } must be a bijection on that set, and it can be
any bijection on that set at all. So, counting bijections that fix 𝑎1 , 𝑎2 , … , 𝑎𝑘 is the same
as just counting bijections on {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 }, which is a set of size 𝑛 − 𝑘, so there
are (𝑛 − 𝑘)! bijections on this set. So we have
|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑘| = (𝑛 − 𝑘)!.
This is just one of the many possible intersections of 𝑘 of these sets. Since no element of
𝐴 has any special role, and since their names do not matter, the size of the intersection
of 𝑘 of them is always the same. We could pick any 𝑘 of the sets, say 𝐴𝑖1, 𝐴𝑖2, …, 𝐴𝑖𝑘,
where 𝑖1, 𝑖2, …, 𝑖𝑘 are any 𝑘 distinct indices from {1, 2, …, 𝑛}. Regardless of our choice, we still have

|𝐴𝑖1 ∩ 𝐴𝑖2 ∩ ⋯ ∩ 𝐴𝑖𝑘| = (𝑛 − 𝑘)!.
Note that the special case 𝑘 = 1 agrees with the expression for that case we derived
above, (8.10).
The number of ways of choosing 𝑘 of the 𝑛 sets 𝐴1, 𝐴2, …, 𝐴𝑛 is \binom{𝑛}{𝑘}. So

sum of the sizes of all intersections of 𝑘 of the sets = \binom{𝑛}{𝑘} (𝑛 − 𝑘)!.

Applying Theorem 42, we therefore have

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ \binom{𝑛}{𝑘} (𝑛 − 𝑘)!.

Since the left-hand side here is the number of bijections on 𝐴 that fix at least one member
of 𝐴, using (8.9) gives

# fixed-point-free bijections = 𝑛! − ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ \binom{𝑛}{𝑘} (𝑛 − 𝑘)!.
Each term in the sum can be simplified:

\binom{𝑛}{𝑘} (𝑛 − 𝑘)! = (𝑛! / (𝑘! (𝑛 − 𝑘)!)) ⋅ (𝑛 − 𝑘)!   (see § 1.10, especially (1.6))
                      = 𝑛! / 𝑘!.
So

# fixed-point-free bijections = 𝑛! − ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ 𝑛!/𝑘! = ∑_{𝑘=0}^{𝑛} (−1)^{𝑘} ⋅ 𝑛!/𝑘!,

where the last step absorbs the leading 𝑛! as the 𝑘 = 0 term of the sum.
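This formula can be checked by brute force for small 𝑛. The following Python sketch (ours) compares it with direct enumeration of all permutations:

from itertools import permutations
from math import factorial

def derangements(n):
    # Inclusion-exclusion count: sum of (-1)^k * n!/k! for k = 0, ..., n.
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

def derangements_brute_force(n):
    # Count permutations of 0, ..., n-1 with no fixed point.
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    assert derangements(n) == derangements_brute_force(n)
print([derangements(n) for n in range(1, 8)])   # [0, 1, 2, 9, 44, 265, 1854]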
We have been focusing on the number of fixed-point-free bijections. But, often, we
don’t just want to know the number of structures of interest; we may also want to know
what proportion they are, out of all possible structures. In this case, we may ask, what
proportion of all bijections on 𝐴 are fixed-point-free? Since there are 𝑛! bijections on 𝐴
altogether, the proportion that are fixed-point-free is
# fixed-point-free bijections / total # bijections = (1/𝑛!) ∑_{𝑘=0}^{𝑛} (−1)^{𝑘} ⋅ 𝑛!/𝑘!

= ∑_{𝑘=0}^{𝑛} (−1)^{𝑘} ⋅ 1/𝑘!.
It is interesting that 𝑛 no longer appears inside the sum; its only role is to determine how
many terms (𝑛 + 1, in fact) must be added up. Note also that these terms rapidly get
smaller and smaller, and furthermore they alternate in sign. You can see the structure
by writing the sum out:
1 − 1/1! + 1/2! − 1/3! + 1/4! − ⋯ + (−1)^{𝑛} ⋅ 1/𝑛!.
This is actually the first 𝑛 + 1 terms of the standard infinite series for 𝑒^{−1}, where as
usual 𝑒 = 2.71828… is the base of natural logarithms. So, as 𝑛 → ∞, the proportion of
bijections on a set of size 𝑛 that are fixed-point-free converges to

𝑒^{−1} = 0.367879… .
In other words, if you choose a bijection at random from all bijections on 𝑛 elements,
with all bijections equally likely, then the chance that it has no fixed points converges to
about 36.8% as 𝑛 → ∞. The convergence is rapid, so this proportion gives a very useful
approximation even for moderate-sized 𝑛.
At this point, it is worth revisiting Exercises 2.9 and 2.10.
8.5 SELECTION
There are many situations where we want to choose 𝑟 objects from a set of 𝑛 objects.
For example:
(a) choosing each of the 𝑟 characters of a password from a set of 𝑛 allowed characters;
(b) filling the first 𝑟 places in a race between 𝑛 competitors;
(c) requesting 𝑟 meals for an event, from a catering menu containing 𝑛 meal options;
(d) choosing a committee of 𝑟 people from a group of 𝑛 people.
Counting the number of ways of making these selections depends on the specific nature
of the task. In particular, it depends on whether we have:
• ordered selection (as in (a) and (b)) versus unordered selection (as in (c) and
(d));
• selection with replacement (as in (a) and (c)) versus selection without replacement
(as in (b) and (d)).
8.6 ORDERED SELECTION WITH REPLACEMENT

Given a set 𝐴 of size 𝑛, suppose we make a sequence of 𝑟 choices from 𝐴, where each
choice can be any member of 𝐴. So:
• Our selection is ordered: what matters is the whole sequence of choices, in order.
• Choosing a member of 𝐴 does not stop it being chosen again later. In other words,
if a choice takes an element from 𝐴, we can think of that element as being replaced,
in 𝐴, by a copy of itself, so that the element is still available for future choices.
This is why we say that our selection is done with replacement.
So the sequence of 𝑟 choices is exactly a member of the Cartesian product

𝐴 × 𝐴 × ⋯ × 𝐴   (𝑟 copies of 𝐴),

and so, by § 1.14 and the multiplication principle of § 8.2𝛼, the number of ordered
selections of 𝑟 objects, with replacement, from a set of 𝑛 objects is 𝑛^𝑟.
8.7 ORDERED SELECTION WITHOUT REPLACEMENT

Suppose that we are again making a sequence of 𝑟 choices from 𝐴, but that now we
cannot repeat earlier choices. So:
• Our selection is ordered, as in § 8.6.
• When we come to the 𝑘-th choice, we have previously made 𝑘 − 1 choices, all of
which are now forbidden.
• For every choice we make, the number of available choices remaining is reduced
by 1.
Therefore, the number of ways of choosing 𝑟 objects, in order and without replacement,
from a set of 𝑛 objects is
𝑛 ⋅ (𝑛 − 1) ⋅ ⋯ ⋅ (𝑛 − 𝑟 + 1)   (𝑟 factors).
This is often called the falling factorial and denoted by (𝑛)𝑟. It is also sometimes
written 𝑛P𝑟 or P(𝑛, 𝑟).
In § 2.12, we applied this counting method to counting injections and bijections,
where domain and codomain are finite.
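In Python, the falling factorial is a short loop, and the standard library's math.perm computes the same quantity:

from math import perm

def falling_factorial(n, r):
    # (n)_r = n * (n - 1) * ... * (n - r + 1): r factors.
    result = 1
    for i in range(r):
        result *= n - i
    return result

print(falling_factorial(10, 3))   # 720
print(perm(10, 3))                # 720: math.perm(n, r) agrees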
8.8 UNORDERED SELECTION WITHOUT REPLACEMENT

When doing ordered selection in § 8.6–§ 8.7, it was convenient to treat selection with
replacement first, because it was easier and because it “paved the way” for selection
without replacement. But, for unordered selection, it turns out that selection without
replacement is easier, so we discuss it first, in this section (albeit briefly, because we
have done it before). Then, in the next section, we will use the ideas from this section
(and § 1.10) to help us count unordered selections with replacement.
In fact, we already know how to count unordered selections without replacement,
because we did it in § 1.10. We derived expressions involving factorials and binomial
coefficients in (1.5) and (1.6). The number of unordered selections — i.e., subsets — of
𝑟 elements that can be chosen, without replacement, from a set of 𝑛 elements is given
by the binomial coefficient \binom{𝑛}{𝑟}, which can be written in a few different ways:

\binom{𝑛}{𝑟} = 𝑛! / ((𝑛 − 𝑟)! 𝑟!) = (𝑛)𝑟 / 𝑟!.
8.9 UNORDERED SELECTION WITH REPLACEMENT

When choosing without replacement (as in the previous section), each member of our
set 𝐴 of size 𝑛 can be chosen at most once. For convenience, suppose the objects in 𝐴
are numbered from 1 to 𝑛. Let 𝑥𝑖 be the number of times the 𝑖-th object in 𝐴 is chosen.
Then each 𝑥𝑖 ∈ {0, 1}, and we require that the total number of objects chosen is 𝑟:
𝑥1 + 𝑥2 + ⋯ + 𝑥𝑛 = 𝑟.
We saw in the previous section that the total number of choices is \binom{𝑛}{𝑟}.
Now we consider choosing with replacement. There is no longer any limit on how
many times we can choose a particular object in 𝐴, except for the overall requirement
that we make exactly 𝑟 choices. So each 𝑥𝑖 ≥ 0. The 𝑥𝑖 can be any nonnegative integer
subject to the same constraint we had earlier,
𝑥1 + 𝑥2 + ⋯ + 𝑥𝑛 = 𝑟. (8.12)
We can picture the total 𝑟 as a sum of ones:

1 + ⋯ + 1 = 𝑟   (𝑟 ones altogether),

and then collect the ones into 𝑛 groups,

1 + ⋯ + 1   1 + ⋯ + 1   ⋯   1 + ⋯ + 1,

where the groups sum to 𝑥1, 𝑥2, …, 𝑥𝑛 respectively.
We can specify the groups by placing barriers (shown as vertical lines) between them:
1 + ⋯ + 1 | 1 + ⋯ + 1 | ⋯ | 1 + ⋯ + 1   (𝑟 ones altogether),

where the groups sum to 𝑥1, 𝑥2, …, 𝑥𝑛.
The understanding here is that 𝑥1 is the sum of the 1s before the first barrier, 𝑥2 is the
sum of the 1s between the first and second barriers, and so on, with 𝑥𝑖 being the sum
of the 1s between the (𝑖 − 1)-th and 𝑖-th barriers (2 ≤ 𝑖 ≤ 𝑛 − 1) and 𝑥𝑛 being the sum
of the 1s after the (𝑛 − 1)-th barrier. Note that we have no 0-th barrier and no 𝑛-th
barrier; they are not needed.4 So the number of barriers is 𝑛 − 1, i.e., one less than the
number of groups.
For example, if 𝑛 = 5 and 𝑟 = 3, then we have three ones,

1 + 1 + 1 = 3,

and 𝑛 − 1 = 4 barriers to place; the arrangement 1 | 1 + 1 | | | encodes 𝑥1 = 1, 𝑥2 = 2,
𝑥3 = 𝑥4 = 𝑥5 = 0.
One complication is that groups may be empty, so barriers can sit next to each other.
To deal with this, give every group an extra 1: define 𝑦𝑖 = 𝑥𝑖 + 1 for each 𝑖. Then every
𝑦𝑖 is a positive integer, and summing over all 𝑖 turns (8.12) into

𝑦1 + 𝑦2 + ⋯ + 𝑦𝑛 = 𝑟 + 𝑛. (8.13)
So we still have a simple equation specifying the required value of a sum of 𝑛 integers, but
now the integers are all required to be positive (instead of merely nonnegative) and their
sum is now 𝑟 + 𝑛 (instead of just 𝑟). Although the equation looks different, it is really
encoding the same information. There is a bijection between sequences (𝑥1, 𝑥2, …, 𝑥𝑛) of
nonnegative integers summing to 𝑟 and sequences (𝑦1, 𝑦2, …, 𝑦𝑛) of positive integers
summing to 𝑟 + 𝑛. In terms of ones and barriers, the 𝑦-version looks like this:
4 If we used them, they would be before the start and after the end, respectively. There would be no flexibility
in where they are placed.
1 + ⋯ + 1 + 1 | 1 + ⋯ + 1 + 1 | ⋯ | 1 + ⋯ + 1 + 1   (𝑟 + 𝑛 ones altogether),

with the 𝑛 groups summing to 𝑦1, 𝑦2, …, 𝑦𝑛. Since every group is now nonempty, each of
the 𝑛 − 1 barriers must occupy one of the 𝑟 + 𝑛 − 1 gaps between consecutive ones. So
specifying the groups amounts to choosing barrier positions from among these 𝑟 + 𝑛 − 1
gaps, where:
• Each of these positions can be used at most once. So, if a position is chosen, it
cannot be chosen again. In other words, we are now choosing without replacement!
So placing the barriers is an unordered selection, without replacement, of 𝑛 − 1 of the
𝑟 + 𝑛 − 1 positions. By the previous section, the number of ways of doing this, and hence
the number of unordered selections of 𝑟 objects with replacement from a set of 𝑛 objects, is

\binom{𝑟 + 𝑛 − 1}{𝑛 − 1}.
Since choosing which 𝑛 − 1 of the 𝑟 + 𝑛 − 1 positions to use is equivalent to choosing
which 𝑟 of them to leave unused, this number can also be written

\binom{𝑟 + 𝑛 − 1}{𝑟}.
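A quick Python check of this count, enumerating the unordered selections with replacement directly (the values of 𝑛 and 𝑟 are our example):

from itertools import combinations_with_replacement
from math import comb

n, r = 5, 3

formula = comb(r + n - 1, n - 1)
by_enumeration = sum(1 for _ in combinations_with_replacement(range(n), r))

assert formula == by_enumeration == comb(r + n - 1, r)
print(formula)   # 35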
8.10 EXERCISES
1. Consider the following algorithm:

for 𝑖 from 1 to 𝑎 do
for 𝑗 from 1 to 𝑏 do
for 𝑘 from 1 to 𝑐 do
for 𝑙 from 1 to 𝑑 do
beep!
for 𝑚 from 1 to 𝑒 do
for 𝑛 from 1 to 𝑓 do
beep!
for 𝑜 from 1 to 𝑔 do
beep!
for 𝑝 from 1 to ℎ do
beep!
(a) Give an expression in the positive integer variables 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔, ℎ for the number
of times this algorithm beeps.
(b) If each of these eight variables is 𝑂(𝑛), give a big-O expression for the number of
times the algorithm beeps.
2.
(a) In an Australian Federal Election, a ballot paper for a House of Representatives
seat has 𝑛 boxes, one for each candidate. A voter must enter the numbers 1 to 𝑛, in the
order of their preference, in those boxes, with exactly one number in each box. In how
many ways can this be done?
(b) In the Senate, the ballot paper again has one box for each of 𝑁 candidates, but this
time, voters are only required to enter numbers 1 to 12, for their twelve most preferred
candidates. In how many ways can this be done?
3. Two fair dice are thrown, and each shows a number from {1, 2, 3, 4, 5, 6}. The
outcome is the ordered pair of numbers shown.
(a) How many outcomes are there altogether?
(b) How many outcomes are there in which the number on the first die is less than the
number on the second die?
(c) How many outcomes are there in which the sum of the two numbers is 7?
(d) How many outcomes are there in which both numbers are ≤ 3?
4. The streets at the heart of the Melbourne city centre form a grid, made up of:
• five “horizontal” streets: Flinders St., Collins St., Bourke St., Lonsdale St., and
La Trobe St.;
• nine “vertical” streets: Spencer St., King St., William St., Queen St., Elizabeth
St., Swanston St., Russell St., Exhibition St., and Spring St.
These divide the city centre into 4 × 8 = 32 square blocks.
Suppose you are at the corner of Flinders and Swanston Streets, having just emerged
from Flinders Street Station. You can’t make up your mind whether to visit Federation
Square, St Paul’s Cathedral or Young & Jackson’s Pub. So you decide to go for a walk,
always staying on one of these main streets, never turning back, and always going further
away from your starting point.
(a) How many routes are there, following these rules, from your starting point to the
corner of William and La Trobe Streets?5
Express your answer using one of the expressions we have used for counting, then
work out its numerical value.
(b) How many routes of this kind are there from your starting point which do not meet
Russell Street and which take you four block-lengths away? Give the total as well as the
number of routes to each intersection that lies at this distance from the start. Comment
on the relationship among these numbers and how they relate to some counting you did
earlier in semester.
(c) How many ways are there of walking from the south-west corner (Flinders and
Spencer Streets) to the north-east corner (Spring and La Trobe Streets) using a shortest
possible route?
5. Using only Basic Searches (with no quotation marks or special commands) and
the inclusion-exclusion principle, determine how many Monash Library catalogue entries
contain at least one of the three terms CSIRAC, SILLIAC and WREDAC.
5 While you are there, you can see Russell’s Old Corner Shop (now closed), which is described as Melbourne’s
oldest residential building, and then take a short walk in Flagstaff Gardens to see the site of Flagstaff
Observatory (1858).
6.
(a) Use the inclusion-exclusion principle to determine how many positive integers ≤ 100
are not multiples of 3, 5 or 7.
(b) Is there another simple way of working this out, that you can use to check your
answer?
8. Determine the numbers of 𝑛-digit positive integers that have their digits as described
in each of the following cases:
(e) every digit is different to its predecessor, but other repetitions are allowed;
(f) all digits are in strictly ascending order (so each digit is numerically < its successor);
(g) all digits are in nondecreasing order (so each digit is numerically ≤ its successor);
(h) the ordering of digits is monotonic, meaning that it’s either increasing or decreasing
(although not necessarily strictly so); in other words, no internal digit is greater than
both its neighbours, and no internal digit is less than both its neighbours;
(l) no digit appears in its own position (i.e., the first digit cannot be 1, the second digit
cannot be 2, and so on);
(m) in addition to the previous restriction (l), all the digits are different.
9. The card game poker uses a standard deck of 52 cards, divided into four suits
♠, ♣, ♢, ♡ of 13 cards each, with the cards within each suit having a rank
2, 3, 4, 5, 6, 7, 8, 9, 10, J, K, Q, A,
where J,Q,K,A stand for Jack, Queen, King, Ace, respectively. These designations may
all be regarded as numbers, so J,Q,K,A represent 11,12,13,14 respectively.
A unique property of the Ace is that it can also stand for 1. So it can be considered
to be the predecessor of 2 as well as the successor of K. But it can’t play these low and
high roles simultaneously, in the same hand!
A poker hand consists of a set of five cards from the deck. In an actual game, there
are multiple players, each with a hand dealt from the same deck. But in this exercise
we consider just a single hand, in isolation.
Five cards in a poker hand are consecutive if their ranks are in numerical sequence.
Here, the Ace can either be the first of five consecutive lowest cards, A,2,3,4,5, or the
last of the five consecutive highest cards, 10,J,Q,K,A. But there is no wrap-around,6 so,
for example, the five cards Q,K,A,2,3 are not consecutive. This is because, as mentioned
above, the Ace cannot play its low and high roles simultaneously.
How many poker hands are there of each of the following types:
(e) flush:
flush all five cards have the same suit, but their ranks are not all consecutive
(although some of them might be);
(f) straight:
straight five consecutive cards, not all in the same suit (although some of them
will be);
(j) nothing:
nothing a hand of none of the special types (b)–(i) listed above, so its five cards
are all of different ranks, their ranks are not all consecutive, and the cards are not
all of the same suit.
10. In § 8.9, we defined 𝑦𝑖 = 𝑥𝑖 + 1 for each 𝑖. Show that this correspondence is a bijection
• from the set of 𝑥-sequences of nonnegative integers satisfying the constraints given
there (including (8.12))
there (including (8.12))
• to the set of 𝑦-sequences of positive integers satisfying the constraints given there
(including (8.13)).
9
DISCRETE PROBABILITY I
9.1𝛼 THE NATURE OF RANDOMNESS
Suppose someone tosses a coin. You do not know, in advance, whether it will turn up
Heads or Tails, i.e., which face will be uppermost once it has landed.1 If the coin is fair
— meaning that it has no bias towards either outcome, so each is equally likely — then
the probability that it comes up Heads is 1/2, and the probability that it comes up Tails
is 1/2 too. We can say that the event that the coin comes up Heads has probability 1/2,
which we write mathematically as

Pr(coin comes up Heads) = 1/2,

and similarly,

Pr(coin comes up Tails) = 1/2.
These events take place in a setting where randomness is at play, which is why we
describe them using probabilities rather than just by propositions which can only be
true or false.
The nature and source of randomness is a surprisingly deep topic and a lot has
been written about it. Tossing a coin is often used as an introductory example of a
simple random experiment, but where does the randomness come from? During the
tossing process, it is subject to physical forces from the tossing hand, the movement of
the air through which it passes, and gravity. If we do sufficiently careful and precise
measurements of the coin, its environment and the forces acting upon it, it may be
possible in principle to determine, accurately enough, its movement during the period
of the toss until it comes to rest, and therefore to determine the outcome of the toss. Of
course, this is usually impractical. So, instead of treating it as a deterministic process
(which it seems to be, at least to the level of detail required to determine the outcome
1 Traditionally, Heads is the head of the monarch whose face would historically adorn one side of a coin;
Tails, being the opposite of Heads, indicates the other side of the coin. Not many coins would have actual
“tails” depicted on the other side. One exception is the Australian penny from 1938 to the introduction of
decimal currency in 1966, which had a kangaroo, including its long tail, on the other side. So Tails could
be interpreted more literally in those days.
of the toss with high confidence), we say that it is too complex to treat deterministically
and model it instead as a simple random process.
This illustrates one common source of randomness: processes that are actually de-
terministic, but for which deterministic models are infeasible. You might say that ran-
domness is a cloak for our computational shortcomings!
That is not to say that all physical processes that we treat as random are really
just overly complex deterministic processes. In some physical processes, we have no
deterministic concept (however complex) to explain them, so we treat them as random.
Again, randomness might be a cloak, but for our ignorance rather than our computa-
tional shortcomings. Or, for some physical processes, we might think that they really
are random in some deep fundamental sense. Quantum mechanics is often interpreted
as treating the measurement of physical systems as inherently random.
In other settings, the role of randomness in expressing our ignorance is very natural.
For example, a coin might be lying on the ground, some distance away, and you cannot
see whether it shows Heads or Tails. There is no random experiment here; either the
coin shows Heads, or it shows Tails, and someone close to it may know which one it is,
but you don’t know yet. So you may model your knowledge of the state of the coin by
saying that its probability of showing Heads is 12 .
We will focus on randomness as an aspect of processes that produce outcomes we
are interested in. For such processes, we will often describe the outcomes they produce
as random, too.
9.2𝛼 PROBABILITY
Informally, the probability of something is a real number in the unit interval [0,1] that
measures how likely it is, where something that’s impossible has probability 0 and some-
thing that’s certain has probability 1.
Probability is usually considered in the context of an experiment. Here, an experiment
is a process that is at least partly random and can give rise to any outcome from some
set of all possible outcomes. The set of all possible outcomes is called the sample space.
We can view this as just a universal set.
An event is just a subset of the sample space.
The simplest situation in which we can define probability is when the sample space
𝑈 is finite and all its elements are equally likely. In this scenario, the probability Pr(𝐴)
of an event 𝐴 ⊆ 𝑈 is given by
Pr(𝐴) = |𝐴| / |𝑈|. (9.1)
So the probability of 𝐴 is just the proportion of members of 𝑈 that belong to 𝐴. If
you choose a member of 𝑈 at random, with an equal chance of choosing each one, then
Pr(𝐴) measures how likely your chosen element is to belong to 𝐴.
How do you work this out? You just need to work out the sizes of the sets. To do
this, you can use all the techniques of counting that we have discussed so far, and many
others. The study of probability is intimately related to the study of counting.
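For instance, a few lines of Python compute a probability by counting (the die event here is our own example):

U = {1, 2, 3, 4, 5, 6}                  # sample space: one fair die
A = {x for x in U if x % 2 == 0}        # event: the number shown is even

print(len(A) / len(U))                  # (9.1) gives Pr(A) = 3/6 = 0.5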
We have two extreme special cases of the definition (9.1):
Pr(∅) = 0, Pr(𝑈) = 1.
We will discuss the estimation of probabilities later, but now return to defining and
exactly calculating them.
We illustrate the definitions with some familiar examples.
Suppose our random experiment is the toss of a fair coin. The two possible outcomes
are Heads and Tails, so our sample space is
𝑈 = {Heads, Tails}.
The probabilities of the four possible events (the four subsets of 𝑈) are:

Pr(∅) = 0,
Pr({Heads}) = |{Heads}| / |𝑈| = 1/2,
Pr({Tails}) = 1/2   (similarly),
Pr({Heads, Tails}) = 2/2 = 1.
The throw of a fair die gives six equally likely outcomes, with sample space
𝑈 = {1, 2, 3, 4, 5, 6}.
For each outcome 𝑥 ∈ 𝑈, the probability of the singleton event {𝑥} is

Pr({𝑥}) = |{𝑥}| / |𝑈| = 1 / |𝑈|.

It is convenient to abbreviate Pr({𝑥}) to Pr(𝑥), so that

Pr(𝑥) = 1 / |𝑈|,

and to calculate the probability of any event 𝐴 as the sum of the probabilities of its
elements:

Pr(𝐴) = ∑_{𝑥∈𝐴} Pr(𝑥). (9.4)
In our current scenario, where every element is equally likely and has probability 1/|𝑈|,
this equation (9.4) agrees with our earlier definition (9.1). But our new equation (9.4) is
more general. We can now deal with situations where the elements of the sample space
need not all have the same probability.
We suppose, then, that the elements of the sample space each have a probability,
which must be a number in the unit interval [0,1], and that these probabilities add to 1.
The probability of an element 𝑥 ∈ 𝑈 is denoted by Pr(𝑥). We require that, for all 𝑥 ∈ 𝑈,
0 ≤ Pr(𝑥) ≤ 1,

and that

∑_{𝑥∈𝑈} Pr(𝑥) = 1. (9.5)
We emphasise that this sum is over every element in the entire sample space, however
large or small that sample space may be.
We now define the probability Pr(𝐴) of any event 𝐴 ⊆ 𝑈 to be

Pr(𝐴) = ∑_{𝑥∈𝐴} Pr(𝑥). (9.6)
This is more general than our previous definition of probability, which only applies to
finite sample spaces with all elements having the same probability. We have, again, the
extreme cases
Pr(∅) = 0, Pr(𝑈) = 1.
For example, in the board game Scrabble2 , there are 100 tiles each with an English
letter, except that two tiles are blank, and these tiles are all in a bag so that players
can choose them at random. Suppose, at the start of the game, you choose a letter
by drawing a random tile from the bag. The numbers of tiles of each type give the
following probabilities for the letters (or blank, denoted □):

□ 0.02, A 0.09, B 0.02, C 0.02, D 0.04, E 0.12, F 0.02, G 0.03, H 0.02, I 0.09,
J 0.01, K 0.01, L 0.04, M 0.02, N 0.06, O 0.08, P 0.02, Q 0.01, R 0.06, S 0.04,
T 0.06, U 0.04, V 0.02, W 0.02, X 0.01, Y 0.02, Z 0.01.
2 Scrabble is one of the most popular word games in the world, and is good for improving spelling and
vocabulary. The letter frequencies in the game are based on an empirical count from text in some particular
issues of newspapers when the game was designed, although the sample used may not have been huge.
The sample space can be taken to be the set of all English letters together with the
blank, so has size 27, but this time the probabilities of its members are not all the same.
Applying our definition (9.6) gives, for example,
Pr(vowel) = Pr({A, E, I, O, U})
          = Pr(A) + Pr(E) + Pr(I) + Pr(O) + Pr(U)
          = 0.09 + 0.12 + 0.09 + 0.08 + 0.04
          = 0.42.
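The same sum in Python, using the tile counts behind the table above:

vowel_tiles = {"A": 9, "E": 12, "I": 9, "O": 8, "U": 4}   # tile counts, out of 100 tiles

print(sum(vowel_tiles.values()) / 100)   # 0.42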
In fact, not only does our new definition (9.6) deal with finite sample spaces where
the probabilities are not all the same, but it can also deal with some infinite sample
spaces. Suppose our sample space is ℕ and the probability of an element 𝑛 ∈ ℕ is given
by
Pr(𝑛) = 1 / 2^𝑛.
For this to be valid, we need the probabilities of all the elements in the sample space to
add up to 1, so that (9.5) is satisfied. In this case, we have
∑_{𝑥∈𝑈} Pr(𝑥) = ∑_{𝑛∈ℕ} Pr(𝑛) = ∑_{𝑛∈ℕ} 1/2^𝑛 = 1/2 + 1/2² + 1/2³ + 1/2⁴ + ⋯ = 1/2 + 1/4 + 1/8 + 1/16 + ⋯
But this is just the sum of an infinite geometric series, with first term 𝑎 = 1/2 and
common ratio 𝑟 = 1/2. So, by (6.45), the sum is

𝑎 / (1 − 𝑟) = (1/2) / (1 − 1/2) = (1/2) / (1/2) = 1.
The probabilities we have assigned to the positive integers in this example are far
from uniform. Can you imagine giving all positive integers the same probability? If so,
what would that probability be? Would the probabilities sum to 1, as required? If not,
we cannot really call them probabilities.
If it’s too tricky to assign the same probability to all the positive integers, how
close can we get? Suppose you want to give a decreasing sequence of probabilities to
the positive integers such that they all sum to 1. What is the most slowly-decreasing
sequence you can come up with that does this, while still giving every positive integer
a positive probability?
9.3𝛼 CHOICE OF SAMPLE SPACE
We have seen in the previous section that, to define probabilities of events, we need to
have a sample space. Each element of the sample space must have a defined probability,
with those probabilities summing to 1. Events correspond to subsets of the sample space,
and the probability of an event is just the sum of the probabilities of its elements.
Choice of sample space is therefore fundamental. You need it to be an accurate
model of the situation you are studying, so that the probabilities of events tell you
about their likelihood in that situation.
Suppose you are playing Monopoly, where two dice are thrown and the numbers
they show are added to give the number of steps you take on that move. So the length
of your move is an integer in {2,3,…,12}. It is tempting to make this set the sample
space. But what probabilities should we assign to its elements?
With some thought or experimentation, it soon becomes clear that the elements of
{2,3,…,12} are not all equally likely, so we should not just give them each a probability
of 1/11. (Doing so would define a sample space that is valid in itself, in that it satisfies
the definition of a sample space. But the probabilities calculated from it do not align
with the actual probabilities of the various totals obtained when throwing two dice. So
these uniform probabilities are incorrect, as a model of this situation.)
To determine what the correct probabilities should be, we need to go deeper. Al-
though the data we are interested in is just the total of the numbers shown on the two
dice, the random process of throwing two dice gives a larger set of outcomes, namely a
number from {1, 2, 3, 4, 5, 6} on each die. So the full set of outcomes, from throwing two
dice, is the Cartesian product
{1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6}   (possible results from first die × possible results from second die),
which is the set of all pairs of results, one from the first die and another from the second
die. It helps to visualise these outcomes in a 6 × 6 table:

       1      2      3      4      5      6
1    (1,1)  (1,2)  (1,3)  (1,4)  (1,5)  (1,6)
2    (2,1)  (2,2)  (2,3)  (2,4)  (2,5)  (2,6)
3    (3,1)  (3,2)  (3,3)  (3,4)  (3,5)  (3,6)
4    (4,1)  (4,2)  (4,3)  (4,4)  (4,5)  (4,6)
5    (5,1)  (5,2)  (5,3)  (5,4)  (5,5)  (5,6)
6    (6,1)  (6,2)  (6,3)  (6,4)  (6,5)  (6,6)

(rows: first die; columns: second die)
We assume throughout that both dice are fair, i.e., each of the six outcomes of each
die is equally likely and has probability 1/6. We also assume throughout that the two
dice are not linked in any way; the outcome of one does not influence the outcome of
the other. It follows that all the pairs of outcomes are equally likely too, and since there
are 6 × 6 = 36 pairs of outcomes, they must have probability 1/36 each.
This gives us a more fine-grained sample space for the throw of two dice, with 36
elements (instead of just 11 for the possible totals), and it now has uniform probabilities.
(Uniformity is not an essential feature for probabilities of elements of sample spaces, but
we like it when it happens, as it makes life easier, provided it yields an accurate model.)
Furthermore, we can now calculate probabilities for the total of two dice. For exam-
ple, what is the probability that the total is 8? This corresponds to the following subset
of our new sample space, consisting of all pairs whose sum is 8:
{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}.
Since this subset has five elements, and since each element has probability 1/36, we have
Pr(total is 8) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 5/36.
We can do a similar calculation for every possible outcome in the set {2,3,…,12}.
It is helpful to envisage this event within the 6 × 6 table of the sample space above.
We see that this event, where the total of the two dice is 8, is a diagonal of five pairs.
In fact, all the other possible totals correspond to diagonals parallel to this one. We
can see that a total of 2 has only one element, so its probability is 1/36, and that as
the totals increase, their probabilities increase too, with the probability increasing by
1/36 for each increment of the total until the total is 7, which corresponds to the longest
diagonal and maximises the probability:
Pr(total is 7) = 6/36 = 1/6.
Then, as the total keeps increasing, the probability decreases at the same rate until we
reach the highest possible total, 12, with a probability of 1/36.
total        2     3     4     5     6     7     8     9     10    11    12
probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
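This table can be verified by enumerating the 36 pairs in Python (a sketch of ours):

from collections import Counter
from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # 36 equally likely pairs
totals = Counter(i + j for i, j in outcomes)

for t in range(2, 13):
    print(t, Fraction(totals[t], 36))   # e.g. 7 -> 1/6 and 8 -> 5/36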
Once we have determined these probabilities of the totals 2,3,…,12, we can then use
them along with the smaller sample space {2, 3, … , 12} to compute probabilities of events
pertaining solely to values of the total of the two dice (e.g., whether the total is 8, or
even, or prime, or ≤ 5, etc.). For example,
Pr(total is ≤ 5) = 1/36 + 2/36 + 3/36 + 4/36 = 10/36 = 5/18. (9.7)
But keep in mind that:
• To work out the probabilities of the totals in the first place, we needed the larger
sample space, consisting of the 36 pairs with uniform probabilities.
• Some events of interest cannot be described in terms of the total at all, so for these
we must use the larger sample space anyway. For example,

Pr(the two dice show the same number) = Pr({(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)})
                                      = 6/36
                                      = 1/6.
This example illustrates a more general point. While a sample space with nonuniform
probabilities is ok in principle, and indeed natural in many situations, there is often
a larger sample with uniform probabilities that underlies it in some way. Uniform
probabilities can be simpler to deal with than nonuniform ones, so a sample space with
uniform probabilities may make some probability calculations easier even if the sample
space itself is larger.
Having said that, our first priority is to use a sample space and associated probabil-
ities that model the situation at hand as accurately as possible. So we will not go for
uniform probabilities if they remove us from the reality we are trying to model.
Our earlier Scrabble example is another case where a larger sample space, with
uniform probabilities, could be used. In that example, the sample space we used was
the set of all English letters together with the blank, making 27 elements in all, and
we used nonuniform probabilities based on the numbers of tiles of each type. However,
the individual tiles themselves can be treated as elements of a sample space of size 100,
each with probability 0.01. To compute the probability of a vowel, we count up all tiles
bearing a vowel, of which there are 42 (9 × A, 12 × E, 9 × I, 8 × O, 4 × U), and divide this
by the total number of tiles, 100, to obtain

Pr(vowel) = 42/100 = 0.42,

using (9.1). This particular calculation may or may not seem easier than the one we did
earlier. But the key point here is that the larger sample space underpins the one we used
earlier. The probabilities in our table, such as 0.09 for A and so on, were derived from
the observation that we had 100 equally likely tiles, of which nine were A. So, in effect,
we used the larger sample space (of individual tiles rather than letters) to work out,
and justify, the probabilities we assigned to the letters in the smaller sample space. So,
again, it is the larger sample space (with its uniform probabilities) which is “what’s really
going on”; the smaller sample space summarises those aspects of the larger space that
are relevant to the problem at hand, and its probabilities are obtained by calculations
from the larger sample space.
9.4𝛼 MUTUALLY EXCLUSIVE EVENTS
Events 𝐴 and 𝐵 are mutually exclusive if they cannot both occur, i.e., if 𝐴 ∩ 𝐵 = ∅. In
that case, their union is the disjoint union 𝐴 ⊔ 𝐵, and

Pr(𝐴 ⊔ 𝐵) = ∑_{𝑥∈𝐴⊔𝐵} Pr(𝑥)   (by the definition of probability, (9.6))

= (∑_{𝑥∈𝐴} Pr(𝑥)) + (∑_{𝑥∈𝐵} Pr(𝑥))

(since each 𝑥 ∈ 𝐴 ⊔ 𝐵 belongs to exactly one of 𝐴 or 𝐵)

= Pr(𝐴) + Pr(𝐵)

(again using the definition of probability, twice).
For example, if we throw two dice and ask about the probability that their sum is
≤ 5 or ≥ 10 (perhaps because we are playing Monopoly and want to avoid landing on a
square 6 to 9 steps ahead), then, since these two events are mutually exclusive,

Pr(total ≤ 5 or total ≥ 10) = Pr(total ≤ 5) + Pr(total ≥ 10) = 10/36 + 6/36 = 16/36 = 4/9.

This important general principle may be captured in words by saying that the probability
of any disjoint union is the sum of the probabilities of the events. More succinctly,
probability is additive over disjoint unions:

Pr(𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛) = Pr(𝐴1) + Pr(𝐴2) + ⋯ + Pr(𝐴𝑛). (9.10)
Decomposing events into disjoint unions of simpler events is a very powerful tool in
probability. In fact, we have come very close to using it already. When calculating the
probability that a random Scrabble tile is a vowel using the sample space of all 100 tiles
each with probability 0.01 (p. 316 in § 9.3𝛼 ), we used our knowledge of the numbers
of tiles bearing each vowel. Converting these to probabilities, and expressing the event
“the tile is a vowel” as a disjoint union of simpler events, we have

Pr(vowel) = Pr(A) + Pr(E) + Pr(I) + Pr(O) + Pr(U) = 0.09 + 0.12 + 0.09 + 0.08 + 0.04 = 0.42,

as before.
The very definition of probability may be viewed in terms of the additivity of prob-
ability over disjoint unions. An event is just the disjoint union of all the singleton sets
(i.e., sets of one element) obtained from the elements of the event, and its probability is
just the sum of the probabilities of those singleton sets.3
Later we will see more applications of the fact that probability is additive over
disjoint unions.
9.5 OPERATIONS ON EVENTS
Since events are just subsets of the sample space, we can apply set operations to them
to form other events. We have already seen the disjoint union in the previous section.
The complement 𝐴̅ of an event 𝐴 occurs precisely when 𝐴 does not occur. So 𝐴 and
its complement 𝐴̅ are mutually exclusive, and their union is the entire sample space. So

Pr(𝐴) + Pr(𝐴̅) = 1.

Therefore

Pr(𝐴̅) = 1 − Pr(𝐴). (9.11)
Examples:
• Some children play a pencil-and-paper “dice cricket” game where each throw of a
fair die gives the number of runs scored against one delivery, except that 5 is out.
So the probability of not being out, from one specific delivery (i.e., one throw of
the die), is
Pr(not out) = 1 − Pr(out) = 1 − 1/6 = 5/6.
3 Strictly speaking, the probability of a set is the sum of the probabilities of its individual elements; it is the
probabilities of the individual elements that are the starting point, and it is from them that the probabilities
of events are defined in (9.6). But, of course, the probability of a singleton set equals the probability of
its sole element. So it is true that the probability of an event is just the sum of the probabilities of all the
singleton events (i.e., singleton sets) inside it.
• In drawing one of the 100 letter tiles in Scrabble, the probability of drawing a
blank tile is 2/100, so
Pr(not drawing a blank) = 1 − Pr(drawing a blank) = 1 − 2/100 = 98/100 = 0.98.
More generally, suppose 𝐴 ⊆ 𝐵 and consider the set difference 𝐵 ∖ 𝐴. In this case,
𝐵 = 𝐴 ⊔ (𝐵 ∖ 𝐴). Therefore

Pr(𝐵) = Pr(𝐴) + Pr(𝐵 ∖ 𝐴). (9.12)

So we have

𝐴 ⊆ 𝐵 ⟹ Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴). (9.13)
It also follows from (9.12) that

𝐴 ⊆ 𝐵 ⟹ Pr(𝐴) ≤ Pr(𝐵). (9.14)
This could be paraphrased as saying that you cannot make something less likely by
creating more ways for it to happen. Not surprising, but good to know!
Example:
In drawing our first Scrabble tile, what is the probability that we get a letter tile
(not a blank) that has a consonant? We already know that the probability of a vowel
is 0.42 and the probability of a non-blank tile is 0.98. Define events 𝐴 and 𝐵 for these
two occurrences:
𝐴: the tile we draw is a vowel
𝐵: the tile we draw is a letter (not a blank).
Then 𝐴 ⊆ 𝐵, and 𝐵 ∖ 𝐴 is the event whose probability we seek:
𝐵 ∖ 𝐴: the tile we draw is a consonant.
Since 𝐴 ⊆ 𝐵, we have

Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴) = 0.98 − 0.42 = 0.56.
More generally, for any events 𝐴 and 𝐵 (with 𝐴 not necessarily a subset of 𝐵), we have
𝐵 ∖ 𝐴 = 𝐵 ∖ (𝐴 ∩ 𝐵) and 𝐴 ∩ 𝐵 ⊆ 𝐵, so

Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴 ∩ 𝐵). (9.15)

Our earlier expression (9.13) for the situation 𝐴 ⊆ 𝐵 is a special case of this, since 𝐴 ⊆ 𝐵
implies 𝐴 = 𝐴 ∩ 𝐵.
This raises the general question of how to determine Pr(𝐴 ∩ 𝐵). This is intimately
related to determining Pr(𝐴 ∪ 𝐵). We consider these probabilities now, in the general
situation where events 𝐴 and 𝐵 are not necessarily mutually exclusive, so their sets
are not necessarily disjoint, and neither is necessarily a subset of the other.
Consider Pr(𝐴 ∪ 𝐵). The union 𝐴 ∪ 𝐵 is not necessarily a disjoint union of 𝐴 and
𝐵; the sets may intersect, and their intersection 𝐴 ∩ 𝐵 may be small or large, and its
probability matters here. Nonetheless, we can still find a way to express 𝐴 ∪ 𝐵 as a
disjoint union of certain of its subsets. See if you can work out how to do this, and how
to use this to work out Pr(𝐴 ∪ 𝐵). Before showing how it’s done, we pause to reflect on
the methods we have been using so far in this section.
In working out probabilities, it often helps to decompose events into a disjoint union
of simpler, mutually exclusive events, as we discussed on p. 318 in § 9.4𝛼 . In terms of
sets, we are just expressing a set as a partition of simpler sets. We have seen further
instances of this already in this section:
• We expressed the sample space as the disjoint union of 𝐴 and 𝐴̅ in order to work
out the relationship between the probabilities of 𝐴 and 𝐴̅ in (9.11).
• When 𝐴 ⊆ 𝐵, we expressed 𝐵 as the disjoint union of 𝐴 and 𝐵 ∖ 𝐴 in (9.12), in
order to derive an expression for Pr(𝐵 ∖ 𝐴) in (9.13).
So, let us continue in this vein and consider Pr(𝐴 ∪ 𝐵). Since
𝐴 ∪ 𝐵 = (𝐴 ∖ 𝐵) ⊔ (𝐴 ∩ 𝐵) ⊔ (𝐵 ∖ 𝐴),
we have
Pr(𝐴 ∪ 𝐵) = Pr(𝐴 ∖ 𝐵) + Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴). (9.16)
Now,
Pr(𝐴) = Pr(𝐴 ∖ 𝐵) + Pr(𝐴 ∩ 𝐵)
Pr(𝐵) = Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴).
Adding these two equations, we obtain

Pr(𝐴) + Pr(𝐵) = Pr(𝐴 ∖ 𝐵) + 2 Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴),

and comparing this with (9.16) gives

Pr(𝐴 ∪ 𝐵) = Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵), (9.17)

Pr(𝐴 ∩ 𝐵) = Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∪ 𝐵). (9.18)
So, if we already know the probabilities of events 𝐴 and 𝐵, then we need to know the
probability of one of the union and intersection, in order to be able to work out the
probability of the other.
These equations (9.17) and (9.18) are reminiscent of the relationship between the
sizes of two sets and their union and intersection: see (1.13) and Exercises 1.9 and 1.10.
In fact, if we take our expressions from those exercises and divide each side by the size
of the universal set (i.e., in this context, the sample space), then we have
|𝐴 ∪ 𝐵| / |𝑈| = |𝐴| / |𝑈| + |𝐵| / |𝑈| − |𝐴 ∩ 𝐵| / |𝑈|, (9.19)

|𝐴 ∩ 𝐵| / |𝑈| = |𝐴| / |𝑈| + |𝐵| / |𝑈| − |𝐴 ∪ 𝐵| / |𝑈|. (9.20)
Each quotient here is just the probability of the set shown in the numerator in the spe-
cial case when all elements of the sample space are equally likely. So, really, (9.19) and
(9.20) are just special cases of (9.17) and (9.18), respectively.
When working out Pr(𝐴 ∩ 𝐵), it can sometimes help to partition one of the events,
say 𝐵, into a disjoint union of other events, say 𝐵1 , 𝐵2 , … , 𝐵𝑛 :
𝐵 = 𝐵1 ⊔ 𝐵2 ⊔ … ⊔ 𝐵𝑛 .
Observe that
𝐴 ∩ (𝐵1 ⊔ 𝐵2 ) = (𝐴 ∩ 𝐵1 ) ⊔ (𝐴 ∩ 𝐵2 ).
This is just an application of the Distributive Law together with the observation that
𝐴 ∩ 𝐵1 and 𝐴 ∩ 𝐵2 are disjoint (since they are subsets of the disjoint sets 𝐵1 and 𝐵2 ,
respectively).
For three events, we have
𝐴 ∩ (𝐵1 ⊔ 𝐵2 ⊔ 𝐵3) = (𝐴 ∩ 𝐵1) ⊔ (𝐴 ∩ 𝐵2) ⊔ (𝐴 ∩ 𝐵3),

and, in general,

𝐴 ∩ (𝐵1 ⊔ 𝐵2 ⊔ ⋯ ⊔ 𝐵𝑛) = (𝐴 ∩ 𝐵1) ⊔ (𝐴 ∩ 𝐵2) ⊔ ⋯ ⊔ (𝐴 ∩ 𝐵𝑛).
Therefore,

Pr(𝐴 ∩ 𝐵) = Pr(𝐴 ∩ 𝐵1) + Pr(𝐴 ∩ 𝐵2) + ⋯ + Pr(𝐴 ∩ 𝐵𝑛), (9.21)

by (9.10). So, one way to work out Pr(𝐴 ∩ 𝐵) is to work out each Pr(𝐴 ∩ 𝐵𝑖 ), where 1 ≤
𝑖 ≤ 𝑛, and then just add these probabilities. That requires working out 𝑛 probabilities
of intersections of events, but in some situations it is possible to choose the partition
𝐵 = 𝐵1 ⊔𝐵2 ⊔⋯⊔𝐵𝑛 in such a way that working out the probabilities Pr(𝐴 ∩𝐵𝑖 ) is much
easier than working out Pr(𝐴 ∩ 𝐵).
Figure 9.1: Events 𝐴 and 𝐵, with 𝐵 partitioned into 𝐵1 , 𝐵2 , 𝐵3 . This means that 𝐴 ∩ 𝐵 is
partitioned into 𝐴 ∩ 𝐵1 , 𝐴 ∩ 𝐵2 , 𝐴 ∩ 𝐵3 .
For example, what is the probability that a three-letter English word starts with ‘c’
and has a vowel as its second letter? This kind of question arises in word games and
also, in more elaborate forms, in language modelling. Suppose the three-letter word is
chosen uniformly at random from the set of all three-letter words in a standard word
list. Then
Pr(1st letter is ‘c’ and 2nd letter is a vowel) = Pr((1st letter is ‘c’) ∩ (2nd letter is a vowel)).

Let 𝐴 be the event that the first letter is ‘c’, let 𝐵 be the event that the second letter is
a vowel, and for 1 ≤ 𝑖 ≤ 5 let 𝐵𝑖 be the event that the second letter is the 𝑖-th vowel
(taking the vowels in the order A, E, I, O, U). Then

𝐵 = 𝐵1 ⊔ 𝐵2 ⊔ 𝐵3 ⊔ 𝐵4 ⊔ 𝐵5.
Here we have used the word list at /usr/share/dict/words (i.e., filename words, direc-
tory path /usr/share/dict/) in the virtual Linux system in your Ed Workspace. This
has 1443 three-letter words. It is not hard to use computational tools such as grep and
wc (see Module 0 Applied Session) to calculate the answer directly in this case. But
it is also worth thinking about how you might solve this problem manually, using a
physical dictionary. Looking up the first two letters of a word enables you to narrow
down the options a lot, and makes manual calculation feasible, showing the advantage
of using (9.21). Even when using computational tools, smart partitioning of events can
help calculate some probabilities more efficiently.
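A Python version of this count might look as follows; note that the exact numbers depend on which word list is installed, so treat the path and the filtering below as assumptions:

with open("/usr/share/dict/words") as f:
    words = [w.strip().lower() for w in f
             if len(w.strip()) == 3 and w.strip().isalpha()]

a = sum(1 for w in words if w[0] == "c")                          # 1st letter 'c'
b = sum(1 for w in words if w[1] in "aeiou")                      # 2nd letter a vowel
ab = sum(1 for w in words if w[0] == "c" and w[1] in "aeiou")     # both

print(len(words), a, b, ab)
print(ab / len(words))   # Pr(A ∩ B) for a uniformly random three-letter word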
At this point, it is worth studying this probability of 𝐴∩𝐵 alongside the probabilities
of the events 𝐴 and 𝐵 themselves, which are:
Pr(𝐴) = Pr(1st letter is ‘c’) = 44/1443 ≈ 0.030,

Pr(𝐵) = Pr(2nd letter is a vowel) = 760/1443 ≈ 0.527.
Something to consider, to prepare for studying conditional probability later: based
only on these probabilities of 𝐴, 𝐵 and 𝐴 ∩ 𝐵, do you think that having first letter ‘c’
makes it more or less likely that the second letter is a vowel? Why?
An important special case of (9.21) arises when 𝐵 = 𝑈, i.e., 𝐵 is the entire sample
space, so Pr(𝐵) = 1 and 𝐴 ∩ 𝐵 = 𝐴 ∩ 𝑈 = 𝐴. Then we have
𝑈 = 𝐵1 ⊔ 𝐵2 ⊔ ⋯ ⊔ 𝐵𝑛  ⟹  Pr(𝐴) = Pr(𝐴 ∩ 𝐵1) + Pr(𝐴 ∩ 𝐵2) + ⋯ + Pr(𝐴 ∩ 𝐵𝑛). (9.22)
9.6 INCLUSION-EXCLUSION FOR PROBABILITIES

The relationship between probabilities of events, their unions and intersections ((9.17)
and (9.18)) can be extended to three events.
Pr(𝐴 ∪ 𝐵 ∪ 𝐶)
= Pr(𝐴 ∪ 𝐵) + Pr(𝐶) − Pr((𝐴 ∪ 𝐵) ∩ 𝐶)
(applying (9.17) to the sets 𝐴 ∪ 𝐵 and 𝐶, instead of 𝐴 and 𝐵)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr((𝐴 ∪ 𝐵) ∩ 𝐶)
(applying (9.17) to 𝐴 and 𝐵, in Pr(𝐴 ∪ 𝐵))
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr((𝐴 ∩ 𝐶) ∪ (𝐵 ∩ 𝐶))
(applying the Distributive Law to (𝐴 ∪ 𝐵) ∩ 𝐶, in the last term)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − (Pr(𝐴 ∩ 𝐶) + Pr(𝐵 ∩ 𝐶) − Pr((𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶)))
(applying (9.17) to 𝐴 ∩ 𝐶 and 𝐵 ∩ 𝐶)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr((𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶))
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr(𝐴 ∩ 𝐵 ∩ 𝐶)
(since (𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶) = 𝐴 ∩ 𝐵 ∩ 𝐶)
= Pr(𝐴) + Pr(𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐵) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr(𝐴 ∩ 𝐵 ∩ 𝐶) (9.23)
(rearranging slightly, for neatness).
Compare this expression with the expression for |𝐴 ∪ 𝐵 ∪ 𝐶| given in (8.2), and which
you derived in Exercise 1.11.
The structure of the expression (9.23) is
Pr(𝐴 ∪ 𝐵 ∪ 𝐶) = ∑_{𝑘=1}^{3} (−1)^{𝑘+1} ⋅ (sum of probabilities of all intersections of 𝑘 of the sets 𝐴, 𝐵, 𝐶).
This alternating pattern extends to any number of events, exactly as it did for set sizes
in § 8.3:

Theorem 44. For all 𝑛,

Pr(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of probabilities of all intersections of 𝑘 events).

Theorem 45. For all 𝑛,

Pr(𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of probabilities of all unions of 𝑘 events).

If all elements in our sample space are equally likely, then the probability of an event
is just its size (as a set) divided by the size of the sample space (provided the sample
space is finite). In this case, Theorem 44 and Theorem 45 just become
|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛| / |𝑈| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 sets) / |𝑈|,

|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛| / |𝑈| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all unions of 𝑘 sets) / |𝑈|.
Removing the common denominator |𝑈| throughout, we just obtain Theorem 42 and
Theorem 43. This shows that the Inclusion-Exclusion principle for counting is a special
case of the Inclusion-Exclusion principle for probability.4
4 Conversely, if all elements of a finite sample space are equally likely, then probabilities are just sizes of sets
divided by the size of the sample space; in other words, they are just counts that have been scaled so that
the sample space itself is scaled to 1. So, in this scenario, the Inclusion-Exclusion principle for counting
implies the Inclusion-Exclusion principle for probability. So, in fact, the two Inclusion-Exclusion principles
are equivalent in a precise sense.
9.7 INDEPENDENT EVENTS

Informally, two events are independent if the occurrence of one has no influence on the
likelihood of the other. Formally, events 𝐴 and 𝐵 are independent if

Pr(𝐴 ∩ 𝐵) = Pr(𝐴) ⋅ Pr(𝐵). (9.26)
For example, when throwing a pair of dice, suppose we are interested in the first die
showing 3 and the second die showing 5. Let 𝐴 be the event that the first die shows 3,
and let 𝐵 be the event that the second die shows 5. Then
Pr(𝐴) = Pr({(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}) = 6/36 = 1/6,

Pr(𝐵) = Pr({(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5)}) = 6/36 = 1/6,

Pr(𝐴) ⋅ Pr(𝐵) = 1/6 ⋅ 1/6 = 1/36,

Pr(𝐴 ∩ 𝐵) = Pr({(3, 5)}) = 1/36.
Since Pr(𝐴 ∩ 𝐵) = Pr(𝐴) ⋅ Pr(𝐵), the events 𝐴 and 𝐵 are independent.
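Here is a short Python check of the definition (9.26) for this example (our own sketch):

from fractions import Fraction

outcomes = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def pr(event):
    # All 36 outcomes are equally likely.
    return Fraction(len(event), 36)

A = {(i, j) for (i, j) in outcomes if i == 3}   # first die shows 3
B = {(i, j) for (i, j) in outcomes if j == 5}   # second die shows 5

print(pr(A & B) == pr(A) * pr(B))   # True: A and B are independent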
The independence of these two events is not too surprising, intuitively, if we accept
that the two dice behave separately when thrown. In this case, the mechanism itself
suggests independence. But independence can be more subtle than this.
Now suppose that the two events we are interested in are (i) the dice totalling seven, and (ii) the second die showing the “capped successor” of the first, meaning it’s one greater than the first up to a maximum of 6, so that the capped successor of 𝑥 is 𝑥 + 1 unless 𝑥 = 6, in which case it’s 6. We have

𝐴 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},   Pr(𝐴) = 6/36 = 1/6,
𝐵 = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6)},   Pr(𝐵) = 6/36 = 1/6.

These events may not “look” so independent. They each depend on both dice; we cannot attribute them to two separate dice and appeal to the apparently separate nature of the two dice throws. But consider their intersection:

𝐴 ∩ 𝐵 = {(3, 4)},

which has probability Pr(𝐴 ∩ 𝐵) = 1/36 = Pr(𝐴) ⋅ Pr(𝐵). So these two events are independent after all.

Now suppose instead that the two events of interest are (i) the dice totalling seven, as before, and (ii) the two dice showing the same number.
Superficially, this seems a small change from the previous example: 𝐴 is unchanged,
while 𝐵 is very similar to what it was before, requiring now that the number on the second die is equal to that on the first die instead of being its capped successor. (In a sense, we have only replaced a difference of +1 or 0 by a difference of 0.) The probabilities
of these two events are the same (with Pr(𝐵) having been calculated on p. 316 in § 9.3𝛼 ):
Pr(𝐴) = 1/6,
Pr(𝐵) = 1/6,
Pr(𝐴) ⋅ Pr(𝐵) = 1/6 ⋅ 1/6 = 1/36, as above.
But their intersection is now empty: two integers that add up to seven cannot be equal!
𝐴 ∩ 𝐵 = ∅,
Pr(𝐴 ∩ 𝐵) = 0.
Therefore Pr(𝐴 ∩ 𝐵) ≠ Pr(𝐴) ⋅ Pr(𝐵), so these events 𝐴 and 𝐵 are not independent.
Recall that the probability of the union of mutually exclusive events is the sum of their individual probabilities. So, just as probability is additive over disjoint unions, we can now say that probability is multiplicative over intersections of independent events.
It is important not to confuse mutual exclusivity with independence. Do not be tricked
by Venn diagrams! The fact that two mutually exclusive events “look separate from each
other” on a Venn diagram does not mean they are independent. In fact, if two events
𝐴 and 𝐵 are mutually exclusive, then they are not independent, provided they both
have positive probability. To see this, we just work from the definitions. By definition
of mutual exclusivity,
Pr(𝐴 ∩ 𝐵) = 0,
yet, if both events have positive probability, then

Pr(𝐴) ⋅ Pr(𝐵) > 0.

Therefore
Pr(𝐴 ∩ 𝐵) ≠ Pr(𝐴) Pr(𝐵),
so the definition of independence is not satisfied, so the events are not independent. In-
tuitively, this actually makes sense, since the occurrence of 𝐴 prevents 𝐵 from occurring.
Independence cannot be depicted on a Venn diagram as simply as mutual exclusivity.
If two events are independent, then we know they cannot be disjoint (provided each has
positive probability). That’s a necessary condition for independence, but not a suffi-
cient one. Determining independence is not just a matter of checking the emptiness, or
not, of regions in a Venn diagram; it comes down to the precise relationship, given in
(9.26), between the probabilities of the three sets 𝐴, 𝐵 and 𝐴 ∩ 𝐵.
One of the main uses of independence is to help work out the probabilities of com-
plex events. As usual, when confronted with a complex problem, we want to reduce it
to simpler problems. A complex event, being representable as a set, can typically be
described by combining simpler sets using standard set operations. So, if an event is
an intersection of independent events, then we can work out the probabilities of those
simpler events separately and then multiply them to get the probability of the more
complex event we are interested in. This may be viewed as yet another instance of
“divide-and-conquer” problem-solving.
Consider, for example, a small communications network of three nodes, 𝑆, 𝑀 and 𝑇, in which 𝑆 is joined to 𝑀 by one link and 𝑀 is joined to 𝑇 by another:

[Figure: a network of three nodes, with links 𝑆𝑀 and 𝑀𝑇.]

We assume that:
• each link survives with probability 𝑝, and therefore fails with probability 1 − 𝑝, with the same 𝑝 for every link;
• the links behave independently of each other. This means that, if 𝐴 is an event
determined solely by the first link 𝑆𝑀 , and 𝐵 is an event determined solely by the
second link 𝑀 𝑇, then Pr(𝐴 ∩ 𝐵) = Pr(𝐴) Pr(𝐵).
What is the probability that 𝑆 and 𝑇 can communicate with each other? For this to be possible, we need both links between them to survive; otherwise there is no path between them. So

Pr(𝑆 and 𝑇 can communicate) = Pr((𝑆𝑀 survives) ∩ (𝑀𝑇 survives))
= Pr(𝑆𝑀 survives) ⋅ Pr(𝑀𝑇 survives)   (by independence)
= 𝑝 ⋅ 𝑝 = 𝑝².
Now suppose instead that 𝑆 and 𝑇 are joined directly by two separate links:

[Figure: two nodes 𝑆 and 𝑇 joined by a top link and a bottom link.]
Now what is the survival probability? We will work this out in two different ways.
First method:
Observe that, for there to be a path linking 𝑆 and 𝑇, we just need at least one of the two links to survive. So

Pr(path from 𝑆 to 𝑇) = Pr((top link survives) ∪ (bottom link survives))
= Pr(top link survives) + Pr(bottom link survives) − Pr((top link survives) ∩ (bottom link survives)),

by (9.17). Now, the survivals of the separate links are independent events, so

Pr((top link survives) ∩ (bottom link survives)) = Pr(top link survives) ⋅ Pr(bottom link survives) = 𝑝².

Therefore

Pr(path from 𝑆 to 𝑇) = 𝑝 + 𝑝 − 𝑝² = 2𝑝 − 𝑝².
Second method:
Let us now start with

Pr(path from 𝑆 to 𝑇) = 1 − Pr(no path from 𝑆 to 𝑇),

and work out the probability of the complementary event, that there is no path from 𝑆 to 𝑇, instead. For this complementary event to happen, the top and bottom links must both fail, and these failures are independent, each with probability 1 − 𝑝. Therefore

Pr(no path from 𝑆 to 𝑇) = (1 − 𝑝)², so Pr(path from 𝑆 to 𝑇) = 1 − (1 − 𝑝)² = 2𝑝 − 𝑝².
This agrees with the result of our first method, above. This second method is some-
what simpler in this case, and illustrates the benefit of being alert to the possibility of
computing the probability of an event by computing the probability of its complement
instead.
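Calculations like these are easy to sanity-check by simulation. The following short Python sketch (ours, for illustration; the function name and trial count are arbitrary) estimates the survival probability of the two-link parallel network and compares it with 2𝑝 − 𝑝²:

    import random

    def estimate_parallel_survival(p, trials=100_000):
        """Estimate Pr(at least one of two independent links survives),
        where each link survives independently with probability p."""
        survived = 0
        for _ in range(trials):
            top = random.random() < p       # does the top link survive?
            bottom = random.random() < p    # does the bottom link survive?
            if top or bottom:
                survived += 1
        return survived / trials

    p = 0.9
    print(estimate_parallel_survival(p))   # close to 2*p - p*p = 0.99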
9.8 CONDITIONAL PROBABILITY

The probability of an event quantifies how likely it is in a given setting. But we often need
to find the probability of the same event in more than one setting. New information
may come to light, which may affect our view of how likely an event is, so we may
need to recalculate our probabilities. Or we might have several different competing
explanations of some measurement or observation we have made, so we want to calculate
the probability of our observation under these various explanations in order to compare
them.
Conditional probability gives us tools to do this.
Suppose we are interested in an event 𝐴. Its probability is Pr(𝐴), calculated using
the probabilities of elements of the sample space.
Suppose then that another event 𝐵 occurs. This might be a new development, a
change in the circumstances. Or it might be just an improvement in our information
about the situation. Or we may simply be supposing event 𝐵 to occur (even if we don’t
know whether it occurs or not) in order to reason about what might happen if it did
occur. Whatever the motivation, we still want to know the probability of 𝐴, but under
the condition that 𝐵 occurs, and this may well change the probability of 𝐴. We assume
here that 𝐵 ≠ ∅ and Pr(𝐵) > 0.
Our consideration of what can happen is now more restricted in scope than it was
previously. In particular, elements of the sample space outside 𝐵 are now excluded. We
have to restrict our calculations to elements of the sample space that are in 𝐵. In effect,
𝐵 is the new sample space.
But the elements of 𝐵 cannot have the same probabilities that they had in our
original sample space, because those probabilities add up to Pr(𝐵), which in general can
be < 1. This violates the fundamental requirement that the sum of the probabilities of
all elements of the sample space must be 1.
So, in order to use 𝐵 as a new sample space, we need new probabilities for its elements.
These new probabilities should still be proportional to the original probabilities: if one
element was twice as likely as another in the original sample space, then this should still
be true now. What we will do, then, is to scale all the probabilities of the elements by
334 DiSCRETE PROBABiLiTY i
the same constant factor so that they add up to 1. The appropriate scaling to use is to
divide the probability of each element by Pr(𝐵).
For any 𝑥 ∈ 𝐵, we write Pr(𝑥 ∣ 𝐵) for the probability of element 𝑥 when 𝐵 is used as
the sample space. Then, using our usual notation Pr(𝑥) for the probability of element
𝑥 in the original sample space, we have
Pr(𝑥 ∣ 𝐵) = Pr(𝑥) / Pr(𝐵).
We should check that these probabilities satisfy the requirements for probabilities of
elements in a sample space. Firstly, they are clearly nonnegative (since the original
probabilities Pr(𝑥) are nonnegative and Pr(𝐵) > 0). Secondly, they sum to 1, because
∑_{𝑥∈𝐵} Pr(𝑥 ∣ 𝐵) = ∑_{𝑥∈𝐵} Pr(𝑥)/Pr(𝐵) = (1/Pr(𝐵)) ∑_{𝑥∈𝐵} Pr(𝑥) = (1/Pr(𝐵)) ⋅ Pr(𝐵) = 1.
So we can indeed use these probabilities for the elements of 𝐵 when treating 𝐵 as the
new sample space.
Let 𝑋 ⊆ 𝐵, and suppose we want its probability under the condition that 𝐵 occurs.
We denote this by Pr(𝑋 ∣ 𝐵). To work this out, we just use our restricted sample space,
𝐵, with appropriately scaled probabilities Pr(𝑥 ∣ 𝐵) for its elements. The definition of
probability, (9.6), gives
Pr(𝑋 ∣ 𝐵) = ∑_{𝑥∈𝑋} Pr(𝑥 ∣ 𝐵) = ∑_{𝑥∈𝑋} Pr(𝑥)/Pr(𝐵) = (1/Pr(𝐵)) ∑_{𝑥∈𝑋} Pr(𝑥) = (1/Pr(𝐵)) ⋅ Pr(𝑋) = Pr(𝑋)/Pr(𝐵).
More generally, for any event 𝐴 (not necessarily a subset of 𝐵), the only part of 𝐴 that can occur when 𝐵 occurs is 𝐴 ∩ 𝐵, so

Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵).
The main outcome from this discussion is the following expression for conditional
probability:
Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵). (9.29)
Two immediate consequences of this definition are:

• Pr(𝐵 ∣ 𝐵) = 1.
• Pr(∅ ∣ 𝐵) = 0.
Equation (9.29) is most useful when we need to work out conditional probability and
the probabilities of 𝐴∩𝐵 and 𝐵 are available or can be calculated. In other situations, we
may already have the conditional probability and can then use it to work out Pr(𝐴 ∩𝐵),
by rearranging (9.29):
Pr(𝐴 ∩ 𝐵) = Pr(𝐴 ∣ 𝐵) Pr(𝐵). (9.30)
This makes sense, intuitively: for 𝐴 and 𝐵 to both occur, we need 𝐵 to occur and, given
that 𝐵 occurs, we also need 𝐴 to occur.
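For example, suppose two cards are drawn from a standard 52-card deck, one after the other, without replacement. Let 𝐵 be the event that the first card is an Ace, and let 𝐴 be the event that the second card is an Ace. Then Pr(𝐵) = 4/52, and Pr(𝐴 ∣ 𝐵) = 3/51, since once one Ace has been removed, 3 of the remaining 51 cards are Aces. So, by (9.30),

Pr(both cards are Aces) = Pr(𝐴 ∩ 𝐵) = Pr(𝐴 ∣ 𝐵) Pr(𝐵) = (3/51) ⋅ (4/52) = 1/221.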
Theorem 46. Events 𝐴 and 𝐵 are independent if and only if Pr(𝐴 ∣ 𝐵) = Pr(𝐴).
Proof. (⇒)
Suppose 𝐴 and 𝐵 are independent. Then
Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵)
= Pr(𝐴) Pr(𝐵) / Pr(𝐵)   (by independence)
= Pr(𝐴).
(⇐)
Suppose Pr(𝐴 ∣ 𝐵) = Pr(𝐴). Then
Pr(𝐴 ∩ 𝐵) / Pr(𝐵) = Pr(𝐴),
from which it follows that Pr(𝐴 ∩ 𝐵) = Pr(𝐴) Pr(𝐵), i.e., 𝐴 and 𝐵 are independent.
Example 1:
On p. 324 in § 9.5, we asked if having first letter ‘c’ makes it more or less likely that
a random three-letter word has a vowel as its second letter. We now answer this, using
the probabilities that we calculated there.
Recall that
Pr(2nd letter is a vowel) = 760/1443 ≈ 0.527. (9.31)
We need to compare this with the appropriate conditional probability, which is the
probability of the second letter being a vowel given that the first letter is ‘c’.
By comparing this with the unconditional probability that the second letter is a vowel,
in (9.31), we see that the conditional probability is greater than the unconditional one:
Pr(2nd letter is a vowel) < Pr(2nd letter is a vowel ∣ 1st letter is ‘c’).
We conclude that the first letter being ‘c’ makes it more likely that the second letter is a
vowel. This agrees with our intuition. In fact, it does more: it replaces vague intuition
by a precise quantitative statement.
Example 2:
We saw on p. 316 that the probability of drawing a tile with a vowel from a bag of
100 Scrabble tiles is 0.42. But this does not mean that 42% of letter tiles are vowels,
since there are also two blank tiles. The probability that a random tile is a vowel, given
that it is not a blank, is

Pr(vowel ∣ not blank) = Pr(vowel) / Pr(not blank) = 0.42 / 0.98 = 42/98 ≃ 0.43.

(Here Pr(vowel ∩ not blank) = Pr(vowel), since every vowel tile is a letter tile, not a blank.)
9.9 BAYES' THEOREM
The previous section introduced the conditional probability Pr(𝐴 ∣ 𝐵) of event 𝐴 given
event 𝐵. What if we put the same two events the other way round, and ask about Pr(𝐵 ∣
𝐴)? This is also a valid conditional probability, and we now pin down the relationship
between the two conditional probabilities.
Applying (9.30) with the two events in each order gives Pr(𝐴 ∣ 𝐵) Pr(𝐵) = Pr(𝐴 ∩ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴), and dividing through by Pr(𝐵) yields:

Theorem 47 (Bayes' Theorem). For any events 𝐴 and 𝐵 of positive probability,

Pr(𝐴 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴) / Pr(𝐵).
So, to convert Pr(𝐵 ∣ 𝐴) to Pr(𝐴 ∣ 𝐵), just multiply it by the ratio Pr(𝐴)/ Pr(𝐵) of the
probabilities of the events. To remember which ratio to use (so you don’t accidentally
use Pr(𝐵)/ Pr(𝐴) instead), keep in mind that the ratio you want is the one which, when
written in-line using “/ ”, has 𝐴 and 𝐵 in the same order as they are in the conditional
probability you’re aiming for. So, if you’re aiming for Pr(𝐴 ∣ 𝐵), the ratio you want has
𝐴 and 𝐵 in that same order, i.e., Pr(𝐴)/ Pr(𝐵).
One of the main applications of Bayes’ Theorem is to capture how beliefs change
when new information becomes available.
A magician keeps three coins in their pocket, to help with their various tricks. One
of the three coins is a fair coin, with Heads and Tails having probability 1/2 each. Another
has Heads on each side, and the third has Tails on each side. One of the three coins is
chosen at random, with each of the three being equally likely to be chosen.
Pr(Fair) = Pr(DoubleHead) = Pr(DoubleTail) = 1/3.
This chosen coin is then tossed once, and the outcome observed. We see only the outcome
on the upper face of the coin; we do not get to turn it over and see what was on the
other side.
Let 𝐴 be the event that the chosen coin is the fair coin. Before we see the outcome
of a toss, our knowledge about 𝐴 is captured by
Pr(𝐴) = Pr(Fair) = 1/3.
Now suppose the coin comes up Heads. We can work out the probability of this observation in (at least) two different ways. Firstly, summing over the three possible coins,

Pr(Heads) = Pr(Heads ∣ Fair) Pr(Fair) + Pr(Heads ∣ DoubleHead) Pr(DoubleHead) + Pr(Heads ∣ DoubleTail) Pr(DoubleTail)
= (1/2) ⋅ (1/3) + 1 ⋅ (1/3) + 0 ⋅ (1/3)
= 1/2.

Secondly, by the symmetry between Heads and Tails in this situation, we must have Pr(Heads) = Pr(Tails) = 1/2.

What does the observation of Heads tell us about the probability that the chosen coin is the fair one? That is, we want the conditional probability

Pr(Fair ∣ Heads).
We already know
Pr(Heads ∣ Fair) = 1/2   (by definition of a fair coin),
Pr(Fair) = 1/3   (since the three coins are equally likely to be chosen),
Pr(Heads) = 1/2   (as determined above).
Therefore, by Bayes’ Theorem,
Pr(Fair ∣ Heads) = Pr(Heads ∣ Fair) Pr(Fair) / Pr(Heads) = ((1/2) ⋅ (1/3)) / (1/2) = 1/3.
So, from a single coin-toss, we do not change our belief about how likely it is that the
coin is fair. But what about the coin being a DoubleHead or DoubleTail? Certainly a
DoubleTail coin cannot ever show Heads, so the observation of Heads, even from just a
single coin toss, rules out this possibility entirely:
Pr(DoubleTail ∣ Heads) = 0.
Finally, what does our observation of Heads tell us about the probability that the chosen
coin is DoubleHeads? As a shortcut, we can use the fact that our three conditional
probabilities,

Pr(Fair ∣ Heads), Pr(DoubleHead ∣ Heads), Pr(DoubleTail ∣ Heads),

must sum to 1, since the three outcomes are mutually exclusive and cover all possibilities for the chosen coin:

Pr(Fair ∣ Heads) + Pr(DoubleHead ∣ Heads) + Pr(DoubleTail ∣ Heads) = 1.

Therefore

Pr(DoubleHead ∣ Heads) = 1 − Pr(Fair ∣ Heads) − Pr(DoubleTail ∣ Heads) = 1 − 1/3 − 0 = 2/3.
So we have changed our belief about how likely it is that the coin is DoubleHead. Before we observed the coin toss outcome, our belief was summarised by Pr(DoubleHead) = 1/3. But now, we have Pr(DoubleHead ∣ Heads) = 2/3, so we believe that the DoubleHead coin is twice as likely as before.
We can still work this last conditional probability out using Bayes' Theorem, for practice:

Pr(DoubleHead ∣ Heads) = Pr(Heads ∣ DoubleHead) Pr(DoubleHead) / Pr(Heads) = (1 ⋅ (1/3)) / (1/2) = 2/3.
We saw that it is also possible for the prior and posterior probabilities to be equal:
prior probability: Pr(Fair) = 1/3,
posterior probability: Pr(Fair ∣ Heads) = 1/3.
Notice that the expressions Bayes' Theorem gives for our three posterior probabilities all have the same denominator, Pr(Heads). If we only want to compare the probabilities with each other, rather than compute their exact values, then this common denominator does not matter. For example, it plays no role in the comparative statement:

“Given that we have observed Heads, the DoubleHead coin is twice as likely as the Fair coin.”
In these cases, we don't need to know Pr(Heads), because it just serves as a scaling factor. If we don't use it, then we won't get exact conditional probabilities any more; instead, we get three quantities that are in the same ratios to each other as the probabilities. We can compute

Pr(Fair ∣ Heads) ∝ Pr(Heads ∣ Fair) Pr(Fair) = (1/2) ⋅ (1/3) = 1/6,
Pr(DoubleHead ∣ Heads) ∝ Pr(Heads ∣ DoubleHead) Pr(DoubleHead) = 1 ⋅ (1/3) = 1/3,
Pr(DoubleTail ∣ Heads) ∝ Pr(Heads ∣ DoubleTail) Pr(DoubleTail) = 0 ⋅ (1/3) = 0.
Each of these statements says that the left-hand side is proportional to the right-hand
side. The same constant of proportionality is used in each case, namely 1/ Pr(Heads),
but this constant factor is now omitted from the calculation. The three numbers we
obtain are no longer probabilities; they are still ≥ 0, but they no longer sum to 1. But
they are in the same ratios with each other as the probabilities were. We can see from
these numbers that Pr(DoubleHead ∣ Heads) is the largest of the three probabilities, and
that the double-headed coin is twice as likely as the fair coin.
So, in some situations where we only want to do comparisons rather than compute
the probabilities exactly, we may not need to compute Pr(Heads).
Having made that important point, let's return to considering the exact calculation
of the probabilities, and in particular the calculation of the denominator Pr(Heads). In
the three-coins example, we first calculated Pr(Heads) as a sum of terms of the form
Pr(Heads ∣ 𝐴) Pr(𝐴), where 𝐴 is each of the three coins: 𝐴 ∈ {Fair, DoubleHead, DoubleTail}.
So we had

Pr(Heads) = Pr(Heads ∣ Fair) Pr(Fair) + Pr(Heads ∣ DoubleHead) Pr(DoubleHead) + Pr(Heads ∣ DoubleTail) Pr(DoubleTail).
This is very typical of applications of Bayes’ Theorem, so much so that the theorem is
often presented in the following form.
Suppose the sample space is partitioned into events 𝐴1, 𝐴2, …, 𝐴𝑛, so that 𝑈 = 𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛. Then, for any event 𝐵 with Pr(𝐵) > 0 and any 𝑗,

Pr(𝐴𝑗 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴𝑗) Pr(𝐴𝑗) / ∑_{𝑖=1}^{𝑛} Pr(𝐵 ∣ 𝐴𝑖) Pr(𝐴𝑖).
(We are using (9.22) with the names 𝐴 and 𝐵 interchanged throughout, but that makes
no difference to the underlying mathematics.)
(This form is obtained by using Theorem 47 with 𝐴𝑗 instead of 𝐴, together with the substitution (9.32).)
An important special case is when 𝑛 = 2. We have two events, which we can call 𝐴 and its complement 𝐴̄, and

Pr(𝐴 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴) / (Pr(𝐵 ∣ 𝐴) Pr(𝐴) + Pr(𝐵 ∣ 𝐴̄) Pr(𝐴̄)).
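This form of Bayes' Theorem translates directly into code. Here is a small Python sketch (ours; the function name is arbitrary) that computes all the posterior probabilities from the priors and the likelihoods, using the total-probability denominator, applied to the three-coins example:

    def posteriors(priors, likelihoods):
        """Given priors Pr(A_j) and likelihoods Pr(B | A_j),
        return the posteriors Pr(A_j | B) via Bayes' Theorem."""
        denominator = sum(l * q for l, q in zip(likelihoods, priors))  # Pr(B)
        return [l * q / denominator for l, q in zip(likelihoods, priors)]

    # Three-coins example: priors are 1/3 each; likelihoods are Pr(Heads | coin).
    print(posteriors([1/3, 1/3, 1/3], [1/2, 1, 0]))
    # -> [0.333..., 0.666..., 0.0] for Fair, DoubleHead, DoubleTail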
9.10 EXERCISES
1. A random word is chosen from a list of all five-letter English words, and then one
of that word’s five letters is chosen.
(a) What is a suitable sample space for this experiment? What should the probabilities
of its elements be?
(b) Challenge: use your Linux skills and /usr/share/dict/words to determine the
probability of each English letter in this experiment.
2. A fair coin is tossed three times. The outcome is the sequence of results of the
coin tosses. For convenience, we denote Heads and Tails by H and T, respectively.
Specify a suitable sample space, with associated probabilities, for this experiment.
3. A fair coin is tossed repeatedly until it comes up Heads for the first time. Then
the tossing stops. The outcome is the sequence of the results of the tosses. For example,
if the first Head occurs on the third toss, then the outcome is TTH.
Specify a suitable sample space, with associated probabilities, for this experiment.
4. Suppose two fair dice are thrown and their numbers are added. Let 𝑇 be the total
of the numbers on the two dice.
(b) What is the minimum value of 𝑑 such that the probability that the total differs from
the middle value, 7, by at least 𝑑 is at most 0.1? Symbolically, we are asking for the
minimum 𝑑 such that
Pr(|𝑇 − 7| ≥ 𝑑) ≤ 0.1.
6. A random positive integer < 100 is chosen, with all choices being equally likely.
What is the probability that the chosen number is divisible by 3 but not by 9?
7. (Birthday Paradox).
Let 𝑛 ∈ ℕ. Suppose 𝑛 people are chosen at random.
(a) What is the probability that at least two of them share a birthday?
(b) What is the minimum value of 𝑛 such that it is more likely than not that there are
at least two people in the set that share a birthday?
9. Prove by induction on 𝑛 that, for all 𝑛 ∈ ℕ and any mutually disjoint sets
𝐴1 , 𝐴2 , … , 𝐴𝑛 ,
Pr(𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} Pr(𝐴𝑘).
11. Recall the network reliability problems in § 9.7. Now consider the following
network. Again, all links behave independently and have identical survival probability 𝑝.
[Figure: a four-node network linking 𝑆 and 𝑇 via intermediate nodes 𝑀 and 𝑁.]
(a) Give an expression, in terms of 𝑝, for the probability that there is a path of surviv-
ing links from 𝑆 to 𝑇.
(b) Compare your answer to (a) with the answer to Exercise 4.8(b). Discuss the rela-
tionship between your answers to the two questions.
Now let’s upgrade this communications network by adding a link between 𝑀 and 𝑁 ,
giving the following network.
[Figure: the same network, with an additional link between 𝑀 and 𝑁.]
(c) Give an expression, in terms of 𝑝, for the probability that there is a path of surviv-
ing links from 𝑆 to 𝑇 in this upgraded network.
(d) Compare your answer to (c) with the answer to Exercise 4.8(c). Discuss the rela-
tionship between your answers to the two questions.
12. Determine the probability that a random permutation on a set of two elements
is fixed-point-free.
Then do the same for a set of three elements, and then a set of five elements.
13. What is the probability that a Scrabble letter is a vowel, given that it is in the
first half of the alphabet?
14. Prove that, for any two events 𝐴 and 𝐵, the two conditional probabilities Pr(𝐴 ∣ 𝐵)
and Pr(𝐵 ∣ 𝐴) are equal if and only if the events have the same probability.
15. Consider again the network of four nodes and five links from Exercise 11. Suppose that 𝑝 = 1/2.
16. You are at a cricket match and the ball is hit high towards a fielder. You
think, will they catch it? The fielder could be any of four players: Alex, Chris, Kim and
Sam. You can’t tell the difference between them in their white cricket uniforms at this
distance. You assess that Alex has a probability of catching of 0.9, and a probability of
dropping the catch of 0.1. Each of Chris, Kim and Sam has catching probability 0.4 and
dropping probability 0.6.
(b) Suppose now that you see that the ball is caught. What is the probability that the
fielder who caught it was Alex? What is the probability that it was not Alex?
17. A serious crime is committed in a big city, and police are trying to identify the
perpetrator. There are 5 million people who cannot be ruled out. A degraded fragment
of DNA from the criminal is found at the scene. There is no other evidence. A random
person has only a one-in-a-million chance of matching the DNA fragment. The fragment
is compared with a database and a match is found with someone who provided DNA
for a different reason in a completely irrelevant context some years ago. How likely is it
that this person committed the crime? Are they guilty, beyond reasonable doubt?
18. In the three-coins example considered from page 337 onwards in § 9.9, suppose
that, instead of observing just one coin toss, we observe two coin tosses instead.
For each of the following observations of the outcomes of the two coin tosses, de-
termine the prior and posterior probabilities of the chosen coin being each of the three
possibilities: Fair, DoubleHead, DoubleTail.
10 DISCRETE PROBABILITY II

We continue our study of discrete probability by looking at random variables and probability distributions, including the four most important discrete probability distributions. These give us tools for modelling and analysing a huge variety of random processes.
10.1𝛼 RANDOM VARIABLES
For a given sample space, there are many numerical quantities we might be inter-
ested in. For example, when throwing two dice — with each member of sample space
{1, 2, 3, 4, 5, 6}×{1, 2, 3, 4, 5, 6} having probability 1/36 — we might be interested in their
sum (if playing Monopoly), or their product, or their maximum. The idea of a numerical
quantity determined from a random member of a sample space is captured by the notion
of a random variable.
A random variable is a function from a sample space to the real numbers. So, if
𝑈 is the sample space, it’s just a function 𝑓 ∶ 𝑈 → ℝ.
We think of a random variable working as follows.
1. First, a random member 𝑥 ∈ 𝑈 of the sample space is chosen, according to the probabilities defined on 𝑈.

2. Then the function is applied to 𝑥, and the resulting value 𝑓(𝑥) is the value taken by the random variable.
Random variables arise naturally in computing; for example:

• The number of tests a program passes, during testing on a fixed number of randomly chosen inputs.
It’s common to denote a random variable by a capital letter, and often one near the
end of the alphabet, such as 𝑋 , 𝑌 or 𝑍, but that is not a formal requirement. This
symbol is usually thought of as containing a random value, where that value is obtained
by taking a random member of the sample space and then applying the function to it.
For example, if 𝑍 is a random variable representing the sum of the numbers shown
when throwing two dice, then 𝑍 contains a value chosen randomly from {2, 3, 4, … , 11, 12},
and these values have the probabilities given on p. 315. If we pick any possible value
𝑘 ∈ {2, 3, 4, … , 11, 12}, then we write
𝑍 = 𝑘
for the event that the random variable 𝑍 has the value 𝑘. Since it’s an event, it has a
probability. We can use the usual definition of the probability of an event to work out
the probability of the event 𝑍 = 𝑘. We just add up the probabilities of the elements of
the sample space that belong to the event.
Here, the sample space members are the pairs (𝑖, 𝑗), where 𝑖, 𝑗 ∈ {1, 2, 3, 4, 5, 6}. So
we add up the probabilities of all these pairs such that 𝑖 + 𝑗 = 𝑘:

Pr(𝑍 = 𝑘) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑖, 𝑗)).

For example, if 𝑘 = 9, then we have the event 𝑍 = 9, which consists of the pairs (3, 6), (4, 5), (5, 4), (6, 3). Its probability is given by

Pr(𝑍 = 9) = Pr({(3, 6), (4, 5), (5, 4), (6, 3)}) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=9} Pr((𝑖, 𝑗)) = 4/36 = 1/9.
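Such distributions are easy to tabulate by brute force. Here is a Python sketch (ours) that enumerates the 36 equally likely pairs and counts how many give each total:

    from fractions import Fraction
    from collections import Counter

    # Count how many of the 36 equally likely pairs (i, j) give each total.
    counts = Counter(i + j for i in range(1, 7) for j in range(1, 7))
    dist = {k: Fraction(c, 36) for k, c in sorted(counts.items())}

    print(dist[9])   # Fraction(1, 9), matching the calculation above
    print(dist)      # the whole distribution of Z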
Strictly speaking, a random variable must take values in ℝ, but it is sometimes convenient to use “random variables” with other codomains, such as ℤ𝑛 or sets. This is fine as long as
you don’t try to do operations with them that are not defined in the codomain being
used. For example, you can’t find average values of “random variables” that are sets,
since averaging needs addition and division and neither of these operations is defined on
sets.
So, for example, if 𝑘 is some fixed number in the codomain of random variable 𝑋 , then
it’s ok to write Pr(𝑋 = 𝑘) or to refer to the probability that 𝑋 = 𝑘 or the probability of
the value 𝑘 (if the random variable 𝑋 is clear from the context). And it’s ok to refer to
the probability distribution of the random variable 𝑋 . But it’s not ok to write “Pr(𝑋 )”
or refer to the “probability of 𝑋 ” when 𝑋 is a random variable, since a random variable
is not an event and therefore does not just have one single probability. And it’s not ok
to refer to the probability distribution of 𝑋 = 𝑘, since 𝑘 is just a single specific value of 𝑋, and 𝑋 = 𝑘 is an event; as such, 𝑋 = 𝑘 has a probability, but it does not have a probability distribution.
Although there are some similarities between sample spaces and random variables,
there are also important differences, both in the way we define them and in their purpose
and motivation.
• When defining sample spaces, we divide the range of possible outcomes up into
elementary, “atomic” outcomes. We try to make the sample space elements as
simple as we can, so that any event can be described by some set of them, and
there is no requirement for these sample space elements to be numbers. We want
the probabilities of individual elements to be easy to calculate, and we often try
to get a sample space where all elements have the same probability. The main aim
is for the sample space to be a good model of the underlying random process.
• When defining random variables, our priority is to capture useful numerical func-
tions of random data. These might not be easy to calculate, and they typically
lump many elements of a sample space together. The main aim is for the random
variable to be a good model of what we are interested in about the random data.
Sometimes we need to combine random variables. One of the most common opera-
tions we want to do, to combine random variables, is to add them. We have seen one
example of this, when we added the numbers obtained from throwing two dice.
So let’s consider what sums of random variables look like.
Let 𝑋 and 𝑌 be random variables. Their sum is denoted by 𝑋 + 𝑌. This, too,
is a random variable, but its probability distribution is, in general, different to the
distributions of 𝑋 and 𝑌. To work out the probability that this new random variable
𝑋 + 𝑌 takes a specific value 𝑘, we have to look at all possible pairs of values of 𝑋 and
𝑌 whose sum is 𝑘, and add up their probabilities:

Pr(𝑋 + 𝑌 = 𝑘) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)).
Random variables 𝑋 and 𝑌 are independent if, for all values 𝑖 and 𝑗, the events 𝑋 = 𝑖 and 𝑌 = 𝑗 are independent, i.e., Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) = Pr(𝑋 = 𝑖) ⋅ Pr(𝑌 = 𝑗).

Examples:
• If two dice are thrown, and 𝑋 and 𝑌 are the numbers shown on the first and sec-
ond die respectively, then 𝑋 and 𝑌 are considered independent (unless something
dodgy is going on).
• Let 𝑋 be the rank and let 𝑌 be the suit of a random card drawn from a standard
deck of 52 playing cards. Then 𝑋 and 𝑌 are independent.
10.3𝛼 EXPECTATION
A random variable takes many different values, each with some probability. Sometimes,
though, we want to work with a single number that is somehow representative of the
entire set of possible values, taking into account their probabilities. This representative
number should, in some sense, correspond to what we expect the random variable to
give us, or to be typical of the values it gives. There are several different ways to do this;
indeed, the terms “expected value” and “typical value” are often interpreted differently.
But the most important and widely used representative value of a random variable is
its expected value, also called its expectation or mean. This is based on the familiar
notion of the average of a set of numbers.
You know that, to compute the average of a set of numbers, you just add them up
and divide by how many of them there are. So, the average of the three numbers 2, 5,
6 is
(2 + 5 + 6)/3 = 13/3 = 4⅓.
Another way to put it is that each number is multiplied by 1/3 and then they are added. So the three numbers 2, 5, 6 each have a coefficient or “weight”, and the three coefficients are fractions and add up to 1. In this case, the three coefficients are all the same: 1/3, 1/3, 1/3.
There might be repetition among the numbers we are averaging. The average of 2,
5, 5, 6 (with 5 appearing twice) is

(2 + 5 + 5 + 6)/4 = (1/4) ⋅ 2 + (2/4) ⋅ 5 + (1/4) ⋅ 6 = 0.5 + 2.5 + 1.5 = 4.5.

We may view this as using the same three numbers 2, 5, 6 as before, but with different coefficients: 1/4, 1/2, 1/4. These coefficients are still fractions and they still add up to 1.
These coefficients are reminiscent of probabilities, since they belong to the unit
interval [0, 1] and add up to 1. So, taken together, the coefficients can be viewed as a
probability distribution.
We can use this viewpoint to define the average of any set of numbers on which there
is a probability distribution. And a set of numbers with a probability distribution is
nothing more or less than a random variable, as we saw in § 10.2𝛼 . So, for any random
variable 𝑋, we define its expectation 𝐸(𝑋) to be

𝐸(𝑋) = ∑_{𝑘} 𝑘 ⋅ Pr(𝑋 = 𝑘),

where the sum is over all possible values 𝑘 of 𝑋. So we just multiply each value by its probability, and add up all these products.
The expectation is also called the expected value or the mean. It can also be called the average, although it is more usual to reserve that term for averages obtained from actual observed data rather than from probability distributions.
Examples:
• Let 𝑋 be the number obtained from throwing a fair die once. Then each value in
{1, 2, 3, 4, 5, 6} has probability 1/6. So
𝐸(𝑋) = 1 ⋅ (1/6) + 2 ⋅ (1/6) + 3 ⋅ (1/6) + 4 ⋅ (1/6) + 5 ⋅ (1/6) + 6 ⋅ (1/6) = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 7/2 = 3½.
• Suppose we have a biased coin for which Heads is twice as likely as Tails:
Pr(Heads) = 2/3,
Pr(Tails) = 1/3.
Let 𝑋 be the number of Heads in a single toss, so
𝑋 = 1 with probability 2/3, and 𝑋 = 0 with probability 1/3.
Then
𝐸(𝑋) = 1 ⋅ (2/3) + 0 ⋅ (1/3) = 2/3 + 0 = 2/3.
• In Scrabble, every letter tile displays (as a subscript) the points that letter is worth,
if used in a word.1 Blanks are worth 0 points.
1 There are other factors involved when the game is played. The actual points you get from playing a letter
may be doubled or tripled when played on certain special squares on the board, and may be doubled if
used to make two words in a single turn. We ignore those factors. We focus only on the face value of a tile.
• You spend $5 on a lottery ticket. Your ticket is one of 10,000 sold. First prize is
$20,000, second prize is $5,000 and third prize is $2,000. What is your expected
prizemoney? What is your expected profit?
Let 𝑃 be the random variable representing your prizemoney. The values of 𝑃, with
their probabilities, are
𝑃, in $ probability
20,000 0.0001
5,000 0.0001
2,000 0.0001
0 0.9997
We have

𝐸(𝑃) = 20000 ⋅ 0.0001 + 5000 ⋅ 0.0001 + 2000 ⋅ 0.0001 + 0 ⋅ 0.9997 = 2 + 0.5 + 0.2 + 0 = 2.7.

So your expected prizemoney is $2.70. To find your expected profit, deduct the cost of your ticket. Therefore your expected profit is

$2.70 − $5 = −$2.30,

i.e., an expected loss of $2.30.
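The definition of expectation is straightforward to compute with. Here is a Python sketch (ours), applied to the lottery prizemoney:

    def expectation(values_and_probs):
        """E(X): sum of value * probability over all values of X."""
        return sum(v * p for v, p in values_and_probs)

    prizemoney = [(20_000, 0.0001), (5_000, 0.0001), (2_000, 0.0001), (0, 0.9997)]
    e = expectation(prizemoney)
    print(e)        # 2.7, i.e. expected prizemoney of $2.70
    print(e - 5)    # -2.3, i.e. expected profit of -$2.30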
Note, as a degenerate special case, that a constant 𝑐 may be viewed as a random variable that takes the value 𝑐 with probability 1, and its expectation is then

𝐸(𝑐) = 𝑐. (10.3)
We often want to add random variables together. We have already seen an example
of this: adding the numbers on two dice to form a total, when playing Monopoly. We
now show that we can calculate the expectation of a sum of random variables by just
working out the expectations of all the random variables separately and then adding
them up.
Theorem 49 (Linearity of Expectation). For any two random variables 𝑋 and 𝑌,

𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌).

Proof.
𝐸(𝑋 + 𝑌) = ∑_{𝑘} 𝑘 ⋅ Pr(𝑋 + 𝑌 = 𝑘)
= ∑_{𝑘} 𝑘 ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))
= ∑_{𝑘} ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} 𝑘 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)),
with the last step being possible because 𝑘 is a constant as far as the inner sum is
concerned (even though it varies in the outer sum). Now the two summations are both
out the front (on the left), and together they just amount to summing over all pairs
(𝑖, 𝑗), without restriction. Then we can replace 𝑘 by 𝑖 + 𝑗. Therefore

𝐸(𝑋 + 𝑌) = ∑_{(𝑖,𝑗)} (𝑖 + 𝑗) ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)).
So now we are summing over all pairs (𝑖, 𝑗) where 𝑖 is a possible value for 𝑋 and 𝑗 is
a possible value for 𝑌. We can organise this sum as an outer sum over all possible 𝑖
and an inner sum over all possible 𝑗, where the possibilities for 𝑗 in the inner sum are
unaffected by the choices of 𝑖 in the outer sum. So we can replace the single summation
∑(𝑖,𝑗) over all pairs by these two nested summations ∑𝑖 ∑𝑗 . Or we could put the nested
summations the other way round, as in ∑_𝑗 ∑_𝑖. Splitting the factor (𝑖 + 𝑗) into its two parts, we use the former ordering for the term containing 𝑖 and the latter for the term containing 𝑗:

𝐸(𝑋 + 𝑌) = ∑_𝑖 ∑_𝑗 𝑖 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) + ∑_𝑗 ∑_𝑖 𝑗 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)).
Now, in the first nested summation, the factor 𝑖 in the inner sum over 𝑗 does not depend
on 𝑗 at all, so as far as that inner sum is concerned, it is fixed. So it can be taken outside
the inner sum as a fixed common factor. Similarly, in the second nested summation, the
factor 𝑗 in the inner sum over 𝑖 does not depend on 𝑖, so that factor 𝑗 can be taken
outside the inner sum. (Neither of these factors can be taken outside the outer sum,
though, as these factors are the variables used in those outer summations, so they each
vary as their outer sum is being done.) So we have

𝐸(𝑋 + 𝑌) = ∑_𝑖 𝑖 ∑_𝑗 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) + ∑_𝑗 𝑗 ∑_𝑖 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)).

Now, for each 𝑖,

∑_𝑗 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) = Pr(𝑋 = 𝑖),

by the law of total probability, (9.22). Here we are using the fact that the events 𝑌 = 𝑗, considered for all 𝑗, together cover all possibilities, and are mutually exclusive. Similarly, for each 𝑗,

∑_𝑖 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) = Pr(𝑌 = 𝑗).

Therefore

𝐸(𝑋 + 𝑌) = ∑_𝑖 𝑖 ⋅ Pr(𝑋 = 𝑖) + ∑_𝑗 𝑗 ⋅ Pr(𝑌 = 𝑗).

The first term here is just 𝐸(𝑋) and the second term is just 𝐸(𝑌). Therefore

𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌).
Linearity of expectation is a very simple rule. It’s one of those rare cases where
something simple that you hope might be true actually is true! It is remarkably general,
too: it does not even require the two random variables to be independent; you can check
that we did not assume independence of 𝑋 and 𝑌 at any stage in the proof. It also turns
out to be surprisingly powerful.
Linearity of expectation is another instance of working something out for a complex
object by working it out for components of that object and then combining the answers.
We can also ask about the expectation of a product of random variables. This time,
we need independence to get a simple rule.
Theorem 50. If two random variables 𝑋 and 𝑌 are independent, then the expectation of their product is the product of their expectations:

𝐸(𝑋𝑌) = 𝐸(𝑋) ⋅ 𝐸(𝑌).

Proof. We have

𝐸(𝑋𝑌) = ∑_{𝑘} 𝑘 ⋅ Pr(𝑋𝑌 = 𝑘), (10.9)

where the sum is over all possible products of values of 𝑋 and 𝑌. We can express the event 𝑋𝑌 = 𝑘 as a partition into events of the form (𝑋 = 𝑥) ∧ (𝑌 = 𝑦) where 𝑥𝑦 = 𝑘, so that Pr(𝑋𝑌 = 𝑘) is just the sum of the probabilities of those smaller events:

Pr(𝑋𝑌 = 𝑘) = ∑_{𝑥,𝑦∶ 𝑥𝑦=𝑘} Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)),

where the sum is over all pairs 𝑥, 𝑦 whose product is 𝑘. Substituting this into (10.9), we obtain

𝐸(𝑋𝑌) = ∑_{𝑘} ∑_{𝑥,𝑦∶ 𝑥𝑦=𝑘} 𝑘 ⋅ Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)).
Now these two summations, together, just involve summing over all possible pairs 𝑥, 𝑦, without regard to their product (since the inner sum is over all pairs with a specific product 𝑘, but the outer sum is over all possible values of 𝑘). Replacing 𝑘 by 𝑥𝑦, we get

𝐸(𝑋𝑌) = ∑_{𝑥,𝑦} 𝑥𝑦 ⋅ Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)).
Since we are assuming that 𝑋 and 𝑌 are independent, we have, for every 𝑥 and 𝑦,

Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)) = Pr(𝑋 = 𝑥) ⋅ Pr(𝑌 = 𝑦).

Therefore

𝐸(𝑋𝑌) = ∑_𝑥 ∑_𝑦 𝑥𝑦 Pr(𝑋 = 𝑥) Pr(𝑌 = 𝑦)
(just writing the sum over all pairs 𝑥, 𝑦 as a sum over 𝑥 and a sum over 𝑦)
= ∑_𝑥 (𝑥 Pr(𝑋 = 𝑥) ∑_𝑦 𝑦 Pr(𝑌 = 𝑦))
(since 𝑥 Pr(𝑋 = 𝑥) does not depend on 𝑦, so it can be taken outside the inner sum)
= (∑_𝑥 𝑥 Pr(𝑋 = 𝑥)) ⋅ (∑_𝑦 𝑦 Pr(𝑌 = 𝑦))
(since the inner sum ∑_𝑦 𝑦 Pr(𝑌 = 𝑦) does not depend on 𝑥, so it can be taken outside the outer sum)
= 𝐸(𝑋) ⋅ 𝐸(𝑌).
10.4 MEDIAN
We mentioned earlier that the expectation gives a single number that represents what we
expect from a random variable. Although a random variable is inherently unpredictable,
the expectation gives us a rough idea where it tends to sit on the real number line. We
also mentioned that we might want a number that is typical of the values taken by the
random variable, and we noted in passing that the terms “expected value” and “typical
value” might be interpreted differently.
The median is intended to capture this notion of a “typical” value of a random
variable.
The median of a random variable 𝑋 is a real number 𝑚 such that
Pr(𝑋 ≤ 𝑚) ≥ 1/2,
Pr(𝑋 ≥ 𝑚) ≥ 1/2.
So we can think of 𝑚 as lying exactly “in the middle of 𝑋 ” as far as its probability
distribution is concerned.
In saying this, we are not saying that 𝑚 lies exactly half-way between the smallest
and largest values that 𝑋 might take. So the median must not be confused with the
mid-range, which is defined by

mid-range = (minimum possible value of 𝑋 + maximum possible value of 𝑋)/2.
Examples
• Let 𝑋 be the number obtained from tossing a single die. Then its median lies
between 3 and 4, which we can determine from the probabilities or simply from
the symmetry of 𝑋 . It makes sense in this case to define the median to be 3.5. In
general, when the median falls naturally in a gap, its value can be chosen to be the
middle of that gap. There are also more detailed ways of calculating the median
in such cases, but we do not consider them here.
• Suppose 𝑋 is a random variable with

Pr(𝑋 = 0) = 0.02,
Pr(𝑋 = 1) = 0.68,
Pr(𝑋 ≥ 2) = 0.3,
Then 1 is a median: Pr(𝑋 ≤ 1) = 0.7 ≥ 0.5 and Pr(𝑋 ≥ 1) = 0.98 ≥ 0.5, satisfying the definition. You
should satisfy yourself that no other median value works in this case.
• Let 𝑃 be the lottery prizemoney random variable defined in the example on p. 353.
Its median is 0, since Pr(𝑃 ≥ 0) = 1 ≥ 0.5 and Pr(𝑃 ≤ 0) = 0.9997 ≥ 0.5.
The expectation and median each have their pros and cons.
• The median is less vulnerable than the expectation to changes to the extreme
values of the random variable. If the lowest (or highest) value changes a bit, then
the median is usually unaffected. This is different to the expectation, which can
be affected by a small change to any value of the random variable.
• The median does not mix with arithmetic operations as well as the expectation.
For example, the median of a sum of two random variables does not, in general,
equal the sum of their medians. So there is no analogue of Linearity of Expectation
to help simplify calculations. This means that the expectation is usually preferred
for random variables that might be used in arithmetical calculations.
• The expectation can be quite different to most actual values of the random variable.
For example, consider a random variable which can take any of the five values 1, 2, 3, 4, 90, with probability 1/5 each. Then its expectation is

1 ⋅ (1/5) + 2 ⋅ (1/5) + 3 ⋅ (1/5) + 4 ⋅ (1/5) + 90 ⋅ (1/5) = 100/5 = 20.
But its median is 3. Clearly, in this case, the median is more like the other values,
and therefore more “typical” of them, than the expectation, which is a long way
from any of the values.
Both the expectation and the median of a random variable 𝑋 give a numerical
indicator of where 𝑋 may be located on the number line. For this reason, they can each
be described as a measure of location. The mid-range is another measure of location.
But none of these measures says anything about how widely spread, or not, the values
of 𝑋 are. This is often important to know, so we need a new measure to try to capture
it.
10.5 MODE
Our last measure of location is the mode. We only consider it briefly, but it is important
terminology to know.
The mode of a random variable 𝑋 is the value 𝑔 for which Pr(𝑋 = 𝑔) is greatest.
This may not be unique.
The mode may be described as the most frequent, or most popular, value of 𝑋 . This
is useful, since there are times when we just want to know what the most likely value is.
The mode’s probability gives us an upper bound on the probability of every other
value of 𝑋 . But it does not say anything else about how the distribution behaves at
other values. It is quite possible for the mode to be a long way from the mean and/or
the median. So it may not be a good representative of the random variable as a whole.
Nonetheless, for many important well-behaved distributions, the mode is not too far
from the mean and median.
10.6 VARIANCE
In many situations, it is important to have some measure of how much a random variable
varies. The values of the variable might be tightly concentrated or loosely smeared out.
The expectation or median tell us nothing about this variability. So we now introduce
a measure of variability, the variance, and its close relative, the standard deviation.
The variance of a random variable 𝑋 measures how far its values tend to be, on
average, from its expected value. In measuring this, it gives greater weight to values that
are further from the expected value. It does this by using the square of the difference
from the expected value, and taking the expectation of this squared difference. We now
define it formally.
Let 𝜇 = 𝐸(𝑋) be the expected value of 𝑋. Then the variance Var(𝑋) is defined by

Var(𝑋) = 𝐸((𝑋 − 𝜇)²). (10.11)
This has many important properties, some of which we will come to soon. But we have
to keep in mind that, if the values of 𝑋 have units on some scale (e.g., if 𝑋 arises from
a physical measurement), then Var(𝑋 ) does not have the same units as 𝑋 ; rather, its
units are the square of the units of 𝑋 , since it is an expectation of squares. For example,
if 𝑋 is a distance in metres, then the units of Var(𝑋 ) are square metres, which is a unit
of area rather than length. So, although the variance does indicate how widely 𝑋 varies
around 𝜇 — with a larger variance indicating wider variation — it does not do so on the
same scale. In order to get a measure of variation that is on the same numerical scale
as 𝑋 and has the same units, we can use the standard deviation, which is defined by

StdDev(𝑋) = √Var(𝑋).
Examples
• Let 𝑋 be the number given by throwing a single die. We know that 𝜇 = 𝐸(𝑋) = 3.5 (see p. 352). So

Var(𝑋) = (1 − 3.5)² ⋅ (1/6) + (2 − 3.5)² ⋅ (1/6) + (3 − 3.5)² ⋅ (1/6) + (4 − 3.5)² ⋅ (1/6) + (5 − 3.5)² ⋅ (1/6) + (6 − 3.5)² ⋅ (1/6) = 17.5/6 = 35/12 ≈ 2.92.
Theorem 51.

Var(𝑋) = 𝐸(𝑋²) − 𝜇².
Proof. The variance is just the expectation of (𝑋 − 𝜇)2 . Let’s expand (𝑋 − 𝜇)2 before
taking its expectation. Observe that
(𝑋 − 𝜇)2 = 𝑋 2 − 2𝑋 𝜇 + 𝜇 2 . (10.12)
So we can work out the variance, 𝐸((𝑋 −𝜇)2 ), by taking the expectation of the expanded
form on the right of (10.12). When doing that, we can use Theorem 49 (Linearity of
Expectation), together with (10.3) and the fact that constants factor out of expectations. This gives

Var(𝑋) = 𝐸(𝑋² − 2𝑋𝜇 + 𝜇²) = 𝐸(𝑋²) − 2𝜇𝐸(𝑋) + 𝜇² = 𝐸(𝑋²) − 2𝜇² + 𝜇² = 𝐸(𝑋²) − 𝜇².
You can use either way of calculating the variance — the definition (10.11), or
Theorem 51 — according to your preference, or whichever is easier for the data you
have.
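For example, here is a Python sketch (ours) that computes the variance of a fair die both ways and confirms that they agree:

    values = [1, 2, 3, 4, 5, 6]
    probs = [1/6] * 6

    mu = sum(v * p for v, p in zip(values, probs))                  # E(X) = 3.5
    var_def = sum((v - mu)**2 * p for v, p in zip(values, probs))   # E((X - mu)^2)
    e_x2 = sum(v**2 * p for v, p in zip(values, probs))             # E(X^2)
    var_thm = e_x2 - mu**2                                          # Theorem 51

    print(var_def, var_thm)   # both 2.9166..., i.e. 35/12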
You may wonder why, when defining the variance, we use the average squared dif-
ference rather than just the differences themselves. What would happen if we just took
the expectation of the difference 𝑋 − 𝜇, rather than the expectation of (𝑋 − 𝜇)2 ? By
Linearity of Expectation and (10.3), we have

𝐸(𝑋 − 𝜇) = 𝐸(𝑋) − 𝜇 = 𝜇 − 𝜇 = 0.
So this tells us nothing! The problem with this approach was that the total positive
and negative differences end up cancelling each other out. We could, instead, take the
expectation of the absolute difference, which is called the mean absolute deviation:

𝐸(|𝑋 − 𝜇|).
In fact, this is occasionally used, and it can be treated as being on the same scale as 𝑋 ,
with the same units. But its mathematical properties are not as strong as the variance
or standard deviation, so we do not consider it further.
Theorem 52. If two random variables 𝑋 and 𝑌 are independent, then the variance of their sum is the sum of their variances:

Var(𝑋 + 𝑌) = Var(𝑋) + Var(𝑌).

Proof. Using Theorem 51 and Linearity of Expectation,

Var(𝑋 + 𝑌) = 𝐸((𝑋 + 𝑌)²) − (𝐸(𝑋 + 𝑌))²
= 𝐸(𝑋²) + 2𝐸(𝑋𝑌) + 𝐸(𝑌²) − (𝐸(𝑋))² − 2𝐸(𝑋)𝐸(𝑌) − (𝐸(𝑌))²
= Var(𝑋) + Var(𝑌) + 2(𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)).

If 𝑋 and 𝑌 are independent, then 𝐸(𝑋𝑌) = 𝐸(𝑋)𝐸(𝑌), by Theorem 50. So the final term above, 2(𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)), is zero. Therefore

Var(𝑋 + 𝑌) = Var(𝑋) + Var(𝑌).
Comments:
• The variance is easier to work with, when adding independent random variables, than the standard deviation, for which we have

StdDev(𝑋 + 𝑌) = √(Var(𝑋) + Var(𝑌)) = √(StdDev(𝑋)² + StdDev(𝑌)²),

which is not, in general, the same as StdDev(𝑋) + StdDev(𝑌).
• The quantity 𝐸(𝑋 𝑌) − 𝐸(𝑋 )𝐸(𝑌), used near the end of the above proof, is im-
portant in its own right. It is the covariance of the random variables 𝑋 and 𝑌,
denoted by Cov(𝑋, 𝑌):

Cov(𝑋, 𝑌) = 𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌).
This is zero when 𝑋 and 𝑌 are independent, and can be zero in some other situa-
tions too. If it is nonzero then it indicates some kind of linear relationship between
the two random variables, though not usually an exact one; the relationship may
be approximate and probabilistic.
Since the standard deviation measures how far 𝑋 tends to be from its mean 𝜇, we
would expect that 𝑋 is unlikely to be too many standard deviations away from 𝜇, and
that the further away from 𝜇 we go, the less likely 𝑋 is to appear there.
This intuition can be made precise. We use 𝑡 for how far away from the mean we
want to go (in terms of numbers of standard deviations). We will put an upper bound
on the probability of being at least that far away.
Theorem 53 (Chebyshev's Inequality). Let 𝑋 be any random variable, with mean 𝜇 and standard deviation 𝜎, and let 𝑡 > 0. Then

Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎) ≤ 1/𝑡².
Proof. The variance of 𝑋 is 𝐸((𝑋 − 𝜇)²). This expectation is a sum over all values of 𝑋, but we'll compare that with what we get by only taking values that are far enough away from the mean:

𝜎² = ∑_{𝑘} (𝑘 − 𝜇)² Pr(𝑋 = 𝑘)
≥ ∑_{𝑘∶ |𝑘−𝜇|≥𝑡𝜎} (𝑘 − 𝜇)² Pr(𝑋 = 𝑘)
(we now only sum over those 𝑘 that are ≥ 𝑡 standard deviations away from the mean)
≥ ∑_{𝑘∶ |𝑘−𝜇|≥𝑡𝜎} (𝑡𝜎)² Pr(𝑋 = 𝑘)
(since every remaining term has (𝑘 − 𝜇)² ≥ (𝑡𝜎)²)
= (𝑡𝜎)² ∑_{𝑘∶ |𝑘−𝜇|≥𝑡𝜎} Pr(𝑋 = 𝑘)
(since (𝑡𝜎)² does not depend on 𝑘, so can be taken outside the sum)
= (𝑡𝜎)² Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎).
Therefore, dividing both sides by (𝑡𝜎)²,

Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎) ≤ 𝜎²/(𝑡𝜎)² = 1/𝑡².
So, the larger 𝑡𝜎 is, the less likely it is that 𝑋 is that far away from the mean, and
the theorem gives an upper bound on the probability of being that far away.
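For example, taking 𝑡 = 2: for any random variable whatsoever,

Pr(|𝑋 − 𝜇| ≥ 2𝜎) ≤ 1/4,

so every random variable lies strictly within two standard deviations of its mean with probability at least 3/4.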
This theorem’s main virtue is its generality. It applies to any random variable at all.
In practice, when we have a random variable with a specific well-behaved probability
distribution, stronger statements may be possible; there may be smaller bounds on the
probability of being a given distance from the mean. But, if we don’t know much about
how a given random variable is distributed, or if it's hard to analyse it, then we can still fall back on this theorem for a guaranteed, if conservative, bound.
Having introduced many of the basic concepts pertaining to random variables, es-
pecially expectation and variance, we now look at some important and useful random
variables and their associated probability distributions.
10.7 UNIFORM DISTRIBUTION

Let 𝑎 and 𝑏 be integers with 𝑎 ≤ 𝑏. The uniform distribution gives the same probability to all integers between 𝑎 and 𝑏 inclusive, and zero probability to all other integers. The number of integers in the interval [𝑎, 𝑏] is 𝑏 − (𝑎 − 1) = 𝑏 − 𝑎 + 1, so they each get probability 1/(𝑏 − 𝑎 + 1). All other integers have probability zero.
If a random variable 𝑋 has the uniform distribution as its probability distribution, we say it is uniformly distributed, and its probabilities are given by

Pr(𝑋 = 𝑥) = 1/(𝑏 − 𝑎 + 1) if 𝑎 ≤ 𝑥 ≤ 𝑏, and Pr(𝑋 = 𝑥) = 0 otherwise.
We can write 𝑋 ∼ Unifℤ (𝑎, 𝑏) to mean that the random variable 𝑋 is uniformly dis-
tributed over the integer interval [𝑎, 𝑏] ∩ ℤ. The subscript ℤ may be omitted if it is clear
from the context that 𝑋 can only take integer values in the interval [𝑎, 𝑏].
A plot of the discrete uniform distribution, for 𝑎 = 2 and 𝑏 = 6, is shown in Fig-
ure 10.1.
Here are some examples of uniformly distributed random variables, some of which
we have seen before.
• Let 𝑋 be the outcome of the toss of a fair coin, with outcomes encoded as 0 and
1 for Tails and Heads respectively. Then 𝑋 ∼ Unif(0, 1).
• Let 𝑋 be the number shown on a fair die after it is thrown. Then 𝑋 ∼ Unif(1, 6).
• Let 𝑋 be the age of an adult student who has not reached the minimum age for
a full Victorian Driver’s Licence. This means that their age could be 18, 19, 20
or 21 (since the minimum age for a full licence is 22). In the absence of any
other information about the student or their driving history, we might use 𝑋 ∼
Unif(18, 21).
Sometimes, we use a uniform distribution because we have reason to believe that all
possible values are indeed equally likely, as for a fair die or a fair coin.
We also use a uniform distribution to model ignorance or uncertainty, as we did
in the third example above. Suppose we know that an integer-valued random variable
always takes values in [𝑎, 𝑏], but we know nothing more about it. So we don’t know
where its most popular or least popular values are; we don’t know whether it tends to
lie closer to 𝑎 or closer to 𝑏 or somewhere in the middle; we don't know whether its values are tightly clustered or widely spread. In such a state of maximal uncertainty, the uniform distribution is the natural model to adopt.²
[Figure 10.1: plot of the Unif(2, 6) distribution, showing Pr(𝑋 = 𝑘) against 𝑘 for 𝑘 = 0, 1, …, 10.]
10.8 BERNOULLI TRIALS AND THE BINOMIAL DISTRIBUTION

A Bernoulli trial is a random experiment with two possible outcomes, success and
failure. The probabilities of these outcomes are denoted by 𝑝 and 𝑞 = 1 − 𝑝 respectively:

𝑝 = Pr(success),
𝑞 = Pr(failure) = 1 − 𝑝.
This very simple random experiment can be used to model a huge variety of situa-
tions. The outcomes “success” and “failure” can be renamed according to the needs of the
situation; the names themselves are just conventional, and are not an essential feature
of the model. We could, instead, call the outcomes “yes” and “no”, or “on” and “off”, or 1
2 The notion of uncertainty of a random variable can be quantified, using the concept of entropy. You should
learn about entropy later in your computer science studies. It is a foundational concept in information
theory and was first introduced by Claude Shannon in coding theory, where data is encoded for sending over
a communications channel, and the encoding scheme used must be as efficient as possible while protecting
against some level of random “noise” on the channel. It is also used in cryptography, data compression,
and machine learning. One of the basic theorems about entropy, as a measure of uncertainty, is that the
entropy of a distribution over an integer interval is maximised by the uniform distribution on that interval.
and 0. Coin tosses can be regarded as Bernoulli trials in which the outcomes are Heads and Tails. Mostly, our coins have been fair, with 𝑝 = 𝑞 = 1/2, but biased coins can also be modelled, using 𝑝 ≠ 1/2 (see, e.g., p. 352). In our network reliability example on p. 330
in § 9.7, each edge was a Bernoulli trial in which the outcomes represent the survival or
failure of links in a network, with link outcomes being independent and the probability
of survival being 𝑝 for all links.
When we refer to some number or sequence of Bernoulli trials, they are understood
to be independent and identically distributed. So, the outcome of any one of them is
independent of all previous trials, and each trial uses the same success probability 𝑝.
A Bernoulli trial can be described by a {0, 1}-valued random variable:
𝑋 = 1 with probability 𝑝, and 𝑋 = 0 with probability 1 − 𝑝.
This is about the simplest random variable that can be defined that has any randomness
at all. We can work out its mean and variance:

𝐸(𝑋) = 1 ⋅ 𝑝 + 0 ⋅ (1 − 𝑝) = 𝑝, (10.14)
Var(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)² = 𝑝 − 𝑝² = 𝑝(1 − 𝑝).
Suppose we have 𝑛 Bernoulli trials with success probability 𝑝. How many successes
are there? This is a random variable 𝑍 whose set of possible values is {0, 1, 2, … , 𝑛−1, 𝑛}.
If 𝑘 is a value in this set, what is the probability that we have 𝑘 successes (and therefore
𝑛 − 𝑘 failures)?
Each success has probability 𝑝 and each failure has probability 1 − 𝑝. Since the trials are independent, a given sequence of outcomes with 𝑘 successes and 𝑛 − 𝑘 failures has probability

𝑝^𝑘 (1 − 𝑝)^{𝑛−𝑘}.

The number of such sequences is the number of ways of choosing which 𝑘 of the 𝑛 trials are the successes, namely the binomial coefficient C(𝑛, 𝑘) = 𝑛!/(𝑘! (𝑛 − 𝑘)!). Therefore

Pr(𝑍 = 𝑘) = C(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^{𝑛−𝑘}. (10.16)

A random variable 𝑍 with these probabilities is said to have the binomial distribution with parameters 𝑛 and 𝑝, and we write
𝑍 ∼ Bin(𝑛, 𝑝).
A plot of the binomial distribution, for 𝑛 = 10 and 𝑝 = 0.3, is shown in Figure 10.2.

[Figure 10.2: plot of the Bin(10, 0.3) distribution, showing Pr(𝑍 = 𝑘) against 𝑘 for 𝑘 = 0, 1, …, 10.]
Note that, for the highest values of 𝑘 (i.e., 𝑘 = 9, 10), the probabilities are nonzero even
though the points appear to be on the horizontal axis.
For example, suppose we want to know the probability that we get exactly six
successes from 𝑛 = 10 Bernoulli trials each with success probability 𝑝 = 0.3. This is the
370 DiSCRETE PROBABiLiTY i i
same scenario as for the plot in Figure 10.2, where we now seek the distribution’s value,
Pr(𝑍 = 6), when 𝑘 = 6. Using (10.16), we have
Pr(𝑍 = 6) = C(10, 6) ⋅ 0.3⁶ ⋅ (1 − 0.3)^{10−6} = 210 ⋅ 0.3⁶ ⋅ 0.7⁴ ≈ 0.037.
To find the mean and variance of 𝑍, we can write it as a sum of Bernoulli random variables: letting 𝑋𝑖 be 1 if the 𝑖th trial is a success and 0 otherwise, we have

𝑍 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛.
𝐸(𝑍) = 𝐸(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)
= 𝐸(𝑋1) + 𝐸(𝑋2) + ⋯ + 𝐸(𝑋𝑛)   (by Linearity of Expectation)
= 𝑝 + 𝑝 + ⋯ + 𝑝   (𝑛 copies, by (10.14))
= 𝑛𝑝.
Since Bernoulli trials are independent, we can compute the variance by adding up
the variances of all the 𝑋𝑖 (by Theorem 52):
Var(𝑍) = Var(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛 )
= Var(𝑋1 ) + Var(𝑋2 ) + ⋯ + Var(𝑋𝑛 ) (by Theorem 52)
= 𝑛𝑝(1 − 𝑝).
10.9 POISSON DISTRIBUTION

The binomial distribution is often used when the number 𝑛 of trials is very large. We
should be on the lookout for efficient approximations when working with large numbers.
Fortunately, there are two very useful approximations to the binomial distribution that
can be used in two types of situations with large 𝑛 (although they do not cover all
possible scenarios). We consider the first of these in this section.
Let 𝑋 be a random variable that can take any nonnegative integer value. We say
that 𝑋 has Poisson distribution with parameter 𝜇 if, for all 𝑘 ∈ ℕ0 ,
Pr(𝑋 = 𝑘) = 𝑒^{−𝜇} 𝜇^𝑘 / 𝑘!. (10.17)
We can write 𝑋 ∼ Poisson(𝜇) to mean that 𝑋 has Poisson distribution with parameter 𝜇.
In (10.17), 𝑒 is the base of natural logarithms, as usual.
We should first check that this is a valid probability distribution. The values are
nonnegative, but do they add up to 1? We can use the infinite series for 𝑒𝑥 :
𝑒^𝑥 = 1 + 𝑥 + 𝑥²/2! + 𝑥³/3! + ⋯ + 𝑥^𝑖/𝑖! + ⋯ = ∑_{𝑖=0}^{∞} 𝑥^𝑖/𝑖!, (10.18)
for all 𝑥 ∈ ℝ. The total of all the probabilities in the Poisson distribution is
∑_{𝑘=0}^{∞} Pr(𝑋 = 𝑘) = 𝑒^{−𝜇}𝜇⁰/0! + 𝑒^{−𝜇}𝜇¹/1! + 𝑒^{−𝜇}𝜇²/2! + 𝑒^{−𝜇}𝜇³/3! + ⋯
= 𝑒^{−𝜇} (𝜇⁰/0! + 𝜇¹/1! + 𝜇²/2! + 𝜇³/3! + ⋯)
= 𝑒^{−𝜇} (1 + 𝜇 + 𝜇²/2! + 𝜇³/3! + ⋯)
= 𝑒^{−𝜇} ⋅ 𝑒^𝜇   (by (10.18))
= 1,

as required.
[Figure: plot of a Poisson distribution, showing Pr(𝑋 = 𝑘) against 𝑘 for 𝑘 = 0, 1, …, 13.]
Let’s now give the expectation and variance of the Poisson distribution in general.
Theorem 54. If 𝑋 is a Poisson random variable with parameter 𝜇, then
𝐸(𝑋 ) = 𝜇,
Var(𝑋 ) = 𝜇,
StdDev(𝑋 ) = √𝜇.
Proof.
𝐸(𝑋) = ∑_{𝑘=0}^{∞} 𝑘 Pr(𝑋 = 𝑘)
= ∑_{𝑘=0}^{∞} 𝑘 ⋅ 𝑒^{−𝜇}𝜇^𝑘/𝑘!
= 𝑒^{−𝜇} ∑_{𝑘=1}^{∞} 𝑘𝜇^𝑘/𝑘!
(since 𝑒^{−𝜇} does not depend on 𝑘, so it can be taken outside the sum;
we have also started the sum at 𝑘 = 1, since for 𝑘 = 0 we have 𝑘𝜇^𝑘/𝑘! = 0)
= 𝑒^{−𝜇} ∑_{𝑘=1}^{∞} 𝜇^𝑘/(𝑘 − 1)!
= 𝑒^{−𝜇} ⋅ 𝜇 ∑_{𝑘=1}^{∞} 𝜇^{𝑘−1}/(𝑘 − 1)!   (taking one factor 𝜇 outside the sum)
= 𝑒^{−𝜇} ⋅ 𝜇 ∑_{𝑖=0}^{∞} 𝜇^𝑖/𝑖!   (writing the sum in terms of 𝑖 = 𝑘 − 1 rather than 𝑘)
= 𝑒^{−𝜇} ⋅ 𝜇 ⋅ 𝑒^𝜇   (by (10.18))
= 𝜇.
A similar argument can be used to show that Var(𝑋 ) = 𝜇. We leave that as an exercise.
It then follows that the standard deviation is √𝜇.
The symbol 𝜇 for the Poisson parameter was chosen as a reminder that it is actually
the expectation, and also the variance, of the distribution.
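These facts can be checked numerically by truncating the infinite sums at a large cutoff (a Python sketch of ours; the cutoff 200 is arbitrary but ample for small 𝜇):

    from math import exp, factorial

    def poisson_pmf(k, mu):
        """Pr(X = k) for X ~ Poisson(mu), as in (10.17)."""
        return exp(-mu) * mu**k / factorial(k)

    mu = 3.0
    total = sum(poisson_pmf(k, mu) for k in range(200))      # should be ~1
    mean = sum(k * poisson_pmf(k, mu) for k in range(200))   # should be ~mu
    print(total, mean)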
The Poisson distribution is typically suitable when:

• a random variable could conceivably take any nonnegative integer value (or it has a finite upper bound which is very large), and
• the events being counted occur independently of each other.
Many Poisson random variables are based on counting how many times an event happens
within some time interval, in situations where these events are independent of each
other and can happen at any point in time. The parameter 𝜇 depends on the situation,
including the length of the time interval.
Here are some examples of random variables with Poisson distributions.
• Telephone calls: the number of calls received (by a phone, call centre or exchange)
during a given time interval.
• Website visits: the number of visits to a website within a given time interval.
• Busking: the number of coins tossed into a busker’s cap during a given time
interval.
• Radioactive decay. Suppose you have a quantity of radioactive material, the atoms
of which emit alpha particles (which are essentially Helium nuclei, and consist of
two protons and two neutrons). The emissions by the atoms are independent of
each other. The number of alpha particles emitted by a given quantity of material
in a given period of time follows a Poisson distribution. The parameter 𝜇 depends
on the particular element (and isotope) as well as the quantity of material and the
length of the time interval during which observations are made.
10.10 GEOMETRIC DISTRIBUTION

Suppose we conduct a sequence of Bernoulli trials, each with success probability 𝑝, and let 𝑋 be the number of trials up to and including the first success. For the first success to occur on trial 𝑘:

• the first 𝑘 − 1 trials must result in failure, which has probability (1 − 𝑝)^{𝑘−1}, since the trials are independent; and
• the 𝑘th trial must be a success, which has probability 𝑝.

So Pr(𝑋 = 𝑘) = (1 − 𝑝)^{𝑘−1} 𝑝, giving the following table of probabilities:

𝑘            1    2          3            4            ⋯
probability  𝑝    (1 − 𝑝)𝑝    (1 − 𝑝)²𝑝    (1 − 𝑝)³𝑝    ⋯
As usual, we check that these are indeed probabilities. They are clearly nonnegative,
since 0 ≤ 𝑝 ≤ 1. So we now check that they sum to 1. Adding the probabilities gives an
infinite series
𝑝 + (1 − 𝑝)𝑝 + (1 − 𝑝)²𝑝 + (1 − 𝑝)³𝑝 + ⋯.
This is an infinite geometric series with first term 𝑎 = 𝑝 and common ratio 𝑟 = 1 − 𝑝, so
its sum is
𝑎/(1 − 𝑟) = 𝑝/(1 − (1 − 𝑝)) = 𝑝/(1 − 1 + 𝑝) = 𝑝/𝑝 = 1,
as required. So they are indeed probabilities.
Let 𝑋 be a random variable taking positive integer values. 𝑋 is geometrically distributed if, for some 𝑝 ∈ (0, 1] and every 𝑘 ∈ ℕ,

Pr(𝑋 = 𝑘) = (1 − 𝑝)^{𝑘−1} 𝑝.

We write 𝑋 ∼ Geom(𝑝) to mean that 𝑋 has this distribution.
We now give the expectation and variance for the general case.
Theorem 55. If 𝑋 ∼ Geom(𝑝) then

𝐸(𝑋) = 1/𝑝,
Var(𝑋) = (1 − 𝑝)/𝑝².
Proof. (outline)
𝐸(𝑋) = ∑_{𝑘=1}^{∞} 𝑘 Pr(𝑋 = 𝑘) = ∑_{𝑘=1}^{∞} 𝑘(1 − 𝑝)^{𝑘−1} 𝑝 = 𝑝 ∑_{𝑘=1}^{∞} 𝑘(1 − 𝑝)^{𝑘−1}, (10.19)
[Figure: plot of a geometric distribution, showing Pr(𝑋 = 𝑘) against 𝑘 for 𝑘 = 0, 1, …, 13.]
since the factor 𝑝, inside the sum, does not depend on 𝑘, so can be taken outside the
sum.
The sum here is not a type of sum we have studied before. If the terms being added
had no coefficient 𝑘, then we would have an infinite geometric series. If, instead, the
exponent 𝑘 − 1 were removed, we would have an infinite arithmetic series. But, as it
stands, this infinite sum is not of either of those types.
Nonetheless, there is something familiar about it. Study the expression inside the
summation in (10.19). Where have you seen an expression of the form
𝑘𝑥^{𝑘−1}
before?
See if you can remember, before going to the next page!
It is the derivative of a power of 𝑥:³

(d/d𝑥) 𝑥^𝑘 = 𝑘𝑥^{𝑘−1}.
Here we have 𝑥 = 1 − 𝑝; if we differentiate (1 − 𝑝)^𝑘 with respect to 𝑝, then we must introduce a minus sign, by the Chain Rule:

(d/d𝑝) (1 − 𝑝)^𝑘 = −𝑘(1 − 𝑝)^{𝑘−1},   therefore   𝑘(1 − 𝑝)^{𝑘−1} = −(d/d𝑝) (1 − 𝑝)^𝑘.

So, substituting into (10.19) and swapping the sum and the derivative (we omit the justification for this step, as this is only a proof outline),

𝐸(𝑋) = 𝑝 ∑_{𝑘=1}^{∞} 𝑘(1 − 𝑝)^{𝑘−1} = −𝑝 (d/d𝑝) ∑_{𝑘=1}^{∞} (1 − 𝑝)^𝑘.
This sum is now the sum of a geometric series with first term 𝑎 = 1 − 𝑝 and common
ratio 𝑟 = 1 − 𝑝. So this sum is given by
𝑎/(1 − 𝑟) = (1 − 𝑝)/(1 − (1 − 𝑝)) = (1 − 𝑝)/𝑝 = 1/𝑝 − 1.
Therefore
𝐸(𝑋) = −𝑝 (d/d𝑝) (1/𝑝 − 1)
= −𝑝 ⋅ (−1/𝑝²)
= 1/𝑝.
The geometric distribution has the important property of being memoryless, which
we now explain.
3 It may be surprising to encounter calculus here, since our focus is on discrete probability. But mathematics
is not obliged to respect the walls that humans find it convenient to erect between its parts! It is common
for tools from continuous mathematics, such as calculus, to be applied to the study of discrete probability
and other parts of discrete mathematics. In fact, the traffic goes both ways.
Suppose you conduct a series of Bernoulli trials, with success probability 𝑝, and let
𝑋 be the number of trials until the first success. Then 𝑋 ∼ Geom(𝑝). Now suppose it so
happens, by chance, that the first 5 trials are all failures. How much longer do we have
to wait for success? The time for this wait, after the fifth trial and given that the first
five trials are all failures, is also geometrically distributed, and with the same success
probability.
This is just a consequence of the independence of the trials. If we have seen five
failures, then independence implies that this has no influence whatsoever on subsequent
trials. In other words, the trials “forget” previous outcomes. Our waiting time to success
behaves exactly as if we are right at the very start of our sequence of trials.
There is a common fallacy about these situations called the “law of averages”, which
is the belief that a sequence of failures makes success more likely and/or that a sequence
of successes makes failure more likely. It is possible to define sequences of random
experiments where such a law does indeed hold, but it does not hold for a sequence of
independent trials with the same success probability.
So, in general, the memoryless property means that, if 𝑋 ∼ Geom(𝑝) and 𝑡 ∈ ℕ, then the distribution of 𝑋 − 𝑡, given that 𝑋 > 𝑡, is also geometric with the same success probability 𝑝.
In fact, it can be shown that this property characterises the geometric distribution:
any memoryless random variable with values in ℕ must be geometrically distributed.
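The memoryless property is easy to check empirically. The following sketch (the choices 𝑝 = 0.3, 𝑡 = 5 and the sample size are arbitrary) compares the distribution of 𝑋 with that of 𝑋 − 𝑡 given 𝑋 > 𝑡; the two empirical distributions should agree up to sampling noise:

import random
from collections import Counter

def geometric_sample(p):
    k = 1
    while random.random() >= p:
        k += 1
    return k

p, t, n = 0.3, 5, 200_000  # arbitrary illustrative choices
xs = [geometric_sample(p) for _ in range(n)]
shifted = [x - t for x in xs if x > t]  # samples of X - t, given X > t

dist_x = Counter(xs)
dist_s = Counter(shifted)
for k in range(1, 6):
    print(k, round(dist_x[k] / len(xs), 3), round(dist_s[k] / len(shifted), 3))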
Here are some examples of random variables that are well modelled by geometric distributions.
• Program running time: the number of iterations of a loop in a program, where the
stopping condition depends on a random quantity satisfying some condition. We
assume that the random quantities are independent in each iteration and identi-
cally distributed. (Even quantities that are not explicitly random can sometimes
be usefully modelled using randomness. See the discussion in § 9.1𝛼 .)
• Time to failure: the number of time intervals (seconds/hours/etc) that pass until
the failure of a specific component in a device/machine/system. Here we assume
that the component has the same failure probability in each time interval, and
that failures in different time intervals are independent. Note that here we are
inverting the roles of success and failure, so we wait until the first failure rather
than until the first success. This is just a matter of renaming the outcomes, and
does not affect the underlying theory.
• Cricket (batting): the number of balls faced by a batter in a Test cricket match.
• Cricket (bowling): the number of balls bowled until the first wicket falls in a Test
cricket match.
Suppose now that, instead of asking just for the time until the first success, we ask for the time until we have seen both success and failure. So we are asking for the time until we have seen both possible outcomes. If we are tossing a coin, we are asking for the number of tosses until we have seen both Heads and Tails. For simplicity, we focus on the case where both outcomes are equally likely, i.e., 𝑝 = 1/2.
We know that we need at least two tosses, because after just one toss we have only
seen one of the two outcomes. We could have both outcomes after two tosses, as in the
toss sequences HT or TH, or it could take three or more tosses, e.g., HHT, TTH, HHHT,
TTTTTTTTH, and so on.
Let the random variable 𝑍 be the number of tosses until we have seen both outcomes.
We know that 𝑍 ≥ 2. After the first toss (whatever its outcome may be), how long do
we have to wait until we see the other outcome too? This random variable is just 𝑍 − 1.
After the first toss, we just wait until the other outcome appears, and that outcome has
probability 1/2. So our waiting time, 𝑍 − 1, is geometrically distributed with probability 𝑝 = 1/2:

𝑍 − 1 ∼ Geom(1/2).

Therefore

𝐸(𝑍 − 1) = 1/(1/2) = 2.   (10.20)
We can use this to calculate the expected total waiting time, from the very start, until we have seen both outcomes. This is the expectation of 𝑍, which is

𝐸(𝑍) = 𝐸(1 + (𝑍 − 1)) = 1 + 𝐸(𝑍 − 1) = 1 + 2 = 3.
The coupon collector’s problem is the traditional name given to the extension
of this problem to an arbitrary number of equally likely outcomes.
It derives its name from a situation where items in a commercial product line (break-
fast cereal boxes, in the original scenario) each contain one of 𝑛 possible coupons, and
a prize is available to someone (perhaps the first person) who collects all 𝑛 coupons.
Assume that all 𝑛 coupons are equally likely to appear in a given cereal box, and that
the numbers of them are so large that we can consider the coupons to be chosen with
replacement from an infinite supply. If you want to collect all 𝑛 different coupons, how
many of the items do you have to buy in order to achieve your aim? This usually means
you get repeats of some coupons, but as soon as you have found your last coupon, you
stop.
We model this as a sequence of independent trials where, instead of just two possible outcomes from each trial, we have 𝑛 possible outcomes, all of which are equally likely, so each has probability 1/𝑛.
Let random variable 𝑍 be the number of trials until we have seen each possible
outcome at least once. We do not care about the order in which the outcomes occur; we
don’t mind which outcome is the first one we see, which is the second, etc, and which is
the last to be seen, as long as we do see them all eventually. As soon as the last outcome
(whichever one that may be) occurs, we stop, and the time taken (i.e., number of trials)
to reach this point is the value of 𝑍.
For the event 𝑍 = 𝑘, we need the 𝑘-th trial to be the first time some particular
outcome occurs. This means that no previous trial gives that outcome, and also that
the previous trials together include all other outcomes at least once. Deriving an exact
expression for this probability, Pr(𝑍 = 𝑘), is nontrivial, and we will not do it now. We
focus on the expectation of this random variable, which tells us how long we expect to
wait, on average, until all possible outcomes have occurred.
The first trial immediately gives one of the 𝑛 outcomes (though it could be any of
them). So, immediately, we have seen our first outcome, and there are 𝑛 −1 we have yet
to see. One down, 𝑛 − 1 to go!
Then we must wait some period of time until we get another outcome. How long do
we have to wait until we get an outcome that’s different to the first one? For each trial
after the first,
Pr(outcome is the same as the first trial) = 1/𝑛,

since each outcome has probability 1/𝑛, so the chance that a specific later trial agrees with the first trial is 1/𝑛. Therefore

Pr(outcome is different to the first trial) = 1 − 1/𝑛.
When we are waiting for a different outcome to that of the first trial,
• we regard a different outcome (to the first trial) as a “success”, so the success
probability is 1 − 1/𝑛;
• we regard the same outcome (as the first trial) as a “failure”, so the failure proba-
bility is 1/𝑛.
So, with this new view of success/failure, the trials can be viewed as Bernoulli trials
with
𝑝 = 1 − 1/𝑛,   1 − 𝑝 = 1/𝑛.
So the time to success from these trials has geometric distribution with 𝑝 = 1 − 1/𝑛.
Denote this time by 𝑋2 . So
𝑋2 ∼ Geom(1 − 1/𝑛).
Since 𝑋2 is the minimum time after the first trial until we get a different outcome to
the first trial, the total time from the start until we have seen two different outcomes is
1 + 𝑋2 ,
since we have one trial for the first outcome (whatever that is), followed by 𝑋2 trials for
an outcome different to the first outcome.
After the second outcome has occurred, how many trials are there until we get a
third outcome? We represent this by a new random variable 𝑋3 . We can again model
this using a geometric distribution, but with different concepts of “success” and “failure”,
and different probabilities. Now, “failure” is having one of the first two outcomes again,
and “success” is having any other outcome. So
Pr(“success”) = Pr(trial outcome is different to the first two outcomes to appear) = 1 − 2/𝑛,
Pr(“failure”) = Pr(trial outcome is the same as one of the first two outcomes to appear) = 2/𝑛.

So

𝑋3 ∼ Geom(1 − 2/𝑛).
The time from the start until we have seen three different outcomes is
1 + 𝑋2 + 𝑋3 .
We can continue in this vein. For each 𝑘 ∈ {1, 2, … , 𝑛}, define the random variable 𝑋𝑘
by
𝑋𝑘 = number of trials after (𝑘 − 1) different outcomes have occurred, until the 𝑘-th outcome occurs.
While we wait for the 𝑘-th different outcome, each trial is a “failure” (one of the 𝑘 − 1 outcomes already seen, with probability (𝑘 − 1)/𝑛) or a “success” (a new outcome, with probability 1 − (𝑘 − 1)/𝑛), and the sequence consists of some number of “failures” followed by a single “success”. Therefore

𝑋𝑘 ∼ Geom(1 − (𝑘 − 1)/𝑛).   (10.21)
The extreme cases here are:
• 𝑘 = 1: in this case, no outcome has yet occurred, so the first trial is certain to give a new outcome; the “success” probability is 1 − 0/𝑛 = 1, so 𝑋1 ∼ Geom(1), i.e., 𝑋1 = 1 with certainty.
• 𝑘 = 𝑛: in this case, all outcomes have already occurred except for one, and we are just waiting for that last outcome, which has probability

Pr(“success”) = 1 − (𝑛 − 1)/𝑛 = 1/𝑛

in each trial. So

𝑋𝑛 ∼ Geom(1/𝑛).
Putting the pieces together, the total number of trials until every outcome has been seen is

𝑍 = 1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑘 + ⋯ + 𝑋𝑛−1 + 𝑋𝑛.
This is a sum of random variables, and they are all geometrically distributed, although
with different probabilities. As we mentioned earlier, our priority is to work out how
long we expect to wait until we have seen all outcomes. So we won’t work out the entire
probability distribution of 𝑍, but rather will focus on its expectation.
Fortunately, this expectation can be calculated using linearity of expectation together
with our knowledge of the expectations of the individual geometric random variables 𝑋𝑘 .
Using linearity of expectation, and the fact that 𝐸(𝑋𝑘) = 1/(1 − (𝑘 − 1)/𝑛) = 𝑛/(𝑛 − 𝑘 + 1) for each 𝑘, we obtain

𝐸(𝑍) = 1 + 𝐸(𝑋2) + 𝐸(𝑋3) + ⋯ + 𝐸(𝑋𝑛) = 𝑛/𝑛 + 𝑛/(𝑛 − 1) + ⋯ + 𝑛/2 + 𝑛/1 = 𝑛𝐻𝑛,

where 𝐻𝑛 is the 𝑛-th harmonic number, being the sum of the reciprocals of the first 𝑛 positive integers:

𝐻𝑛 = 1 + 1/2 + 1/3 + ⋯ + 1/(𝑛 − 2) + 1/(𝑛 − 1) + 1/𝑛.
We learned about the harmonic numbers in § 6.15. Recall the approximation in (6.48),
which we repeat here:
𝐻𝑛 ≈ log𝑒 𝑛 + 𝛾.
It follows that
𝐸(𝑍) ≈ 𝑛(log𝑒 𝑛 + 𝛾),
and in fact it is usually a very good approximation to simply use
𝐸(𝑍) ≈ 𝑛 log𝑒 𝑛.
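You can check this by simulation. The sketch below (the choices 𝑛 = 50 and 2,000 repetitions are arbitrary) estimates 𝐸(𝑍) and compares it with 𝑛𝐻𝑛 and 𝑛 log𝑒 𝑛; with 𝑛 = 2 it reproduces the expectation of 3 found for the coin-tossing example above:

import math
import random

def collect_all(n):
    """Trials needed until all n equally likely outcomes have been seen."""
    seen, trials = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # one trial, uniform over n outcomes
        trials += 1
    return trials

n, reps = 50, 2_000  # arbitrary illustrative choices
avg = sum(collect_all(n) for _ in range(reps)) / reps
H_n = sum(1 / j for j in range(1, n + 1))
print(f"simulated E(Z) ~ {avg:.1f}")
print(f"n * H_n        = {n * H_n:.1f}")
print(f"n * ln(n)      = {n * math.log(n):.1f}")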
The coupon collector’s problem arises in many other situations. Here are two examples.
• A computer program can produce any one of 𝑛 possible outputs, and for random
inputs these are equally likely. The program is complex, so you have no feasible
way of crafting an input that produces a given output. (Perhaps you do not have
access to its inner workings, but must treat it as a black box.) Nonetheless, you
must test it thoroughly, to ensure that each possible output can be produced
without crashing. So you test the program by running it on a sequence of random
inputs. How many tests do you expect to have to do, until you have seen every
possible output?
• In biology, the problem has been used to estimate the number of different species
of some type of life form in some area. This is a kind of “inverse” of the coupon
collector’s problem, since we use sequences of species actually observed (which may
be very long and contain many repetitions) to make inferences about how many
species there are in total, and we may need to use a more general setting where
the distribution of outcomes (species) is not uniform. Extensive theory has been
developed for this.
10.12 EXERCiSES
1. Suppose two independent fair dice are thrown, with one represented by the
random variable 𝑋 and the other represented by the random variable 𝑌. Calculate the
probability distribution of |𝑋 − 𝑌|.
3. Suppose one of the four bitstrings 000, 011, 101, 110 is chosen uniformly at random.
So each of these four strings of three bits has probability 1/4 of being chosen. Define
{0, 1}-valued random variables 𝑋, 𝑌, 𝑍 to be the first, second and third bits, respectively, of the chosen bitstring.
(a) Prove that (i) 𝑋 and 𝑌 are independent; (ii) 𝑋 and 𝑍 are independent; (iii) 𝑌 and 𝑍 are independent.
(b) Determine whether or not 𝑍 is independent of the other two, i.e., if 𝑍 is independent
of the pair (𝑋 , 𝑌).
(c) Similarly, determine whether or not 𝑌 is independent of the other two, and determine
whether or not 𝑋 is independent of the other two.
4. You want to play a game that needs one die, but you do not have one. Your friend
suggests using five fair coins instead, observing the number of Heads (which can be any
number in {0, 1, 2, 3, 4, 5}), and adding one to get a number in {1, 2, 3, 4, 5, 6}, and using
this final number in place of the number shown by a die.
Discuss this suggestion.
Can you think of any other way in which a set of fair coins (possibly more than six)
can be used to simulate the throw of a fair die?
6. Find the variance of the sum of the numbers shown on two fair dice.
What is the probability that this sum is at least two standard deviations away from
its mean? Compare this with what Chebyshev’s Inequality says about this case.
(a) Your friend claims that, because 𝑝 = 1/2, half the trials must be successes and half
must be failures. What is the probability that that actually happens?
(b) What is the probability that 𝑋 is at least two standard deviations away from the
mean? (Use a calculator/spreadsheet/program as needed.)
(c) What is the probability that 𝑋 is at least three standard deviations away from the
mean?
(d) How do these exact probabilities compare with the bounds given by Chebyshev’s
Theorem?
8. During a meteor shower, meteors can arrive randomly at any time and are
independent of each other. The Eta Aquariids meteor shower (on now!) has — at its
peak and in idealised conditions — an average rate of 0.83 meteors per minute that are
visible to the unaided eye.4
(a) What is the distribution of the number of meteors seen in a given minute around
the peak of the shower?
(b) What is the probability that no meteors are seen in a given minute?
(c) How long should an observer be prepared to wait for, if they want to have a proba-
bility of 0.95 of seeing a meteor? (You’ll need a calculator, spreadsheet or program
for this.)
9. Each time a bowler on the opposing team bowls to a Test cricket batter, the
probability of the batter being dismissed (thereby ending their innings) is 1.25%, with
the outcomes from different balls being independent.
(a) What is the distribution of the number of balls the batter faces until they are dis-
missed?
(b) What is the probability they get a “golden duck” (i.e., dismissed by the first ball
they face)?
(d) Suppose the batter has faced 200 balls without being dismissed. How many further
balls would you expect them to face until being dismissed?
10. For each of the random variables listed below, state which of the following
distributions is the best model for it: uniform, binomial, Poisson, geometric. If none of
these seem to fit, discuss why, and suggest an appropriate distribution.
(a) the face value of the top card in a well-shuffled standard deck of 52 playing cards,
where the face values of Ace, Jack, Queen and King are defined to be 1, 11, 12 and
13, respectively, and the face value of any other card is the number shown on it.
(b) the number of days in a particular week on which your regular morning train to
work arrives late, where train arrival times on different days are independent and
identically distributed, and a train is late if it arrives at least a minute after its
scheduled arrival time.
4 This rate is idealised in the sense that it assumes perfect atmospheric conditions and gives the number of
meteors that would be seen if they were all high in the sky, at the best time of night for viewing the shower.
In reality, viewing conditions seldom come close to that ideal. You have to be much more patient than the
“official” rates may seem to indicate!
(c) the number of cosmic rays that hit the CPU of your laptop during your next class;
(d) the number of working days you have to wait, from tomorrow onwards, until your
regular morning train arrives on time (with same assumptions as for (b)).
(e) the number of cereal packets you have to buy until you get one of your three favourite
coupons (under the original coupon collector’s scenario).
(f) the number of cowrie shells that land aperture-up when seven of them are thrown in a
game of Pachisi5 , assuming the cowrie shells are identical and behave independently.
(g) a random digit from the first 1,002 digits after the decimal point in the decimal
representation of 3/7.
(h) a random digit from the first 1,000 digits after the decimal point in the decimal
representation of 5/11.
(i) a random digit from the first 10¹⁰⁰ digits of the decimal representation of 𝜋.
11. You met the Birthday Paradox in Exercise 9.7. There, the focus was on how
many people you need to meet in order for it to be more likely than not that at least two
of those people have the same birthday. Now let’s ask a different question, but under
the same assumptions we made then.
Suppose you are recording birthdays of members of a customer loyalty scheme, so
that your company can send them greetings on their birthday each year. How many
people do you expect to have to enrol in the scheme until their birthdays cover every
day of the year?
11 GRAPH THEORY i

Graphs are abstract models of networks. They can be used to model any system con-
sisting of components that interact in some way. For example, they can model social
networks, molecules, maps, electronic circuits, transport networks, the web, communi-
cations networks, software systems, timetabling requirements, and much else. In each
case, we have a set of nodes or vertices together with links or edges between certain
pairs of vertices. Table 11.1 lists a number of network types, identifying the vertices
and edges for each. We met some of these previously, on p. 66 in § 2.13.
Table 11.1: Some types of networks, with their vertices and edges.
Like other mathematical models, graphs are abstractions, so they don’t represent
everything about a system. They are intended to capture the structure of the inter-
actions, without incorporating the details of how those interactions work or what the
nodes do. This means a lot of information is thrown away. For example, in a graph that
models a social network, we don’t record people’s height or where each pair of friends
first met. Nonetheless, the information retained in the graph enables many important
problems about the network to be solved.
Problems we can tackle using graphs include:
• In a social network, who has the most friends? Who has the most central position
in the network? What is the largest clique of people, who all know each other?
Can you identify subcommunities of people within the network?
• What’s the minimum number of cities you need to travel through in order to drive
between two specific cities? Is there a tour that visits every city and returns to its
starting point, and if so, which of these has the fewest repeat visits to cities?
• Given a computer network, how can it be displayed on a screen with the fewest
edge crossings? (This is important for constructing network diagrams that are
readable and help people understand networks better.)
• How many different ways are there of travelling by train between two specific
cities?
• How many different timeslots are needed for an exam timetable in which no student
has a clash?
This chapter introduces the basic concepts of graph theory and some foundational
results relating to vertex degrees and various kinds of paths and cycles.
11.1𝛼 BASiC DEFiNiTiONS

A graph consists of a set of vertices and a set of edges. The set of vertices can be any
set, and is intended to represent the objects we are interested in. Each edge must be a
pair of vertices.
We now state this definition more formally.
A graph is a pair (𝑉, 𝐸) where 𝑉 is a set and 𝐸 is a set of unordered pairs of
elements of 𝑉. Each member of 𝑉 is called a vertex and each member of 𝐸 is called an
edge.
If 𝐺 is a graph, we may write 𝐺 = (𝑉, 𝐸) to make clear that 𝑉 is its set of vertices
and 𝐸 is its set of edges. We can also write 𝑉(𝐺) for the set of vertices of 𝐺 and 𝐸(𝐺)
for the set of edges of 𝐺.
So, to specify a graph, we need to specify its vertex set and edge set. Each member
of the edge set is an unordered pair of vertices. So, in a graph 𝐺 = (𝑉, 𝐸), an edge 𝑒 ∈ 𝐸
between two vertices 𝑣, 𝑤 ∈ 𝑉 is the set {𝑣, 𝑤} containing just those two vertices.
For example, let 𝐺 = (𝑉, 𝐸) with

𝑉 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔},
𝐸 = {{𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}, {𝑐, 𝑑}, {𝑓, 𝑔}}.

This graph has seven vertices and five edges. It is depicted in Figure 11.1, where the dots represent vertices and the lines represent edges.
[Figure 11.1: the graph 𝐺, drawn with a dot for each of the vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔 and a line for each of the five edges.]
We say that two vertices 𝑣, 𝑤 of a graph are adjacent if {𝑣, 𝑤} is an edge in the
graph. We also use the convenient shorthand 𝑣 ∼ 𝑤 for this, since it is a bit briefer than
writing {𝑣, 𝑤} ∈ 𝐸, but they mean the same thing. If 𝑣 and 𝑤 are not adjacent, then
we can write 𝑣 ≁ 𝑤. For example, in our graph 𝐺 in Figure 11.1, we have 𝑎 ∼ 𝑏, 𝑎 ∼ 𝑐,
𝑏 ∼ 𝑐, 𝑐 ∼ 𝑑, and 𝑓 ∼ 𝑔. For any other pair 𝑣, 𝑤 of distinct vertices of 𝐺, we have 𝑣 ≁ 𝑤.
For example, 𝑑 ≁ 𝑓.
If 𝑣 is a vertex, then every vertex 𝑤 that is adjacent to 𝑣 is called a neighbour of
𝑣. The set of all neighbours of 𝑣 is the neighbourhood of 𝑣.
If 𝑣 and 𝑤 are adjacent, then they are said to be endpoints, or endvertices, of the edge {𝑣, 𝑤} between them.
Adjacency refers to the relationship between two vertices that share an edge. We
use a different term for describing the relationship between a vertex and an edge. If
vertex 𝑣 is one of the two vertices in edge 𝑒 — which means that there is some other
vertex 𝑥 for which 𝑒 = {𝑣, 𝑥} — then we say that 𝑣 is incident with 𝑒.
The same term is used to describe the relationship between two edges that have a
common vertex. Let {𝑣, 𝑤} and {𝑣, 𝑥} be edges that meet at the vertex 𝑣, with 𝑤 ≠ 𝑥.
Then we say that these two edges are incident with each other.
In the graph of Figure 11.1:
• Vertex 𝑐 is incident with edges {𝑎, 𝑐}, {𝑏, 𝑐} and {𝑐, 𝑑}, but not with any others.
• Edge {𝑎, 𝑐} has endpoints 𝑎 and 𝑐. This edge is incident with {𝑎, 𝑏}, since they
meet at vertex 𝑎, and it is also incident with {𝑏, 𝑐} and {𝑐, 𝑑}, since they meet at
vertex 𝑐.
• Edge {𝑓, 𝑔} is incident with vertices 𝑓 and 𝑔 (its endpoints), but it is not incident
with any other edges.
A vertex is isolated if it does not belong to (i.e., is not incident with) any edges.
This happens if and only if it is not adjacent to any other vertices. In the graph in
Figure 11.1, vertex 𝑒 is isolated.
A vertex is a leaf if it belongs to exactly one edge. In Figure 11.1, there are three
leaves: 𝑑, 𝑓 and 𝑔.
Two consequences of this definition are worth spelling out:
• no vertex is adjacent to itself. Since an edge is a set of two vertices, those vertices
must be distinct (else we’d have a set consisting of the same vertex appearing
twice, but duplicates are not allowed in sets).
• no two edges can have the same pair of endpoints. To see this, suppose we have
two edges with the same two endpoints 𝑣 and 𝑤. Then the two edges must both be
{𝑣, 𝑤}. (Writing one as {𝑣, 𝑤} and the other as {𝑤, 𝑣} makes no difference, because
order does not matter in a set, so they are still really the same set, even though
we have written them differently.) And since the graph’s edges are considered to
be a set (recalling our definition of a graph as (𝑉, 𝐸), where 𝐸 is the set of edges),
there cannot be duplicate edges in that set.
In graph theory, a graph that satisfies both these conditions is called a simple graph.
We will focus on simple graphs in this unit. We will usually omit the adjective “simple”,
and will just assume that our graphs are all simple graphs unless we say otherwise.
In focusing on simple graphs, we are not saying that they are the only type that
matters. There are broader classes of graphs in which the two conditions above are
relaxed.
• Sometimes, loops are allowed. A loop is an edge whose two endpoints are identical.
It therefore joins a vertex to itself, and is shown in diagrams as a closed curve
through the vertex.
• Sometimes, we allow more than one edge between a pair of vertices. We call these multiple edges or parallel edges.
There are other classes of graphs in which we allow the vertices and/or edges to
carry other information. A weighted graph has a number on each edge, which could
represent a length, or size, or cost of some kind.
In our graphs, the edges have no direction. An edge {𝑣, 𝑤}, being a set, has no notion of order between 𝑣 and 𝑤. Such a graph is said to be undirected. So far,
we have been working entirely with undirected graphs, and we will omit the adjective
“undirected” and just assume that graphs are of this type unless otherwise stated.
But there are many situations where the order of vertices in an edge does matter.
For example, a one-way street in a road network has a strict order between its endpoints.
A hyperlink between two webpages goes from the linking page to the linked page. A
phone call has a caller and a receiver. To model these situations, we use ordered pairs for
edges, instead of unordered pairs. So, a directed edge from vertex 𝑣 to vertex 𝑤 is an
ordered pair (𝑣, 𝑤). A directed edge is also called an arc. A directed graph is defined
just as we have done, except that we change from unordered pairs to ordered pairs for
edges. In other words, a directed graph is a graph in which all edges are directed. It is
also possible to define “mixed” or “hybrid” graphs in which edges may be either directed
or undirected.
All these different types of graphs are used as models in a wide variety of practical
contexts. We focus on simple (undirected) graphs because
• they are simple! Relatively so, anyway.
• they appear as special cases in most other classes of graphs, so we’d have to master
them anyway if we want to learn about other classes of graphs.
• they are still complex enough to capture all the main computational issues that
arise in more general classes of graphs.
11.3𝜔 GRAPHS AND RELATiONS
We first encountered graphs — under the more informal term network — when discussing binary relations in § 2.13. (See p. 66.) The relationship between graphs
and binary relations is very close.
Adjacency is a binary relation defined on the set of vertices of a graph. It consists of
all ordered pairs (𝑣, 𝑤) such that 𝑣 is adjacent to 𝑤. Our graphs are undirected, so if 𝑣
is adjacent to 𝑤 then 𝑤 is adjacent to 𝑣. In other words, the pair (𝑣, 𝑤) belongs to the
adjacency relation if and only if (𝑤, 𝑣) also belongs to it. This tells us that adjacency is a
symmetric binary relation. However, it is not reflexive. In fact, no vertex is adjacent to
itself, so the adjacency relation does not contain any pairs (𝑣, 𝑣) where the two vertices
in the pair are the same. So we say that the adjacency relation is irreflexive.1 We can
think of the adjacency binary relation as being obtained by replacing each edge {𝑣, 𝑤}
by the two ordered pairs (𝑣, 𝑤) and (𝑤, 𝑣).
So simple graphs may be regarded as irreflexive symmetric binary relations. Every
simple graph gives rise to such a relation, via adjacency. Conversely, every irreflexive
symmetric relation may be used to define a simple graph whose vertex set is the domain
of the relation.
1 This is a stronger condition than just “not reflexive”; it is not merely a logical negation of reflexivity, but a
kind of extreme opposite of it.
If we discard irreflexivity, then we are allowing (but not enforcing) loops. If, instead, we discard symmetry, then we have directed graphs rather than undirected graphs. If we discard both of these, then we have a directed graph which may have loops but is not allowed to have multiple edges. (Here, we mean that, if we have an edge (𝑣, 𝑤) from vertex 𝑣 to vertex 𝑤, we do not have any extra edges from 𝑣 to 𝑤, i.e., no other edges “parallel” to (𝑣, 𝑤) and in the same direction. But we do allow the reverse edge (𝑤, 𝑣). In directed graphs, forbidding multiple edges does not forbid us from having both (𝑣, 𝑤) and (𝑤, 𝑣).) So we may regard any binary relation as a directed graph, possibly with loops but with no multiple edges. Similarly, any directed graph with no multiple edges gives rise to a binary relation.
11.4𝛼 REPRESENTiNG GRAPHS

It is natural to display graphs as diagrams, as we did in Figure 11.1, when they are small enough for this to be practical. But diagrams have their limits. Many graphs from real applications are too large to be depicted in a diagram on a page or a device screen. In any case, to run algorithms on graphs, we need to be able to store them in computer memory. To do this, we need formal, purely symbolic ways of representing graphs.
We now consider the four main ways of representing graphs symbolically. We will
not discuss the details of using these representations in programs; that will be considered
in FIT2004 Algorithms and Data Structures.
Our first representation corresponds to the formal definition of graphs that we have just
given.
An edge list of a graph is just a list of its edges. Typically, the order does not
matter, so we can think of this as a set, although inside a computer information is
always stored in some specific order.
To represent a graph completely, listing the edges alone is not sufficient, in general.
This is because it’s possible for a vertex to belong to no edges. Such a vertex will not
be apparent just from looking at all the edges. So, we should specify both the vertex
set and the set of edges. This just means specifying exactly the information required in
our formal definition of graphs.
An edge list does not necessarily group edges together according to the vertices they
share. This may mean it is less efficient, for some tasks, than other representations we
will cover shortly.
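As a small illustration (a Python sketch, not a prescribed format), here is an edge list for the graph 𝐺 of Figure 11.1; the vertex set is stored separately so that the isolated vertex 𝑒 is not lost:

# The graph G of Figure 11.1 as a vertex set plus an edge list.
vertices = {"a", "b", "c", "d", "e", "f", "g"}
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("f", "g")]

# Edges are unordered, so an adjacency test must check both orders.
def adjacent(v, w):
    return (v, w) in edges or (w, v) in edges

print(adjacent("a", "c"))  # True
print(adjacent("d", "f"))  # False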
Our second representation is the adjacency matrix. The adjacency matrix of a graph with 𝑛 vertices is an 𝑛 × 𝑛 array of 0s and 1s, with a row and a column for each vertex, in which the entry in the 𝑣-row and 𝑤-column is

1, if 𝑣 ∼ 𝑤;
0, if 𝑣 ≁ 𝑤.
We think of 1 as meaning “edge” and 0 as meaning “no edge”. Another way to put this
is to say that every entry of the matrix gives the number of edges between the two
vertices. For simple graphs, this number must be 0 or 1, so we can represent it by a
single bit.
Here is an adjacency matrix for the graph 𝐺 in Figure 11.1. We show the vertices
corresponding to each row and column, for convenience, but they are not part of the
matrix itself.
      𝑎  𝑏  𝑐  𝑑  𝑒  𝑓  𝑔
𝑎  ⎛  0  1  1  0  0  0  0  ⎞
𝑏  ⎜  1  0  1  0  0  0  0  ⎟
𝑐  ⎜  1  1  0  1  0  0  0  ⎟
𝑑  ⎜  0  0  1  0  0  0  0  ⎟
𝑒  ⎜  0  0  0  0  0  0  0  ⎟
𝑓  ⎜  0  0  0  0  0  0  1  ⎟
𝑔  ⎝  0  0  0  0  0  1  0  ⎠
We see, for example, that 𝑎 ∼ 𝑐, since the entry in the 𝑎-row and 𝑐-column is 1, while 𝑐 ≁ 𝑓, since the entry in the 𝑐-row and 𝑓-column is 0.
We make two observations about the adjacency matrix.
• The entries in the main diagonal (top left to bottom right) are all 0, because a
vertex is never adjacent to itself.
• The matrix is symmetric about the main diagonal. This means that, for all 𝑣 and
𝑤, the entry in row 𝑣 and column 𝑤 is the same as the entry in row 𝑤 and column
𝑣. This means that the matrix is completely determined by all the entries that lie
above the main diagonal. Similarly, it is completely determined by all the entries
that lie below the main diagonal.
As we often do with tables of data, it is natural to ask about the sums of the entries
in each row and each column, and the sum of all entries in the entire matrix.
• For a given vertex 𝑣, what information about it is captured by the number of 1s
in the 𝑣-row of the adjacency matrix? What about the 𝑣-column? We return to
this question in § 11.7.
• Consider the number of 1s in the entire adjacency matrix. The 1s indicate the
edges in the graph, but each edge is counted twice: edge {𝑣, 𝑤} gives 1 in the
entry for the 𝑣-row and 𝑤-column, and also for its “mirror image” entry in the
𝑤-row and 𝑣-column. So the number of 1s in the adjacency matrix is exactly twice
the number of edges of the graph.
The rules for adjacency matrices can be relaxed, if we are working with a wider class
of graphs than just simple graphs. If we allow loops, then some of the diagonal entries
can be 1. If we have a directed graph, then the entry in row 𝑣 and column 𝑤 need not
equal the entry in row 𝑤 and column 𝑣. If we allow multiple edges between two vertices,
then we can have entries that are neither 0 nor 1; the entry for row 𝑣 and column 𝑤 can
give the number of edges between 𝑣 and 𝑤.
We referred to an adjacency matrix as an array, and it can be represented in pro-
grams using two-dimensional array structures, which most programming languages have.
It can also be thought of as a “table”, with 𝑛 rows and 𝑛 columns, with the row and
column headings not included in the counts of rows and columns.
But there is a reason why we called it an adjacency matrix rather than an adjacency
“array” or “table”. The use of the term “matrix” is not just about the way it stores the
adjacency information in a two-dimensional 𝑛 × 𝑛 way. If that were all, then the terms
“2D array” and “table” would serve just as well. The term “matrix” is also about what
you can do with it. In mathematics, we can do operations with matrices, to calculate
important numbers from them or form other matrices. For adjacency matrices of graphs,
these operations can shed light on the graph, revealing aspects of its structure that might
have been hard to determine otherwise. They are also used in some algorithmic problems
on graphs. We don’t cover these graph-theoretic applications of matrices in this unit.
But it is good to be aware that they are “out there”, even if you mainly just use adjacency
matrices as a method for storing graphs.
One virtue of the adjacency matrix is that you can efficiently test whether or not
two given vertices are adjacent: you just have to look up an entry in a matrix, and
most computer representations of matrices (using 2D arrays) enable this to be done very
quickly. But other operations — like searching through all the neighbours of a vertex —
can be done more efficiently using adjacency lists, which we come to next (§ 11.4.3).
The adjacency matrix takes the same amount of space regardless of how many edges
the graph has. This may not be a problem, if the graph has many edges. But many
real-world networks are relatively “sparse”, meaning (roughly speaking) that most pairs
of vertices do not have an edge between them. If you are dealing with a class of large
sparse graphs, then the adjacency matrix representation may take up too much space in
memory.
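To make this concrete, here is a minimal Python sketch that builds the adjacency matrix of the graph 𝐺 of Figure 11.1 from its edge list and checks the two observations above:

vertices = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("f", "g")]

index = {v: i for i, v in enumerate(vertices)}  # vertex -> row/column number
n = len(vertices)
A = [[0] * n for _ in range(n)]
for v, w in edges:
    A[index[v]][index[w]] = 1
    A[index[w]][index[v]] = 1  # symmetry: each edge gives two 1s

print(A[index["a"]][index["c"]])  # 1, since a ~ c
print(all(A[i][i] == 0 for i in range(n)))  # True: no loops in a simple graph
print(sum(map(sum, A)))  # 10: twice the number of edges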
An adjacency list specifies the vertices in some order and, with each vertex, it gives a
list of all the vertices it is adjacent to.
For example, the graph 𝐺 in Figure 11.1 has the following adjacency list representa-
tion.
𝑎 : 𝑏, 𝑐
𝑏 : 𝑎, 𝑐
𝑐 : 𝑎, 𝑏, 𝑑
𝑑 : 𝑐
𝑒 :
𝑓 : 𝑔
𝑔 : 𝑓
Each line nominates a vertex, followed by a list of all its neighbours. We have used a colon
here to separate each vertex, at the start of its line, from the list of all its neighbours.
But that is just a superficial detail; other ways of conveying this information can be
used, provided it is done clearly and consistently. We have used punctuation for human
readers, but when graphs are represented as adjacency lists in computers, the separation
of the different types of information is done differently, using data structures such as
arrays or lists which you will learn about in programming units.
We can see from the adjacency list that, for example, 𝑐 is adjacent to 𝑎, 𝑏 and 𝑑, so
that the neighbourhood of 𝑐 is {𝑎, 𝑏, 𝑑}. We can also readily see that 𝑒 has no neighbours
and that the graph has three leaves, namely 𝑑, 𝑓 and 𝑔.
This is probably the most widely used representation of graphs in computers. It
enables efficient searching of neighbourhoods of vertices, which is a common task in
many graph algorithms. It is compact, more so than the adjacency matrix.
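In Python, for instance, an adjacency list is naturally sketched as a dictionary that maps each vertex to the list of its neighbours:

adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],           # isolated vertex: empty neighbour list
    "f": ["g"],
    "g": ["f"],
}

# Searching a neighbourhood is one lookup plus a loop:
for w in adj["c"]:
    print("c ~", w)

leaves = [v for v, nbrs in adj.items() if len(nbrs) == 1]
print(leaves)  # ['d', 'f', 'g']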
Our fourth representation is the incidence matrix, which has a row for each edge and a column for each vertex, and which records incidence rather than adjacency:
• for every edge 𝑒 and vertex 𝑣, the entry in the 𝑒-row and 𝑣-column is

1, if 𝑒 is incident with 𝑣;
0, if 𝑒 is not incident with 𝑣.
11.5 SUBGRAPHS
Suppose we want to focus our attention only on a portion of a graph 𝐺. If this portion
is a graph in its own right, we say it is a subgraph of 𝐺. This is not yet a precise formal
definition, because we have not defined what we mean by a “portion” of a graph.
Let 𝐺 = (𝑉, 𝐸) be a graph. A subgraph of 𝐺 is a graph 𝐹 = (𝑈, 𝐷), with vertex set
𝑈 and edge set 𝐷, such that
𝑈 ⊆ 𝑉, and 𝐷 ⊆ 𝐸.
In other words, every vertex of 𝐹 is also a vertex of 𝐺, and every edge of 𝐹 is also an
edge of 𝐺. When 𝐹 is a subgraph of 𝐺, we write 𝐹 ≤ 𝐺. In effect, we are “overloading”
the ≤ symbol so that, as well as standing for ordinary numerical inequality, it also stands
for the subgraph relation. Context should make it clear which is meant.
For example, let 𝐹 = (𝑈, 𝐷) be the graph defined by
𝑈 = {𝑎, 𝑏, 𝑐, 𝑒, 𝑔},
𝐷 = {{𝑎, 𝑏}, {𝑎, 𝑐}}.

Then 𝐹 ≤ 𝐺, where 𝐺 is the graph of Figure 11.1. [Figure: the subgraph 𝐹, with vertices 𝑎, 𝑏, 𝑐, 𝑒, 𝑔 and edges {𝑎, 𝑏} and {𝑎, 𝑐}.]
A graph is a subgraph of itself: 𝐺 ≤ 𝐺. You can check that the definition is satisfied.
But there are times when we want to exclude this possibility and only consider those
subgraphs that are not the whole graph. We say that 𝐹 is a proper subgraph of 𝐺 if
it is a subgraph of 𝐺 but is not equal to 𝐺. Symbolically, 𝐹 ≤ 𝐺 and 𝐹 ≠ 𝐺. We can
write this as 𝐹 < 𝐺.
The complete graph on 𝑛 vertices, denoted by 𝐾𝑛, has every pair of vertices joined by an edge.
At the other extreme, the null graph on 𝑛 vertices, denoted by 𝐾̄𝑛 (read “𝐾𝑛-bar”), has no edges at all.
[Figure 11.3: the graphs 𝐾4, 𝐾̄4, 𝑃3, 𝐶4.]
The path graph of length 𝑛 − 1, denoted by 𝑃𝑛−1 , has 𝑛 vertices and 𝑛 − 1 edges
with the property that the vertices can be put in a sequence so that two vertices are
adjacent if and only if they are consecutive in the sequence.
The cycle of length 𝑛, denoted by 𝐶𝑛 , has 𝑛 vertices and 𝑛 edges and can be formed
from a path graph on the same vertices by adding a new edge between the first and last
vertex of the path.
Examples of these graphs, with four vertices, are shown in Figure 11.3.
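For concreteness, here is a sketch that generates edge lists for these standard graphs, taking the vertices to be 0, 1, …, 𝑛 − 1 (an arbitrary labelling):

from itertools import combinations

def complete_graph(n):  # K_n: every pair of vertices joined by an edge
    return list(combinations(range(n), 2))

def null_graph(n):      # the null graph: no edges at all
    return []

def path_graph(n):      # P_{n-1}: n vertices, consecutive ones adjacent
    return [(i, i + 1) for i in range(n - 1)]

def cycle_graph(n):     # C_n: a path plus an edge from last back to first
    return path_graph(n) + [(n - 1, 0)]

print(complete_graph(4))  # 6 edges
print(cycle_graph(4))     # [(0, 1), (1, 2), (2, 3), (3, 0)]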
11.7 DEGREE
The degree of a vertex is the number of neighbours it has. Since our graphs are simple,
it also equals the number of edges that are incident with the vertex.
We denote the degree of vertex 𝑣 in graph 𝐺 by
deg𝐺 (𝑣).
If the graph 𝐺 is clear from the context, we may drop the subscript and just write
deg(𝑣).
This gives us some alternative descriptions of some concepts we introduced earlier.
• A vertex is isolated if and only if it has degree 0.
• A vertex is a leaf if and only if it has degree 1.
Now we study the sum of all the vertex degrees of a graph. One reason for studying
this is because it enables us to determine the average degree of a vertex in the graph,
which helps in understanding what the graph looks like locally.
Our main theorem on this is called the Handshaking Lemma. The name comes from the following scenario. At a social function with 𝑛 people, some pairs of people shake
hands when they first meet, others do not. How many handshakes occur? One way
to determine this is to find out, from each person, how many times they shook hands.
Adding up these individual numbers of handshakes counts each handshake twice: it
takes two hands to shake! Think of the people as vertices and the handshakes as edges.
Theorem 57 (Handshaking Lemma). For every graph 𝐺 = (𝑉, 𝐸), the sum of the degrees of the vertices is twice the number of edges:

∑_{𝑣∈𝑉} deg(𝑣) = 2𝑚,

where 𝑚 = |𝐸| is the number of edges of 𝐺.
From this, we immediately obtain the average degree of the vertices of a graph.
Corollary 58. For any graph 𝐺,

average degree of 𝐺 = 2𝑚/𝑛,

where 𝑛 is the number of vertices of 𝐺 and 𝑚 is its number of edges.

Proof.

average degree of 𝐺 = (sum of degrees)/(number of vertices) = (∑_{𝑣∈𝑉(𝐺)} deg(𝑣))/𝑛 = 2𝑚/𝑛,
by Theorem 57.
The average degree of a graph is therefore very simple to calculate. You only need
to know the two most fundamental parameters of the graph: its number of vertices, and
its number of edges. These parameters are “global” in the sense that they pertain to
the graph as a whole rather than any part of it. But, once you have them, this simple
calculation of 2𝑚/𝑛 gives you the average degree, which tells you something about
the “local” structure of the graph, namely what happens, on average, in the immediate
vicinity of the vertices.
In principle, the average degree can be as low as 0, for a graph with no edges, and as high as 𝑛 − 1, for complete graphs on 𝑛 vertices. In practice, real-world graphs tend to have average degrees that are much lower than the maximum possible. The worldwide human social network currently has about 8,000,000,000 people, but the average degree is tiny by comparison. (The average degree depends on how you define the edges. If the graph uses close friendships only, the average degree is said to be about 4.)
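Here is a small sketch that checks the Handshaking Lemma and Corollary 58 on the graph of Figure 11.1, reusing the adjacency-list dictionary from earlier:

adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c"], "e": [], "f": ["g"], "g": ["f"],
}

degree_sum = sum(len(nbrs) for nbrs in adj.values())
m = degree_sum // 2    # Handshaking Lemma: the degree sum is 2m
n = len(adj)
print(degree_sum)      # 10
print(m)               # 5 edges
print(degree_sum / n)  # average degree 2m/n = 10/7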
Another consequence of the Handshaking Lemma follows from the fact that the sum
of the degrees is even.
Corollary 59. Every graph has an even number of vertices of odd degree.
Proof. Since 2𝑚 is even (where 𝑚 is the number of edges of the graph), the sum of
all the degrees is even (by Theorem 57). Now, the sum of the even degrees (i.e., those
vertex degrees that are even numbers) is also even, since the sum of even numbers is
always even. It follows that the sum of the odd degrees is even too.
Recall that the sum of a collection of odd numbers is even if the collection has an even number of members, and odd if it has an odd number of members.
It follows that there must be an even number of odd degrees, else the sum of the odd
degrees would be odd, whereas we observed above that it’s even.
Theorem 56 and Corollary 59 give some necessary conditions for a set of numbers
to be a valid set of degrees of some graph. They show that not every set of numbers
within the required range (0 to 𝑛 − 1) can be the set of degrees of a graph.
11.8 MOViNG AROUND

We often want to move from one part of a graph to another, using the edges to step from vertex to vertex.
A walk from a vertex 𝑣 to a vertex 𝑤 is a sequence of vertices and edges,
𝑣0 , 𝑒1 , 𝑣1 , 𝑒2 , 𝑣2 , … , 𝑣𝑘−1 , 𝑒𝑘 , 𝑣𝑘 , (11.1)
where
• each 𝑣𝑖 is a vertex of 𝐺,
• each 𝑒𝑖 is an edge of 𝐺,
• ∀𝑖 ∶ 𝑒𝑖 = {𝑣𝑖−1 , 𝑣𝑖 }. This means that the edge 𝑒𝑖 links the two vertices listed on
either side of it. So, to go from 𝑣𝑖−1 to 𝑣𝑖 , we step along edge 𝑒𝑖 = {𝑣𝑖−1 , 𝑣𝑖 }.
The length of a walk is the number of edges in it. The shortest possible walk is the
walk of length zero, which consists of just a single vertex and no edges.
For example, consider the graph in Figure 11.4. Here are some examples of walks in this graph.
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒, {𝑒, 𝑓}, 𝑓 a walk of length 4 from 𝑎 to 𝑓
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑒}, 𝑒 a walk of length 5 from 𝑎 to 𝑒
𝑏, {𝑏, 𝑒}, 𝑒, {𝑒, 𝑓}, 𝑓, {𝑒, 𝑓}, 𝑒, {𝑒, 𝑑}, 𝑑 a walk of length 4 from 𝑏 to 𝑑
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑎}, 𝑎 a closed walk of length 3 from 𝑎 to 𝑎
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑎}, 𝑎 a closed walk of length 5 from 𝑎 to 𝑎
𝑎, {𝑎, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒, {𝑒, 𝑏}, 𝑏, {𝑏, 𝑎}, 𝑎 a closed walk of length 6 from 𝑎 to 𝑎
𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑑}, 𝑑, {𝑑, 𝑏}, 𝑏, {𝑏, 𝑎}, 𝑎, {𝑎, 𝑐}, 𝑐, {𝑐, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒 a walk of length 7 from 𝑏 to 𝑒
𝑐 a walk of length 0 from 𝑐 to 𝑐
[Figure 11.4: a graph on the vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, used in the walk examples above.]
• We are allowed to visit a vertex more than once.
• We are even allowed to move along an edge more than once, and there is no
requirement to always move along a particular edge in the same direction. For
example, we could move along the edge {𝑎, 𝑏} from 𝑎 to 𝑏, and later in the walk we
could move along it from 𝑏 to 𝑎 (and/or we could go along it from 𝑎 to 𝑏 again).
– Note that, if we use an edge twice, we must necessarily use each of its end-
points at least twice too. But it is possible, in many graphs, to visit some
vertices more than once without using any edge more than once.
For simple graphs (which are our focus), there can only be one edge between two
specific vertices. So, if our walk steps from vertex 𝑎 to vertex 𝑏, then it must do so
along the edge {𝑎, 𝑏}, and there is only one such edge. So, in fact, the edge we use to go
from 𝑎 to 𝑏 is completely determined by the two vertices. This means that, to specify a
walk in a simple graph, it is sufficient to specify its sequence of vertices. So, our walk
in (11.1) could be specified by just listing the vertices in order:
𝑣 = 𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1 , 𝑣𝑘 = 𝑤.
Let us do this for our examples of walks for the graph in Figure 11.4:

𝑎, 𝑏, 𝑑, 𝑒, 𝑓
𝑎, 𝑏, 𝑑, 𝑐, 𝑏, 𝑒
𝑏, 𝑒, 𝑓, 𝑒, 𝑑
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑏, 𝑐, 𝑑, 𝑏, 𝑎, 𝑐, 𝑑, 𝑒
𝑐
But if we are walking in graphs that may have multiple edges, then we need to specify
the edges between the vertices as well as the vertices themselves.
A walk is closed if its start and end vertices are the same. With our above notation,
this is when 𝑣 = 𝑤, i.e., 𝑣0 = 𝑣𝑘 . In our examples of walks in the graph of Figure 11.4,
the closed walks are the following.
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
We now consider walks that are more restricted, in the sense that we forbid one or
both of the two freedoms we mentioned above.
A trail is a walk in which no edge is used more than once. Using (11.1), this means
that all the edges
{𝑣0 , 𝑣1 }, {𝑣1 , 𝑣2 }, {𝑣2 , 𝑣3 }, … , {𝑣𝑘−1 , 𝑣𝑘 }
are different. We cannot get around this restriction by going along an edge once in one
direction and then again later in the other direction; this still counts as using the edge
twice, which is forbidden for trails.
So, for trails, the second of our two walking freedoms (i.e., the freedom to use an
edge more than once) is prohibited. But we are still allowed to repeat vertices if we wish
(although we don’t have to).
In the graph of Figure 11.4, the following walks are trails:
𝑎, 𝑏, 𝑑, 𝑒, 𝑓
𝑎, 𝑏, 𝑑, 𝑐, 𝑏, 𝑒
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
The following walks are not trails, for the reasons indicated.
𝑏, 𝑒, 𝑓, 𝑒, 𝑑 re-uses edge {𝑒, 𝑓}
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎 re-uses edge {𝑏, 𝑐}
𝑏, 𝑐, 𝑑, 𝑏, 𝑎, 𝑐, 𝑑, 𝑒 re-uses edge {𝑐, 𝑑}
A closed trail is just a closed walk that is also a trail. So, it finishes at the same
vertex where it starts, and uses each edge at most once (with repeat visits to vertices
being allowed).
In our list of examples for the graph of Figure 11.4, the following walks are closed
trails:
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
A path is a trail in which no vertex or edge is repeated. So, both our walking
freedoms (reusing vertices and reusing edges) are now prohibited.
In the graph of Figure 11.4, the following walks are paths:
𝑎, 𝑏, 𝑑, 𝑒, 𝑓
𝑐
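These definitions translate directly into code. The sketch below classifies a vertex sequence as a walk, trail and/or path; the edge set used is our reading of Figure 11.4 from the walk examples above, so treat it as an assumption:

# Edges of Figure 11.4, as inferred from the walk examples (an assumption).
E = {frozenset(e) for e in
     [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("b", "e"),
      ("c", "d"), ("d", "e"), ("e", "f")]}

def is_walk(seq):
    return all(frozenset({v, w}) in E for v, w in zip(seq, seq[1:]))

def is_trail(seq):  # a walk that never re-uses an edge
    steps = [frozenset({v, w}) for v, w in zip(seq, seq[1:])]
    return is_walk(seq) and len(steps) == len(set(steps))

def is_path(seq):   # a trail that never re-uses a vertex
    return is_trail(seq) and len(seq) == len(set(seq))

print(is_walk(list("befed")), is_trail(list("befed")))  # True False
print(is_path(list("abdef")))                           # True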
Note that the path 𝑎, 𝑏, 𝑑, 𝑒, 𝑓 from 𝑎 to 𝑓 is not a shortest path from 𝑎 to 𝑓. The
shortest path from 𝑎 to 𝑓 is 𝑎, 𝑏, 𝑒, 𝑓, which has length 3. In this graph, there is a unique
shortest path from 𝑎 to 𝑓, but this does not always happen. For example, there are two
shortest paths from 𝑎 to 𝑑.
To help remember the distinctions between walks, trails and paths, imagine hiking
in the countryside and think of how these three terms get stronger in what they imply
about where you can go. You may be able to walk wherever you like2 ; it’s something
you do, rather than something that is laid out for you. A trail may be a simple, rough
track, more definite than an arbitrary walk but maybe lacking in official status. People
will tend to stay on a trail rather than wander off it. A path may be more definite,
and may have been laid down by others or by some authority. Perhaps it has a special
surface; perhaps there is more expectation that you stick to it; perhaps it is more well
documented. So, in ordinary usage, the terms “walk”, “trail” and “path” get stronger,
and more restrictive, as you go from one to the next. Similarly, in graphs, the terms get
stronger and more restrictive, too. (The analogy here is very loose. It’s only designed
to help remember the order in which the terms get more restrictive. But this should
still help you remember which is which.)
Whenever there is a walk between two vertices in a graph, there is also a
path between them. In fact, the shortest walk between two vertices is always a path.
(Why?)
The distance between two vertices is the length of the shortest path between them.
This is the same as the length of the shortest walk between them, by the observations
in the previous paragraph.
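Distances can be computed by breadth-first search, which visits vertices in increasing order of distance from the source, so the first time it reaches a vertex it has found a shortest path. A sketch (again using our assumed edge set for Figure 11.4):

from collections import deque

def distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:          # first visit = shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

# Adjacency lists for Figure 11.4 (edges as assumed above).
adj = {"a": ["b", "c"], "b": ["a", "c", "d", "e"], "c": ["a", "b", "d"],
       "d": ["b", "c", "e"], "e": ["b", "d", "f"], "f": ["e"]}
print(distances(adj, "a"))  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 2, 'f': 3}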
The prohibition on repeating vertices in paths means that a path cannot start and
end at the same vertex. But we also want a way to talk about “paths” that can do that.
A cycle is a closed trail in which no vertex except the first and last is repeated.
Because it is closed, the first and last vertex must be the same, but that vertex must
not appear anywhere else on the cycle. We might think of a cycle as a “closed path”
although, strictly speaking, that term is self-contradictory as a path cannot be closed.
Although a cycle is much more restricted, in its allowed repetitions, than an arbitrary
closed walk, it is interesting to note that, when their lengths are odd, the existence of
one implies the existence of the other.
Theorem 60. Let 𝐺 be any graph. 𝐺 has an odd closed walk if and only if it has an odd cycle.
Proof.
(⟸)
2 subject, of course, to any rules in force, e.g., in nature reserves or on private property
Suppose 𝐺 has an odd cycle. Now, every cycle is a closed walk. So an odd cycle is an odd closed walk. Therefore 𝐺 has an odd closed walk.
(⟹)
Suppose 𝐺 has an odd closed walk. Let 𝑊 be the shortest odd closed walk in 𝐺.
The sequence of vertices of 𝑊 may be written as
𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1 , 𝑣𝑘 = 𝑣0 ,
where 𝑘 is odd.
If no vertex of 𝐺 except the start/end vertex repeats in 𝑊, then 𝑊 is already an
odd cycle, and we are done.
We now prove, by contradiction, that no vertex of 𝐺 except the start/end vertex
repeats in 𝑊. From this it will follow that 𝑊 is actually an odd cycle.
Assume, by way of contradiction, that there is a vertex that is repeated in 𝑊. We
don’t want 𝑣0 = 𝑣𝑘 to count as a repeat here, so we actually mean a vertex in the slightly
shorter list
𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1
that is repeated.
Let 𝑣𝑖 be the first vertex in this shorter list that appears again later in the list
(but not earlier). Suppose it appears later in the shorter list as 𝑣𝑗 . So 𝑣𝑖 = 𝑣𝑗 , and
0 ≤ 𝑖 < 𝑗 ≤ 𝑘 − 1.
Using the vertices 𝑣𝑖 and 𝑣𝑗 , we can divide the closed walk 𝑊 into two shorter closed
walks:
• 𝑣𝑖 , 𝑣𝑖+1 , … , 𝑣𝑗−1 , 𝑣𝑗 ;
• 𝑣𝑗 , 𝑣𝑗+1 , … , 𝑣𝑘 = 𝑣0 , 𝑣1 , … , 𝑣𝑖 (which is closed because 𝑣𝑖 = 𝑣𝑗 ).
Recall that the length of 𝑊 is odd. Since we just divided 𝑊 into two “subwalks”, the
length of 𝑊 must equal the sum of the lengths of those two subwalks. Therefore, one
of those subwalks has odd length and the other has even length. Both are closed walks,
so the one that has odd length is an odd closed walk. Since both the subwalks are
shorter than 𝑊, we have constructed an odd closed walk in 𝐺 that is shorter than
𝑊. This contradicts our choice of 𝑊 as the shortest odd closed walk in 𝐺. So our
initial assumption, that there is a vertex 𝑣𝑖 with 𝑖 < 𝑘 that is repeated in 𝑊, is wrong.
Therefore there is no vertex repetition in 𝑊 except for 𝑣0 = 𝑣𝑘 . Therefore 𝑊 is an odd
cycle.
This theorem would not be true if we replaced “odd” by “even” in the theorem
statement. To see this, consider the complete graph on two vertices, 𝐾2 . Call its vertices
𝑣 and 𝑤. This graph has no cycles at all (even or odd). But it has an even closed walk:
𝑣, 𝑤, 𝑣. This closed walk is not a cycle, because the edge {𝑣, 𝑤} is used twice.
It is instructive to read through the proof of Theorem 60 again to find the point
where the proof would break down if “odd” were replaced by “even” throughout.
A cycle in a graph is Hamiltonian if it includes every vertex. Hamiltonian cycles
are used in planning routes which must visit every location in some set, without visiting
any location twice, and returning to the starting point. For example, suppose you must
visit every city in some region exactly once, returning to your starting point, while
using available transport links (roads or train lines, as the case may be) to go between
cities. Then your problem is to find a Hamiltonian cycle in the graph whose vertices
represent cities with edges representing available transport links between cities. In
practice, each edge may have an associated distance or cost, and you may also want to
find a Hamiltonian cycle with minimum total cost (assuming the total cost of the cycle
is the sum of the costs of its edges). The problem of finding the Hamiltonian cycle of
minimum total cost goes back to the 19th century and is traditionally known as the
Travelling Salesman Problem (TSP).
11.9 CONNECTiViTY
Define a binary relation 𝑅 on the vertex set of a graph by: 𝑢𝑅𝑣 if and only if there is a walk from 𝑢 to 𝑣. This relation is an equivalence relation.
• It is reflexive because, for any vertex 𝑣, the zero-length walk relates 𝑣 to itself, so 𝑣𝑅𝑣.
• It is symmetric because any walk from 𝑢 to 𝑣, traversed in reverse, is a walk from 𝑣 to 𝑢, so 𝑢𝑅𝑣 implies 𝑣𝑅𝑢.
• It is transitive because, if 𝑢𝑅𝑣 and 𝑣𝑅𝑤, then we have a walk from 𝑢 to 𝑣 and
another walk from 𝑣 to 𝑤, and we can put them together at 𝑣 to make a walk from
𝑢 to 𝑤, showing that 𝑢𝑅𝑤.
The equivalence classes of this relation are the components of the graph: two vertices are in the same component if and only if there is a walk between them. A graph is connected if it has just one component.
11.10 BiPARTiTE GRAPHS
A graph 𝐺 = (𝑉, 𝐸) is bipartite if its vertex set 𝑉 can be written as a disjoint union

𝑉 = 𝐴 ⊔ 𝐵
of two parts, 𝐴 and 𝐵, such that every edge of the graph has one endpoint in 𝐴 and the
other in 𝐵.3
We often draw bipartite graphs with all the vertices in one part on the left side and
all the vertices in the other part on the right, with edges going between them as required.
Two examples are given in Figure 11.5.
(a) (b)
Figure 11.5: Two bipartite graphs. The one on the right, (b), is the complete bipartite graph 𝐾2,3 .
3 This might not be a partition of 𝑉 because one of the parts may be empty, although that could only happen
if 𝐺 had no edges. As long as the graph has at least one edge, then the sets 𝐴 and 𝐵 are both nonempty
and they form the two parts of a partition of 𝑉.
Theorem 61. A graph is bipartite if and only if it has a 2-colouring: an assignment of one of two colours to each vertex so that adjacent vertices always receive different colours.

Proof.
(⟹)
Suppose 𝐺 is bipartite, with parts 𝐴 and 𝐵. Colour every vertex in 𝐴 Black and every vertex in 𝐵 White. Every edge has one endpoint in 𝐴 and the other in 𝐵, so adjacent vertices receive different colours, and this is a 2-colouring.
(⟸)
Suppose 𝐺 = (𝑉, 𝐸) has a 2-colouring 𝑓 ∶ 𝑉 → {Black, White}. Let 𝐴 = 𝑓 −1 (Black)
be the set of all vertices coloured Black, and let 𝐵 = 𝑓 −1 (White) be the set of all vertices
coloured White. Every vertex of 𝑉 belongs to one of these two sets, since every vertex
is mapped to one of those two colours. Furthermore, no vertex can belong to both sets,
because 𝑓 is a function and therefore cannot assign two different values to any element of
its domain. Therefore 𝑉 = 𝐴⊔𝐵. Also, since adjacent vertices get different colours under
𝑓, adjacent vertices must belong to different sets (𝐴 or 𝐵); they cannot both belong to
the same set, for then they would be given the same colour by 𝑓. These properties of
the two parts 𝐴 and 𝐵 show that 𝐺 is bipartite with 𝐴 and 𝐵 as the two sides.
Theorem 62. A graph is bipartite if and only if it has no odd closed walk.
Proof.
(⟹)
If 𝐺 is bipartite, then every walk must alternate between vertices in 𝐴 and vertices
in 𝐵. So, it goes from a vertex in 𝐴, to a vertex in 𝐵, to one in 𝐴, to one in 𝐵, and so
on, possibly with repetition of vertices and edges. At every stage, if we are at a vertex
in 𝐴, then two steps later we are back in 𝐴. This alternation ensures that a closed walk
has even length. Therefore 𝐺 has no odd closed walk.
(⟸)
If 𝐺 has no odd closed walk, then let us colour the vertices of 𝐺 as follows. In each
component of 𝐺, we do the following.
1. Choose one vertex in the component as the “ground vertex” of that component;
this will be the first vertex in the component to be coloured. Call it 𝑔.
2. Colour 𝑔 Black.
3. Colour every other vertex 𝑣 in the component according to the parity of the distance from 𝑔 to 𝑣: Black if that distance is even, White if it is odd.
Once this is done, for all components of 𝐺, we will have given a colour to every
vertex of 𝐺.
We claim that this is a 2-colouring of 𝐺. We prove this by contradiction.
Assume, by way of contradiction, that our assignment of colours is not a 2-colouring
of 𝐺. Then there must be two adjacent vertices that get the same colour. Let 𝑋 be
this colour, which could be Black or White. Let 𝑣 and 𝑤 be these two adjacent vertices,
each with colour 𝑋 . Since they are adjacent, they must belong to the same component
of 𝐺. Let 𝑔 be the ground vertex for that component.
Key observation: the distances from 𝑔 to 𝑣, and from 𝑔 to 𝑤, have the same parity.
In other words, they are both even or both odd. This is because the colours of 𝑣 and
𝑤 are completely determined by the parity of their distances from 𝑔, so the fact that
they get the same colour implies that the parities of these distances are the same. Since
these two distances have the same parity, their sum is even.
Let 𝑃 be a shortest path in 𝐺 from 𝑔 to 𝑣, and let 𝑄 be a shortest path in 𝐺 from 𝑔
to 𝑤. The lengths of these two paths have the same parity, as explained in the previous
paragraph. So
length of 𝑃 + length of 𝑄
is even. Now consider the walk formed by starting at 𝑔, going along 𝑃 to 𝑣, then
stepping along edge {𝑣, 𝑤} to 𝑤, then going along 𝑄 from 𝑤 back to 𝑔, finishing up at
𝑔. This is a closed walk, since it finishes where it starts. Its length is
length of 𝑃 + length of 𝑄 + 1,
where we now have an extra +1 because of the edge {𝑣, 𝑤}. We have already seen that
the sum of the lengths of 𝑃 and 𝑄 is even. It follows that the length of this closed walk
is odd.
So we have constructed an odd closed walk in 𝐺. But this is a contradiction, since
this part of the proof starts with a graph with no odd closed walk. Therefore our
assumption, that our assignment of colours is not a 2-colouring of 𝐺, is wrong. Therefore
it is, in fact, a 2-colouring. Therefore 𝐺 is 2-colourable. Therefore 𝐺 is bipartite (using
Theorem 61).4
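The colouring procedure in this proof — choose a ground vertex in each component, then colour every vertex by the parity of its distance from that ground vertex — is exactly what breadth-first search computes. Here is a minimal Python sketch (our own illustration, not part of the unit's materials; the function name and graph representation are our choices). It returns a 2-colouring when one exists, and None otherwise.

from collections import deque

def two_colour(vertices, edges):
    # build adjacency lists
    adj = {v: [] for v in vertices}
    for v, w in edges:
        adj[v].append(w)
        adj[w].append(v)
    colour = {}
    for g in vertices:                      # g is the ground vertex of its component
        if g in colour:
            continue
        colour[g] = "Black"
        queue = deque([g])
        while queue:                        # breadth-first search from g
            v = queue.popleft()
            for w in adj[v]:
                if w not in colour:
                    # w is one step further from g, so it gets the other colour
                    colour[w] = "White" if colour[v] == "Black" else "Black"
                    queue.append(w)
    # the assignment is a 2-colouring iff adjacent vertices got different colours
    for v, w in edges:
        if colour[v] == colour[w]:
            return None                     # an odd closed walk exists
    return colour

print(two_colour([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)]))  # a 2-colouring
print(two_colour([1, 2, 3], [(1, 2), (2, 3), (3, 1)]))             # None: a triangle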
Corollary 63. A graph is bipartite if and only if it has no odd cycle.

11.11 EULER TOURS
An Euler tour is a closed trail that uses every edge. Since it is a trail, this means each
edge gets used exactly once. Since it is closed, it finishes at its starting vertex.
This concept goes back to what is often considered to be the start of graph theory.5
In the city of Königsberg in Prussia (now Kaliningrad, in Russia), there were seven
bridges linking two islands in the River Pregel to each other and to the two sides of the
river. It was a popular pastime to try to do a walk which crossed each bridge exactly
once and returned to its starting point. No-one had succeeded in doing this, so it was
widely believed to be impossible, but no-one had pinned down why it couldn’t be done.
The great Swiss-German mathematician Leonhard Euler tackled this problem, modelled
it using a graph, developed some theory, and solved it, showing that such a tour around
the seven bridges was impossible. (Euler also developed some of the number theory we
have done: § 7.12 and Theorem 38.) He first presented his work to the St Petersburg
Academy of Sciences in 1735 and it was published in 1736.
A map showing the seven bridges, taken from Euler’s paper, is shown in Figure 11.6.6
From this geographic setting, Euler constructed the graph shown in Figure 11.7.
This is a classic case of abstraction: identifying the essential elements of the problem
at hand (the relationship between the bridges and the land masses) and omitting all
4 We only used 2-colourings in this part of the proof since it seemed easiest and neatest to write about giving
colours to vertices rather than putting them in 𝐴 or 𝐵. But it would not have made much difference. Had
we done everything in terms of 𝐴 and 𝐵, we would not have needed to appeal to Theorem 61 at the end.
5 This may be true for graph theory as a field of study. But there are graph-theoretic definitions and concepts
going back much further in time. For example, connectivity is used in the ancient game of Go, known as
Igo in Japan, Baduk in Korea and Wéiqí in China. This game was invented in China about 2,500 years
ago, and can be played on any graph.
6 Diagram taken from: Leonhard Euler, Solutio problematis ad geometriam situs pertinentis, Commentarii
Academiae Scientiarum Imperialis Petropolitanae, vol. 8 (1736) 128–140, with figures between pp. 158
& 159; this copy is from the Biodiversity Heritage Library, https://www.biodiversitylibrary.org/.
Figure 11.6: Diagram of the Seven Bridges of Königsberg, from Leonhard Euler’s paper published
in 1736.
irrelevant information (the sizes and shapes of the land masses, the lengths and widths
of the bridges, the curves of the river, etc.).
Figure 11.7: The graph Euler constructed from the map: one vertex for each of the four land masses (including 𝐴 and 𝐷), and one edge for each of the seven bridges 𝑎–𝑔.
The graph is a precise model of the Königsberg Bridges problem, since a walk around
all the bridges of the required type exists if and only if the graph has an Euler tour.
At this point, you can try to construct an Euler tour for this graph and, through this exploration, try to gain some insight into why it is not possible.
Euler not only solved the problem for the Königsberg Bridges graph, but provided a
general characterisation of graphs that have Euler tours, giving a simple test that does
not require an exhaustive search.
Theorem 64. A connected graph has an Euler tour if and only if every vertex has even degree.
Proof.
(⟹)
Let 𝐺 be a connected graph with an Euler tour. An Euler tour is a trail, and each time a trail passes through a vertex, it uses two of the edges incident with that vertex: one on the way in, and one on the way out. An Euler tour uses each edge of the graph exactly once, so the edges incident with a vertex are used in disjoint pairs, one pair per visit. Hence the number of edges incident with each vertex is twice the number of visits the tour makes to it, which is a multiple of 2. Therefore each vertex degree is even.
This reasoning applies to the vertex where the tour starts and ends, too, provided
we consider the tour to be entering the vertex at the end (via the last edge of the tour)
and leaving it at the start (via the first edge of the tour). Alternatively, we can treat
that vertex as a special case, noting that we use one of its incident edges at the start and
another at the very end, making two edges so far. Then, every other visit to that vertex
uses two of its incident edges, so again, the number of incident edges at that vertex must
be a multiple of 2, by the reasoning of the previous paragraph.
(⟸)
Let 𝐺 be a connected graph in which every vertex has even degree.
We give an algorithm for constructing an Euler tour for 𝐺.
1. Initialisation:
a) Choose any vertex 𝑣 of 𝐺.
b) Let 𝑇 be the trivial closed trail consisting just of the vertex 𝑣, with no edges.
2. While there is an edge of 𝐺 that has not been used by 𝑇, do the following. Choose a vertex 𝑢 on 𝑇 that has an incident edge not yet used by 𝑇. (Such a vertex exists, because 𝐺 is connected.) Starting at 𝑢, walk along unused edges, never repeating an edge, until the walk returns to 𝑢. (This must happen: every vertex has even degree, so whenever the walk enters a vertex other than 𝑢 along an unused edge, another unused edge is available for it to leave by.) Splice the resulting closed trail into 𝑇 at 𝑢.
3. The algorithm finishes when there is no edge of 𝐺 that has not been used by the
closed trail 𝑇. Then, the closed trail 𝑇 necessarily uses each edge of 𝐺 exactly
once, and it is therefore an Euler tour. So we output 𝑇.
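Step 2 above is the heart of what is often called Hierholzer's algorithm. Here is a minimal Python sketch of the whole procedure (our own illustration; the adjacency-list representation and names are our choices, and we assume the input graph is connected with every degree even):

def euler_tour(adj):
    # copy the adjacency lists so we can consume edges as the trail uses them
    remaining = {v: list(ws) for v, ws in adj.items()}

    def closed_trail_from(u):
        # walk along unused edges until stuck; even degrees guarantee
        # that the walk can only get stuck back at u
        trail, v = [u], u
        while remaining[v]:
            w = remaining[v].pop()
            remaining[w].remove(v)          # use up the edge {v, w}
            trail.append(w)
            v = w
        return trail

    tour = closed_trail_from(next(iter(adj)))   # first closed trail, from any vertex
    i = 0
    while i < len(tour):                        # step 2: splice in more closed trails
        if remaining[tour[i]]:
            detour = closed_trail_from(tour[i])
            tour = tour[:i] + detour + tour[i + 1:]
        else:
            i += 1
    return tour                                 # step 3: every edge is now used

# a 4-cycle: every vertex has degree 2
adj = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(euler_tour(adj))                          # e.g. ['a', 'd', 'c', 'b', 'a']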
Euler tours have practical applications. For example, suppose you deliver mail or leaflets, walking along every street of some area and returning to where you started. You would prefer not to walk again along any stretch of road where you have already done your deliveries, because that seems like wasted time and effort.
You can model this situation by a graph, where vertices represent intersections of
streets and edges represent segments of streets between intersections (with these seg-
ments having no other intersections in the middle of them, otherwise they are really
more than one segment). You would like to find an Euler tour in this graph.
The Euler tour problem has various generalisations that also have practical applica-
tions. For example, you could try to find a closed walk which includes every edge and
re-uses the fewest edges. Or the edges might be weighted, and you could try to find the
closed walk of minimum total weight.
If a connected graph 𝐺 does not have an Euler tour, then it must have some vertices
of odd degree. By Corollary 59, the number of vertices of odd degree must be even.
Suppose that the number of vertices of odd degree is 2𝑘, where 𝑘 ∈ ℕ. We can, if we
wish, pair these odd-degree vertices off with each other, however we like. Let us call the
vertices in the 𝑖-th pair 𝑣𝑖 and 𝑤𝑖 . So the pairs are
𝑣1 and 𝑤1 ,
𝑣2 and 𝑤2 ,
⋮
𝑣𝑘 and 𝑤𝑘 ,
making 2𝑘 odd-degree vertices altogether. Note that the two vertices in a pair may or may not be adjacent.
Now suppose we add new edges linking each of these pairs of vertices: one between
𝑣1 and 𝑤1 , another between 𝑣2 and 𝑤2 , and so on, making 𝑘 new edges altogether. This
may create some multiple edges, but that’s ok. Once we’ve done this, all degrees are
even. By Theorem 64 (or rather, its extension to multigraphs), this new graph has an
Euler tour. This tour includes each of the edges of 𝐺 exactly once, plus each of the 𝑘
new edges exactly once. If we then delete all those new edges, then the tour is broken
into 𝑘 trails such that each edge of 𝐺 appears exactly once in exactly one of these trails
(and not at all in any of the others). In other words, these trails give a partition of the
edges of 𝐺. Each of these 𝑘 trails must start at one vertex of odd degree and end at another vertex of odd degree.
We have shown that the edge set of a graph 𝐺 with 2𝑘 vertices of odd degree can
be partitioned into 𝑘 trails, and that the 2𝑘 endpoints of the trails are precisely the 2𝑘
odd-degree vertices of the graph. Furthermore, it is not possible to partition the edges
of 𝐺 into fewer than 𝑘 trails. (Proving this, using the concepts we have been discussing,
is a good exercise.)
We have outlined the proof of the following extension of Theorem 64.
Theorem 65. For any 𝑘 ∈ ℕ, the edges of a graph can be partitioned into 𝑘 trails if and only if the graph has at most 2𝑘 vertices of odd degree. The open trails in such a partition start and end at odd-degree vertices. □
11.12 EXERCISES
1. Let 𝑉 and 𝐸 be two finite sets. We’ll call the members of 𝑉 “vertices” and the
members of 𝐸 “edges”.
Suppose that
• 𝑣, 𝑤 and 𝑧 are variables with domain 𝑉;
• 𝑒 is a variable with domain 𝐸;
• incident is a binary predicate whose first argument has domain 𝑉 and whose second argument has domain 𝐸.
It is our intention to use these ingredients (i.e., the given sets 𝑉 and 𝐸, the variables
and the predicate) to describe graphs and some of their properties. In particular, we
intend that incident(𝑣, 𝑒) means that the vertex 𝑣 is incident with the edge 𝑒. But we
need to establish logical rules to ensure that these intentions are actually carried out.
(b) No edge is incident with exactly one vertex (i.e., there are no loops).
2. Consider the four types of graph representation we looked at in § 11.4𝛼 : edge lists,
adjacency matrices, adjacency lists, and incidence matrices. Suppose that
• the graphs we want to represent have 𝑛 vertices and 𝑚 edges,
• we represent vertex numbers in binary, using ⌊log₂ 𝑛⌋ + 1 bits for each vertex
number. (This is the number of bits required to represent 𝑛 in binary, and we
assume that smaller numbers have extra leading zeros if necessary to ensure that
all vertex numbers have the same number of bits.)
Under these assumptions, determine the total number of bits used to represent a graph
using
(a) an edge list, with the vertex sets and edge sets listed in full;
3. Rewrite the proof of Theorem 56 so that the two cases are based on whether or
not the graph has an isolated vertex.
4. Determine (a) the number of edges, and (b) the average degree of the graph of the
caffeine molecule.
Here, vertices represent atoms and edges represent bonds. The graph has some
multiple edges, in the form of double bonds. Its chemical formula is C8 H10 N4 O2 . In
molecules, the valency of an atom is the number of bonds it has; this is just the degree
of the corresponding vertex of the graph that represents the molecule.7 The valencies of
Carbon, Nitrogen, Oxygen and Hydrogen are 4, 3, 2, 1, respectively.
5. Prove that the shortest walk between two vertices is always a path.
6. Prove that every graph with minimum degree at least 2 has a cycle.
7. Prove that the distance between two vertices in a graph satisfies the triangle inequality. This means that, for every triple of vertices 𝑢, 𝑣, 𝑤 in any graph 𝐺, the distance between 𝑢 and 𝑤 is at most the distance between 𝑢 and 𝑣 plus the distance between 𝑣 and 𝑤.
8. How many 2-colourings does a bipartite graph with 𝑛 vertices, 𝑚 edges and 𝑘
components have?
9. Consider the four-vertex multigraph in Figure 11.8. Determine if it has a trail that
includes every edge, and also if it has an Euler tour.
7 In fact, some graph theorists borrowed the term valency from chemistry and used it instead of “degree”,
but that is uncommon these days.
10. Each of the five Platonic solids (tetrahedron, cube, octahedron, dodecahedron,
icosahedron) has a skeleton consisting of its vertices and edges, which is a graph. In fact,
the graph terminology “vertex” and “edge” came from their use for polyhedra.
For each of these five graphs: find a Hamiltonian cycle; determine if it is Eulerian,
and if it is, find an Euler tour.
11. Let 𝑄𝑛 be the graph whose vertices are all strings of 𝑛 bits, with two vertices
being adjacent if they differ in just one bit.
(c) Describe a method for constructing a Hamiltonian cycle in 𝑄𝑛 , for each 𝑛. Think
recursively.
12. Using Theorem 64 for simple graphs, how would you prove it for multigraphs
(where multiple edges are allowed)?
To answer this, don’t re-do the proof of Theorem 64. Treat that theorem just as
a black box. So, your task is to show that a connected multigraph is Eulerian if and
only if every vertex has even degree, using Theorem 64 somehow. Since Theorem 64 is
only about simple graphs, you should try to construct, from any given multigraph 𝐺, a
simple graph 𝐻 such that applying Theorem 64 to 𝐻 helps prove this extension of the
theorem for 𝐺.
12 GRAPH THEORY II
We continue our study of graphs by looking at two of the most important classes of
graphs: trees (and the closely related class of forests), which are ubiquitous in com-
puter science, and planar graphs, which are fundamental in many applications including
network layout and information visualisation.
We finish our exploration of graph theory by focusing on fun, through some games
that can be played on any graph. These games have a rich theory and are the source of
many good puzzles and challenges as well as being interesting to play.
12.1𝛼 TREES
Some of the contexts in which trees are used as abstract models include:
• text analysis. The classical example of this is the use of trees in grammars, for
human languages or programming languages. A parse tree shows how a string of
text can be generated according to grammatical rules.
In some of these examples, the trees have a special vertex called the root on which
everything else ultimately depends. For example, in a directory hierarchy in Unix or
Linux, there is a root directory, denoted simply by /, which (unlike all other directories) is not contained in any other directory.
12.2 PROPERTIES OF TREES
When mentioning communications networks (p. 420 in § 12.1𝛼 ), we said that trees arise
as minimal networks that keep everything connected. We now formalise and prove this,
as a general statement about trees.
Suppose a graph 𝐺 = (𝑉, 𝐸) is connected, but deleting any edge of the graph discon-
nects it (but we retain all the vertices). Such a graph can be called a minimal connected
graph on 𝑉, since it’s connected but every proper subgraph with the same vertex set
is disconnected. It turns out that minimal connected graphs on 𝑉 are the same as trees
with vertex set 𝑉.
Theorem 66. A graph 𝐺 = (𝑉, 𝐸) is a tree if and only if it is a minimal connected
graph on 𝑉.
Proof.
(⟹)
Let 𝐺 be a tree. By definition, it is connected. If it is not a minimal connected
graph on 𝑉, then there is some edge 𝑒 of 𝐺 such that the graph remains connected even
if 𝑒 is deleted. Let 𝑣 and 𝑤 be the endpoints of 𝑒, so that 𝑒 = {𝑣, 𝑤}. Since 𝐺 remains
connected after 𝑒 is deleted, there must be a path from 𝑣 to 𝑤 that does not use the
edge 𝑒. But adding 𝑒 to this path creates a cycle in 𝐺. Since 𝐺 contains a cycle, it is
not a tree, which is a contradiction.
(⟸)
Let 𝐺 be a minimal connected graph on 𝑉.
Since 𝐺 is already connected, we only need to show that it has no cycle. Then we
will know it is connected and has no cycle, which means it’s a tree.
Assume, by way of contradiction, that 𝐺 has a cycle 𝐶. Let 𝑒 be any edge in 𝐶.
Now, let 𝑣 and 𝑤 be any vertices in 𝐺. Since 𝐺 is connected, there is a walk from 𝑣 to
𝑤 in 𝐺. If this walk includes 𝑒, then we can construct an alternative walk from 𝑣 to 𝑤 that avoids 𝑒: each time the walk uses 𝑒, replace that step by going all the way round the rest of 𝐶 instead. Since we can do this for any 𝑣 and 𝑤, it follows that every pair of vertices in 𝐺 is connected by a walk that avoids 𝑒.
Therefore, in the proper subgraph of 𝐺 obtained from it by deleting 𝑒, every pair of
vertices are connected by a walk (and hence by a path). Therefore this proper subgraph
of 𝐺 is connected. It also has the same vertex set, 𝑉, as 𝐺. So this subgraph contradicts
the minimality of 𝐺. Therefore our assumption, that 𝐺 has a cycle, was wrong. So 𝐺
has no cycle.
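For small examples, you can test this characterisation by brute force: check that the graph is connected, and that deleting any single edge disconnects it. Here is a Python sketch (our own illustration; the representation and names are our choices):

def is_connected(vertices, edges):
    # depth-first search from an arbitrary start vertex
    adj = {v: set() for v in vertices}
    for v, w in edges:
        adj[v].add(w)
        adj[w].add(v)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)

def is_minimal_connected(vertices, edges):
    # connected, but deleting any one edge (keeping all vertices) disconnects it
    if not is_connected(vertices, edges):
        return False
    return all(not is_connected(vertices, [f for f in edges if f != e])
               for e in edges)

path = [(1, 2), (2, 3), (3, 4)]
print(is_minimal_connected([1, 2, 3, 4], path))             # True: a tree
print(is_minimal_connected([1, 2, 3, 4], path + [(4, 1)]))  # False: contains a cycle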
Trees are very diverse. The two most “extreme” trees on 𝑛 vertices are
• the path graph 𝑃𝑛−1 , with 𝑛 vertices and 𝑛−1 edges. This tree contains the longest
path of any tree on 𝑛 vertices. No other tree on 𝑛 vertices has a path of length
𝑛 −1. The maximum vertex degree of 𝑃𝑛−1 is 2; every other tree on 𝑛 vertices has
higher maximum degree.
• the star graph consisting of one central vertex and 𝑛 − 1 leaves, each adjacent to
the central vertex. Using complete bipartite graph notation, this is 𝐾1,𝑛−1 . By
contrast with the path graph, its paths are extremely short. It has no path of
length > 2; no other tree has this property. Its maximum degree is 𝑛 − 1; every
other tree on 𝑛 vertices has lower maximum degree.
Figure 12.2 shows three trees on five vertices: a path, a star, and another shown between
them. In fact, this is a complete list of the structures that trees on five vertices can have;
every tree on five vertices may be identified as one of these three (possibly after renaming
and redrawing to make the relationship clear).
Figure 12.2: The three trees on five vertices: the path 𝑃4 , the star 𝐾1,4 , and one other tree between them.
See if you can go beyond Figure 12.2 by drawing all possible trees on six vertices.
You will start to see more of the variety of shapes they have. You should also study their
structure and see if you can make some general observations about the structure of trees.
How many edges does a tree on 𝑛 vertices have? What values can their average degree
take? What values can the maximum degree of a vertex in a tree take? What values
can the minimum degree take? What possible values can the length of the longest path
take?
We now answer some of these questions. First, we consider minimum degree.
Theorem 67. Every tree with at least two vertices has a leaf.
Proof. See Exercise 11.6. □
Next, we count the edges of a tree.
Theorem 68. Every tree on 𝑛 vertices, where 𝑛 ≥ 1, has 𝑛 − 1 edges.
Proof. We use induction on 𝑛. For each 𝑛 ≥ 1, let 𝑃(𝑛) be the statement that every tree on 𝑛 vertices has 𝑛 − 1 edges.
Inductive Basis:
When 𝑛 = 1, we only have one vertex, and there is only one graph with one vertex, and it has no edge. This is the simplest tree, and its number of edges is 0, which is 1 − 1, so the number of edges is indeed 𝑛 − 1 in this case. So 𝑃(1) is true.
Inductive Step:
Let 𝑘 ≥ 1.
Assume that 𝑃(𝑘) holds, i.e., that every tree on 𝑘 vertices has 𝑘 − 1 edges. This is
our Inductive Hypothesis.
We need to show that 𝑃(𝑘 + 1) holds. This is a statement about all trees on 𝑘 + 1
vertices, so it’s a universal statement. So our proof strategy for the Inductive Step is
to consider a general tree on 𝑘 + 1 vertices and prove that it has the required property,
looking out for a chance to apply the Inductive Hypothesis.
Let 𝑇 be any tree on 𝑘 + 1 vertices. By Theorem 67, 𝑇 has a leaf.
We now make a new tree 𝑆 from 𝑇 by deleting a leaf from it (which means we remove
the leaf vertex and also its incident edge). 𝑆 is indeed a tree, because removing a leaf
from a graph never disconnects it and never creates a cycle.
This new tree 𝑆 has one fewer vertex and one fewer edge than 𝑇, because of the
deletion of the leaf (and its incident edge). Since 𝑇 has 𝑘 + 1 vertices, this means 𝑆 has
𝑘 vertices. This in turn means that we can apply the Inductive Hypothesis to 𝑆.
By the Inductive Hypothesis, 𝑆 has 𝑘 − 1 edges.
Now we use the relationship between 𝑆 and 𝑇. Since 𝑆 has one fewer edge than 𝑇
(due to the way it was constructed from 𝑇), the fact that 𝑆 has 𝑘 − 1 edges tells us that
the number of edges of 𝑇 is
(𝑘 − 1) + 1 = 𝑘.
Since 𝑘 = (𝑘 + 1) − 1, this is exactly what 𝑃(𝑘 + 1) requires. As 𝑇 was an arbitrary tree on 𝑘 + 1 vertices, 𝑃(𝑘 + 1) holds. This completes the Inductive Step.
Conclusion:
Therefore, by the Principle of Mathematical Induction, 𝑃(𝑛) holds for all 𝑛 ∈ ℕ.
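The inductive step suggests a procedure you can actually run: repeatedly delete a leaf, counting one edge per deletion, until a single vertex remains. A small Python sketch (our own illustration; the dictionary-of-neighbour-sets representation is our choice):

def edges_by_leaf_pruning(tree):
    # tree maps each vertex to the set of its neighbours
    adj = {v: set(ws) for v, ws in tree.items()}
    removed = 0
    while len(adj) > 1:
        leaf = next(v for v, ws in adj.items() if len(ws) == 1)  # exists by Theorem 67
        (neighbour,) = adj.pop(leaf)        # the leaf's single neighbour
        adj[neighbour].discard(leaf)
        removed += 1                        # one edge removed with each leaf
    return removed

# a tree on 5 vertices has 5 - 1 = 4 edges
tree = {1: {2}, 2: {1, 3, 4}, 3: {2}, 4: {2, 5}, 5: {4}}
print(edges_by_leaf_pruning(tree))          # 4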
12.3 FORESTS
Each component of a forest is a tree. So, if the 𝑖-th component of a forest has 𝑛𝑖 vertices and 𝑚𝑖 edges, then, applying Theorem 68 to that component,
𝑚𝑖 = 𝑛𝑖 − 1.
12.4 SPANNING TREES

A spanning tree of a graph 𝐺 is a subgraph of 𝐺 that
• is a tree, and
• includes every vertex of 𝐺.
In Figure 12.3, we show a graph at the top, followed by two further drawings of it
with two different spanning trees of the graph. (This graph is the same as the one in
Figure 11.4.) See if you can find some other spanning trees for this graph.
Since a tree is a minimal connected graph on its vertex set (Theorem 66), a spanning
tree of 𝐺 is a minimal connected subgraph of 𝐺 that includes every vertex of 𝐺. In other
words, it’s a subgraph of 𝐺 in which every vertex is connected to every other vertex but
deleting any edge from the subgraph disconnects the subgraph.
We can find a spanning tree of a connected graph 𝐺 by deleting edges (but not ver-
tices) until we can’t delete any more edges without disconnecting the subgraph. We can
usually do this in many different ways. Often, this process will give different spanning
trees, but sometimes we will get the same spanning tree by deleting the same edges but
doing so in a different order.
For example, in the graph of Figure 12.3, suppose we delete edge {𝑎, 𝑏}, then {𝑐, 𝑑},
then {𝑏, 𝑒}. You can check that, at each stage in this process, the remaining graph is
still connected (and includes all the vertices, since we are not deleting vertices). After
we have deleted all these three edges, we have the spanning tree in the middle diagram.
Order of deletion does not matter here; we could have instead deleted {𝑐, 𝑑}, then {𝑏, 𝑒},
then {𝑎, 𝑏}, and we would still end up with the same spanning tree.
If, instead, we delete {𝑑, 𝑒}, then {𝑎, 𝑐}, then {𝑐, 𝑑}, then we get the spanning tree
shown at the bottom of the figure.
This always works, for any connected graph. It follows that every connected graph
has a spanning tree.
Figure 12.3: A graph (top), and two of its spanning trees, shown with thick edges (middle and
bottom).
This method of finding a spanning tree starts with the entire graph and deletes edges
until we are left with a spanning tree. Another method of finding a spanning tree in a
graph works in the opposite way: we start with nothing and build up. We now describe
this.
At the start, we pick any edge of the graph. (If the graph has no edge, then it has
just one vertex and nothing else, otherwise it is not connected. A one-vertex graph is
its own spanning tree, so in that case there is no need to do anything.) Then we keep
trying to add edges, to the graph we have built up so far, provided that this does not
create a cycle. While we are doing this, the subgraph we are building must be a forest
(as it has no cycle), but it might not be a connected subgraph, so it might not be a tree.
Eventually, the process stops, when it is no longer possible to add another edge without
creating a cycle. When that happens, it turns out that we will have a spanning tree of
the original graph.
Let us now set this out as an algorithm.
1. 𝑋 ∶= ∅.
2. While there is an edge of 𝐺 that is not in 𝑋 and that would not create a cycle if added to 𝑋:
2.1. Choose any such edge 𝑒.
2.2. 𝑋 ∶= 𝑋 ∪ {𝑒}.
3. Output the subgraph 𝐹 = (𝑉, 𝑋).
This algorithm starts with a set of edges containing no cycle, and it only ever adds
edges that do not create a cycle. Therefore the output subgraph 𝐹 , with vertex set 𝑉
and edge set 𝑋 , has no cycles. Therefore it is a forest.
Since 𝐺 is connected, the output subgraph 𝐹 is connected too. (It is a good exercise
to try to prove this.) Since 𝐹 is a connected forest, it is a tree, and since it is also a
subgraph of 𝐺 that includes all vertices of 𝐺, it is a spanning tree of 𝐺.
If 𝐺 is disconnected, then the algorithm can be applied to each component of 𝐺 to
construct a spanning tree in each component.
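Whether adding an edge would create a cycle is just the question of whether its two endpoints already lie in the same component of (𝑉, 𝑋), and a union-find (disjoint-set) structure answers that efficiently. Here is a Python sketch of the build-up method (our own illustration; the edge list in the example is hypothetical):

def spanning_forest(vertices, edges):
    parent = {v: v for v in vertices}       # union-find over the components of (V, X)

    def find(v):                            # root of v's current component
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    chosen = []
    for v, w in edges:
        rv, rw = find(v), find(w)
        if rv != rw:                        # different components, so adding
            parent[rv] = rw                 # the edge {v, w} creates no cycle
            chosen.append((v, w))
    return chosen

edges = [("a", "b"), ("a", "c"), ("b", "e"), ("c", "d"), ("d", "e"), ("e", "f")]
print(spanning_forest("abcdef", edges))     # five edges of a spanning tree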
Suppose you have a set 𝑉 of nodes that need to be joined up somehow into a
network. Applications include planning for transport or communications. There is a set
𝐸 of possible links that could be built between pairs of nodes, and each possible link
carries a cost. How do you find the minimum cost network that connects all the nodes?
We model this by a graph 𝐺 = (𝑉, 𝐸), where the vertices represent nodes in the
network and the edges represent pairs of nodes which could be linked. We assume 𝐺
is connected, otherwise it’s impossible to find a subgraph that connects all the vertices.
For each edge 𝑒 there is a cost 𝑤(𝑒) ∈ ℝ+ which represents the cost of building that link.
We want to find a subset 𝑋 of the edges such that
• (𝑉, 𝑋 ) is connected, so that every pair of vertices has a path using only edges in
𝑋 ; and
• the total cost 𝑤(𝑋 ) is minimum, where the total cost of 𝑋 is just the sum of all
the costs of the edges in 𝑋 :
𝑤(𝑋) = ∑_{𝑒 ∈ 𝑋} 𝑤(𝑒).
In order to minimise the total cost, we do not want to include any unnecessary edges
in 𝑋 . Suppose 𝑋 contains an edge 𝑒 such that 𝑋 ∖ {𝑒} still connects all vertices of 𝐺.
Then we prefer to omit 𝑒, since that reduces the total cost by 𝑤(𝑒), the cost of 𝑒. So,
any minimum-cost subgraph (𝑉, 𝑋 ) will not only be a connected subgraph of 𝐺, but will
also be a minimal connected subgraph that includes all vertices of 𝐺. As we remarked
early in this section, a minimal connected subgraph that includes all vertices is just a
spanning tree of 𝐺. So what we want is a minimum-cost spanning tree of 𝐺.
Our earlier algorithm for finding a spanning tree takes no notice of any costs on the
edges. So, in general, it won’t find a minimum-cost spanning tree.
We can easily modify the algorithm to make use of the edge costs. In the key step
where the next edge is chosen (step 2.1), we could choose the edge of minimum cost,
among all those not in 𝑋 which would not create a cycle if we chose them. This algorithm
is known as Kruskal’s Greedy Algorithm, after Joseph Kruskal, who introduced it in 1956. It is greedy because it always makes the choice that gives the greatest immediate benefit, rather than looking further ahead to see what gives greatest benefit overall.
Here is the Greedy Algorithm in full. It is the same as the previous algorithm, except for the key step 2.1, where we choose the next edge greedily instead of arbitrarily.
1. 𝑋 ∶= ∅.
2. While there is an edge of 𝐺 that is not in 𝑋 and that would not create a cycle if added to 𝑋:
2.1. Among all such edges, choose an edge 𝑒 of minimum cost.
2.2. 𝑋 ∶= 𝑋 ∪ {𝑒}.
3. Output the subgraph 𝐹 = (𝑉, 𝑋).
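A Python sketch of the greedy version (our own illustration, in the style of the previous sketch): the only change is that the edges are scanned in increasing order of cost.

def kruskal(vertices, weighted_edges):
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree, total = [], 0
    for cost, v, w in sorted(weighted_edges):   # step 2.1: cheapest usable edge first
        rv, rw = find(v), find(w)
        if rv != rw:                            # adding {v, w} creates no cycle
            parent[rv] = rw
            tree.append((v, w))
            total += cost
    return tree, total

edges = [(4, "a", "b"), (1, "a", "c"), (3, "b", "c"), (2, "c", "d"), (5, "b", "d")]
print(kruskal("abcd", edges))   # ([('a', 'c'), ('c', 'd'), ('b', 'c')], 6)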
It is a remarkable fact that this algorithm always finds a minimum-cost spanning tree. Although it is comparatively rare for simple greedy algorithms to find optimal solutions, the benefits when that does happen can be enormous.
enormous. So it is important to be able to recognise those situations. A whole branch of
mathematics, matroid theory, is based on the study of structures for which the greedy
algorithm always finds optimum solutions.
12.5 PLANARITY

There are many practical reasons to care about how graphs can be drawn in the plane:
• To help people study graphs visually, they are usually displayed on a screen or
drawn/printed on paper.
• Electrical circuits are laid out on flat surfaces such as circuit boards or wafers, with
wires routed along these surfaces.
In a planar drawing of a graph:
• each vertex is drawn as a distinct point in the plane;
• each edge {𝑣, 𝑤} is drawn as a curve between 𝑣 and 𝑤 which does not intersect
itself anywhere, meets 𝑣 and 𝑤 only at the ends of the curve, and meets no other
vertices at all;
• the curves representing different edges cannot meet each other at all, except if the two edges are incident at a common vertex, in which case they only meet at that vertex.
A graph is planar if it has a planar drawing; a graph together with a specific planar drawing is called a plane graph. Some examples:
• The drawing of 𝐾4 on the left of Figure 11.3 is not planar, since it has an edge-
crossing. But the graph 𝐾4 itself is planar. Can you find a planar drawing of
it?
• The drawings of two bipartite graphs in Figure 11.5 each have edge-crossings, so
they are not planar drawings. But both graphs are planar. Can you show this,
using an appropriate drawing?
• The multigraph in Figure 11.8 is planar, although the drawing given there is not a
planar drawing since there is an edge-crossing in the middle. How can you redraw
it so that there are no crossings?
Null graphs are of course planar, since they have no edges to form crossings with. Paths,
cycles and trees are always planar. What about two other special families: complete
graphs, and complete bipartite graphs?
• complete graphs: 𝐾1 , 𝐾2 , 𝐾3 and 𝐾4 are all planar. For 𝐾1 and 𝐾2 , this is because
they do not have enough edges to form edge-crossings. For 𝐾3 , it is a special case
of the fact that all cycles are planar. Hopefully you have just shown that 𝐾4 is
planar, using a different drawing to the one given on the left in Figure 11.3. But
what about 𝐾5 ?
• complete bipartite graphs: 𝐾1,𝑘 , the star graph, is a tree, and therefore planar.
𝐾2,2 is really the same as the 4-cycle, 𝐶4 , and is therefore planar. Hopefully you
have now redrawn 𝐾2,3 (see Figure 11.5(b)) to show that it is planar. But what
about the next two cases, 𝐾2,4 and 𝐾3,3 ?
Figure 12.4 shows 𝐾5 and 𝐾3,3 .
Figure 12.4: The complete graph 𝐾5 and the complete bipartite graph 𝐾3,3 .
The planarity, or otherwise, of 𝐾5 and 𝐾3,3 has been the basis of numerous puzzles.
• Five towns are planning a rail network in which each pair of towns has a direct
route between them. Can this be done without building any crossings?
• An old puzzle based on 𝐾3,3 has the vertices on the left side representing houses
and those on the right side representing utility services, typically water, electricity
and gas. The puzzle is to connect each house to each utility so that none of the
pipes or wires carrying the services to the houses cross each other.
We will soon be able to determine whether or not these two specific graphs are planar
without doing an exhaustive search of the many different ways of drawing them.
Every planar drawing of a graph 𝐺 divides the plane up into regions. Informally,
these are the areas of the plane that are surrounded by vertices and edges but have no
vertices and edges within them. You can imagine cutting the plane along all the edges
(including their endpoints), leaving the plane to fall apart into separate “pieces”. These
pieces are the regions of the plane graph. Each region is referred to as a face of the
plane graph.
Each face has the property that any two points that lie inside a face (and not on any
of the vertices or edges that surround the face) can be joined by a curve that also lies
entirely within the face. So the curve does not meet any vertex or edge of 𝐺. In fact, a
face may be defined formally as a maximal subset of points in the plane such that any
two points in the subset can be joined by a curve that does not meet any vertices or
edges of 𝐺.
The region of the plane “outside” the graph is considered to be a face just as much as
the other regions. It is called the outer face, and it must not be forgotten when counting the faces in the graph.
Consider the plane graph in Figure 12.5. This graph divides the rest of the plane
into four faces, shown in the figure as 𝐹1 , 𝐹2 , 𝐹3 and 𝐹4 . Here, 𝐹4 is the outer face.
Figure 12.5: A plane graph with four faces: 𝐹1 , 𝐹2 , 𝐹3 and the outer face 𝐹4 .
Each face has a boundary consisting of those vertices and edges that are next to it.
You can walk along this boundary, all around the face, much as you might walk around
the boundary of a park or field. You can do so in either of two directions: one direction
keeps the face on your left as you walk around it, while the other keeps the face on
your right as you walk around it. Either way, and wherever you start, this walk will
eventually return to your starting point, and it actually constitutes a closed walk in the
graph.
The boundaries of the faces in Figure 12.5 are as follows.
face boundary comment
𝐹1 𝑎, 𝑐, 𝑒, 𝑑, 𝑏, 𝑑, 𝑎 uses edge {𝑏, 𝑑} twice
𝐹2 𝑐, 𝑔, 𝑓, 𝑒, 𝑐 this boundary is actually a cycle
𝐹3 𝑓, 𝑔, ℎ, 𝑓 so is this
𝐹4 𝑎, 𝑑, 𝑒, 𝑓, ℎ, 𝑔, 𝑐, 𝑎 outer face; also a cycle in this case.
Consider any edge 𝑒 in a plane graph, and imagine walking along it. As you do so,
you can look to your left or to your right.
• On your left is a face that has that edge in its boundary. We imagine a small
portion of that face that lies alongside the edge on your left, and call it a side of
that edge.
• On your right there is also a face with that same edge in its boundary. Again,
imagine a small portion of that face lying alongside the edge on your right, and
call it a side of that edge.
Often, the faces we see on either side of an edge are two different faces, as in Fig-
ure 12.6(a). But they can also be the same face, as in Figure 12.6(b). In any case, each
edge has two sides, and these sides may belong to different faces or to the same face.
Observe that each edge belongs to at most two faces. If we construct a boundary
walk around every face, then each edge appears exactly twice in the full set of boundary
walks. This could happen in either of two ways:
• The edge might appear once in the boundary walk around one face, and once in
the boundary walk around a different face.
• The edge might appear twice in the boundary walk around a single face.
In each case, the edge appears in no other boundary walk around any other face.
You can, if you wish, check that each edge of the plane graph in Figure 12.5 appears
twice in the set of boundaries of the four faces, using the list of boundary walks we gave
above. For example, edge {𝑒, 𝑓} appears once in the boundary of 𝐹2 and once in the
boundary of 𝐹4 , while edge {𝑏, 𝑑} appears twice in the boundary of 𝐹1 .
The numbers of vertices, edges and faces in a plane graph satisfy an equation which,
again, is due to Euler.
Figure 12.6: Two plane graphs, each showing the two sides of an edge {𝑐, 𝑑}. (a) The two sides
are in different faces. (b) The two sides are in the same face.
Theorem 71 (Euler's Theorem). For any connected plane graph with 𝑛 vertices, 𝑚 edges and 𝑓 faces,
𝑛 − 𝑚 + 𝑓 = 2.

Proof. We prove, by induction on 𝑚, that for every connected plane graph with 𝑛 vertices, 𝑚 edges and 𝑓 faces,
𝑛 − 𝑚 + 𝑓 = 2. (12.3)
Inductive Basis:
We don’t need to consider any 𝑚 < 𝑛 −1, since no graph with fewer than 𝑛 −1 edges
can be connected. This is because trees are minimal connected graphs on a given vertex
set, and they have 𝑛 − 1 edges by Theorem 68.
If 𝑚 = 𝑛 − 1, then 𝐺 must be a tree, since it is connected. When a tree is drawn in
the plane, the only face is the outer face, consisting of the entire plane except for the
points representing the vertices and the edges of the tree. So, in this case, 𝐺 has one
face, so 𝑓 = 1. Then, using 𝑚 = 𝑛 − 1 and 𝑓 = 1, we have
𝑛 − 𝑚 + 𝑓 = 𝑛 − (𝑛 − 1) + 1 = 𝑛 − 𝑛 + 1 + 1 = 2.
Inductive Step:
Let 𝑚 ≥ 𝑛 − 1.
Assume that (12.3) holds for every connected plane graph with 𝑚 edges. This is our Inductive Hypothesis. Now let 𝐺 be any connected plane graph with 𝑛 vertices, 𝑚 + 1 edges and 𝑓 faces. Since 𝑚 + 1 > 𝑛 − 1, the graph 𝐺 has more edges than a tree on 𝑛 vertices (Theorem 68), so it is not a tree, and therefore it has a cycle. Let 𝑒 be an edge of some cycle of 𝐺, and let 𝐻 be the plane graph obtained from 𝐺 by deleting 𝑒 but keeping all vertices. The edge 𝑒 lies on a cycle, so the faces on its two sides are different,3 and they merge into a single face when 𝑒 is deleted. Therefore
# faces of 𝐻 = (# faces of 𝐺) − 1 = 𝑓 − 1.
Since we deleted one edge from 𝐺 to get 𝐻 , the number of edges of 𝐻 is also one less
than the number of edges of 𝐺, so
# edges of 𝐻 = (# edges of 𝐺) − 1 = (𝑚 + 1) − 1 = 𝑚.
Although the deletion of 𝑒 has reduced the numbers of edges and faces, it has not changed
the number of vertices:
# vertices of 𝐻 = # vertices of 𝐺 = 𝑛.
Now, since 𝐻 has only 𝑚 edges, we can apply the Inductive Hypothesis to it. So we
have
(# vertices of 𝐻 ) − (# edges of 𝐻 ) + (# faces of 𝐻 ) = 2.
Substituting the expressions we have derived for these three quantities, we have
𝑛 − 𝑚 + (𝑓 − 1) = 2,
and since 𝑛 − (𝑚 + 1) + 𝑓 = 𝑛 − 𝑚 + (𝑓 − 1), it follows that 𝑛 − (𝑚 + 1) + 𝑓 = 2. Therefore
(# vertices of 𝐺) − (# edges of 𝐺) + (# faces of 𝐺) = 2,
so (12.3) holds for 𝐺 as well. This completes the Inductive Step.
3 We are treating this fact as intuitively obvious and not justifying it further. But it is actually surprisingly
deep, and is not trivial to prove rigorously. It is called the Jordan Curve Theorem.
Conclusion:
Therefore, by Mathematical Induction, (12.3) holds for all 𝑚.
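As a quick numerical check (not a proof), you can verify the formula for the plane graph of Figure 12.5, with the edges read off the boundary walks listed earlier:

# the plane graph of Figure 12.5
vertices = "abcdefgh"
edges = [("a", "c"), ("a", "d"), ("b", "d"), ("c", "e"), ("c", "g"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("f", "h"), ("g", "h")]
f = 4                                   # the faces F1, F2, F3 and the outer face F4
n, m = len(vertices), len(edges)
print(n - m + f)                        # 2, as Euler's Theorem predicts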
Corollary 72. For any planar graph 𝐺 with 𝑛 ≥ 3 vertices and 𝑚 edges,
𝑚 ≤ 3𝑛 − 6.
Proof. Since 𝐺 is planar, it has a planar drawing. Let 𝑓 be the number of faces in some
planar drawing of 𝐺.
We are going to use the faces of 𝐺, together with (i) what we can say about the sizes
(i.e., numbers of edges) of each of those faces, and (ii) Euler’s Theorem, to derive an
inequality between 𝑚 and 𝑛.
Since 𝐺 is simple, every face has at least three sides. So, for each face, we can pick
any three of its sides and mark them. A side can only be marked from the face it belongs
to, so each side gets marked at most once. Each edge has exactly two sides, and the
number of sides is twice the number of edges. So the number of sides that get marked
by this process is ≤ 2𝑚. Since the number of marks is 3𝑓, it follows that
3𝑓 ≤ 2𝑚,
which gives
𝑓 ≤ 2𝑚/3.
But Theorem 71 tells us that 𝑛 − 𝑚 + 𝑓 = 2, which after rearranging gives
𝑓 = 𝑚 − 𝑛 + 2.
Combining the two gives 𝑚 − 𝑛 + 2 ≤ 2𝑚/3. Multiplying both sides by 3 gives 3𝑚 − 3𝑛 + 6 ≤ 2𝑚, which rearranges to 𝑚 ≤ 3𝑛 − 6.
This fact is already strong enough to settle our earlier question about the planarity
or otherwise of 𝐾5 .
Corollary 73. 𝐾5 is nonplanar.
Proof. 𝐾5 has five vertices and ten edges: 𝑛 = 5 and 𝑚 = 10. We have 3𝑛−6 = 3⋅5−6 =
15 − 6 = 9 < 10, so
𝑚 > 3𝑛 − 6.
Therefore, by Corollary 72, 𝐾5 is not planar.
But Corollary 72 is not strong enough to answer our question about 𝐾3,3 .
We can get a stronger bound on the number of edges in a planar graph when the
graph has no triangles.
Corollary 74. For any triangle-free planar graph 𝐺 with 𝑛 ≥ 3 vertices and 𝑚 edges,
𝑚 ≤ 2𝑛 − 4.
Proof. The proof is very similar to the proof of Corollary 72. The key difference is that, when we mark sides around each face, we know that each face has at least four sides (because 𝐺 has no triangles), so we mark four of them. Instead of the inequality 3𝑓 ≤ 2𝑚,
we have
4𝑓 ≤ 2𝑚,
which simplifies to
2𝑓 ≤ 𝑚.
Combining this with Euler's Theorem: substituting 𝑓 = 𝑚 − 𝑛 + 2 into 2𝑓 ≤ 𝑚 gives 2𝑚 − 2𝑛 + 4 ≤ 𝑚, which rearranges to
𝑚 ≤ 2𝑛 − 4.
Corollary 75. 𝐾3,3 is nonplanar.
Proof. 𝐾3,3 has six vertices and nine edges: 𝑛 = 6 and 𝑚 = 9. We have 2𝑛−4 = 2⋅6−4 =
12 − 4 = 8 < 9, so
𝑚 > 2𝑛 − 4.
Note also that, since 𝐾3,3 is bipartite, it has no odd cycles, by Corollary 63. In particular,
it has no cycles of length 3, i.e., it has no triangles. Therefore, by Corollary 74, 𝐾3,3 is
not planar.
12.6𝜔 GAMES ON GRAPHS

We finish our exploration of graph theory with some games you can play on graphs.
12.6.1𝜔 The Shannon Switching Game

The Shannon Switching Game is named after Claude Shannon, one of the most important and influential scientists of the twentieth century. His first major contribution was in his Masters thesis, where he introduced the algebra of switching. He went on to lay the foundations of information theory.

The game is played on a graph 𝐺 in which two vertices, 𝑠 and 𝑡, are distinguished as terminals. Two players, called Cut and Join, take turns; each move acts on one edge that has not yet been touched by either player.
• Cut crosses out an edge, with the aim of ensuring that there is no path between
𝑠 and 𝑡 in 𝐺. Once an edge is crossed out, it cannot be used in such a path.
• Join thickens an edge, with the aim of ensuring that there is a path of thickened
edges between 𝑠 and 𝑡 in 𝐺.
Once an edge is crossed out, it can never be thickened, and once an edge is thickened,
it can never be crossed out.
The game ends in one of two ways.
• As soon as there is a path of thickened edges between 𝑠 and 𝑡, Join wins. (It is not
required that all the thickened edges form such a path, but only that the thickened
edges include a thickened path between 𝑠 and 𝑡.)
• As soon as it becomes impossible to create such a path — that is, every path between 𝑠 and 𝑡 in 𝐺 uses a crossed-out edge — Cut wins.

Here is an example play of the game, on a graph with terminals 𝑠 and 𝑡 and two other vertices 𝑎 and 𝑏.
Join’s first move: thicken {𝑠, 𝑎}. Cut’s first move: cross out {𝑎, 𝑡}.
Join’s second move: thicken {𝑎, 𝑏}. Cut’s second move: cross out {𝑏, 𝑡}.
At this point, Cut wins the game, because once edges {𝑎, 𝑡} and {𝑏, 𝑡} are deleted, the terminals 𝑠 and 𝑡 are in different components of the remaining graph.
But Join played poorly in this game. In particular, their second move was a serious
blunder! What should they have done? Could they have won? Do they have a winning
strategy in this game?
The theory of connectivity (§ 11.9) tells us that there is always a winner in this
game. There is no possibility of a draw. Suppose the players keep playing until all edges
are either thickened or crossed out, and consider the graph consisting of all the original
vertices but only the thickened edges. Either there is a path between 𝑠 and 𝑡 in this graph,
or there isn’t. If there is such a path, then since this path only has thickened edges,
Join wins. If there is no such path, then Cut wins. As soon as a thickened-edge path
is created or becomes impossible, the game is over, so the game could finish before all
edges are used, but in any case, there is always a winner.
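Both winning conditions are connectivity questions, so a position can be checked mechanically. Here is a Python sketch of a win-checker (our own illustration; the example graph is our guess at the four-vertex graph from the play above, so treat it as hypothetical):

def reachable(vertices, edge_set, s):
    # the set of vertices reachable from s using only edges in edge_set
    adj = {v: [] for v in vertices}
    for v, w in edge_set:
        adj[v].append(w)
        adj[w].append(v)
    seen, stack = {s}, [s]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def winner_so_far(vertices, edges, thick, cut, s, t):
    if t in reachable(vertices, thick, s):
        return "Join"                   # a thickened path joins s and t
    alive = [e for e in edges if e not in cut]
    if t not in reachable(vertices, alive, s):
        return "Cut"                    # no s-t path survives the crossed-out edges
    return None                         # the game is still undecided

V = ["s", "a", "b", "t"]
E = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
print(winner_so_far(V, E, {("s", "a"), ("a", "b")}, {("a", "t"), ("b", "t")}, "s", "t"))
# prints "Cut"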
At any stage while playing the game, there is at least one move for the player whose
turn it is that is best possible for them. It may be that one of the players has a winning
strategy. What would it mean for the player who starts the game to have a winning
strategy on a particular graph? This means that:
they can choose their moves, in response to their opponent's moves as the game proceeds, eventually leading to a win for the starting player no matter what their opponent does.
This is not saying that they can force a win from every conceivable configuration in the
game, but rather that they can force a win, eventually, from the starting configuration
in the game (in which no edges are thickened or crossed out). It’s also not saying that
they will win even if they play badly. A player with a winning strategy may well lose if
they depart from that strategy.
On the other hand, it’s possible that the second player to move (i.e., the one who
does not start the game) has a winning strategy. This would mean that:
they can choose their moves, in response to the first player's moves as the game proceeds, eventually leading to a win for the second player no matter what their opponent (the first player) does.
first player) does.
Since every play of the game always results in a win for one of the two players, it
follows that if one of the two players does not have a winning strategy, then the other
does. This may, or may not, depend on who goes first. So, for any given graph, there
may be four possibilities:
• Join has a winning strategy, whether they go first or second.
• Cut has a winning strategy, whether they go first or second.
• The first player has a winning strategy, whether that’s Join or Cut.
• The second player has a winning strategy, whether that’s Join or Cut.
For each of the following graphs (or multigraphs), see if you can determine which of
those four cases apply.
(Several small graphs and multigraphs, each with terminals 𝑠 and 𝑡, appear here.)
Try making some other graphs, and playing the Shannon Switching Game on them,
and try to determine who has a winning strategy in each.
After you have done this for the above graphs and maybe some others, you will
be able to classify each graph into one of our four categories, according to who has a
winning strategy: Join, Cut, first player, or second player. You should notice that there
is one category with no graphs: there is actually no graph for which the second player
always has a winning strategy regardless of whether Join or Cut starts the game.
This is not just an empirical observation from the graphs you have tried. (Such
observations are important in formulating conjectures, but of course they are not proofs.)
It is, in fact, a general property of this game. To see this, imagine that the second player
always has a winning strategy, regardless of who starts. Then what can the first player
do? Consider the following strategy for the first player: make any initial move, and
from then on, play as if they are the second player, which should result in them winning
if they use one of the winning strategies for the second player. This gives a contradiction:
we started by assuming that the second player can force a win, and then showed how
this same strategy could be exploited by the first player to force a win. So, in fact, there
can be no graph for which the second player always has a winning strategy.
The reason this argument works is that, in this game, it is never a disadvantage
to move. For Join, it can never be a disadvantage to thicken an edge (compared with
doing nothing at all), and for Cut, it can never be a disadvantage to cross out an edge.
In other words, Join thickening an edge can never be better, for Cut, than Join doing
nothing, and Cut crossing out an edge can never be better, for Join, than Cut doing
nothing. (In this game, you are not allowed to do nothing on your move, but it wouldn’t
matter if you were allowed to do nothing, because if you play sensibly, you would always
choose to make an actual move — joining or cutting, depending on your role — rather
than doing nothing.)
This argument won’t work in all two-player games. For example, in Chess, Draughts
(Checkers), Reversi (Othello) and Backgammon, situations can arise where any move
by a player is disadvantageous for them. Other games where moving is never a disad-
vantage include Noughts-and-Crosses (Tic-tac-toe). In that particular case, draws are
possible, so we cannot argue that there is always a winning strategy for one of the two
players.
You can just enjoy playing the game for fun. You can also try to develop some theory
for it. If you’d like a challenge, try to characterise those graphs that belong to each of our
three categories: Join wins, Cut wins, or first player wins. This is not easy, but the char-
acterisation uses some concepts discussed in this chapter. Such a characterisation should
be based purely on the structure of the graph, without examining all possible plays of
the game on the graph (of which there is, in general, a huge number). But, of course,
it’s ok to play the game as many times as you like, to help explore how it works and to
help discover what it is, about the structure of graphs, that helps a particular player win.
Here is a much more complex graph you can play the game on if you wish.
(The Bridg-It board graph, with terminals 𝑠 and 𝑡, appears here.)
A commercial version of this game, using this last graph, has been produced. This is the game Bridg-It, invented by David Gale and manufactured by Hassenfeld Brothers in 1960.
http://abstractstrategy.com/bridg-it.html
12.6.2𝜔 Slither
The game of Slither was invented by David L. Silverman and described by Martin
Gardner in 1972.4
The game can be played on any graph, as follows. Two players take turns choosing
edges so that the set of chosen edges always forms a path in the graph. Initially, no
edges are chosen. The first player chooses any edge they like, which is a path of length 1.
Then the second player must choose an edge which is incident with the first edge and
4 Martin Gardner, Mathematical games, Scientific American 226 (no. 6) (June 1972) 114–118.
creates a path of length 2. Then the first player chooses any edge that extends the path
created so far. And so on, for as long as possible. In order to extend the path chosen so
far, each new edge chosen must satisfy the following conditions:
• One of its endpoints must be one of the two endpoints of the path chosen so far.
– Either of those endpoints can be used, at each turn. Players are not required
to keep extending the path at the same end.
• The other endpoint of the new edge must be a vertex that does not yet belong to
the path. That new vertex then becomes an end of the path.
The game ends when no legal move is possible. When that happens, the last player to
have made a legal move wins. So, the first player who cannot make a legal move loses.
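The rules for extending the path translate directly into code. Here is a Python sketch of a legal-move generator (our own illustration); the example position is the end of the play described below, on the ten-edge graph whose edges can be read off that discussion:

def legal_moves(edges, path):
    # path is the list of vertices of the path chosen so far ([] at the start)
    if not path:
        return list(edges)              # the first player may choose any edge
    ends = {path[0], path[-1]}          # the path may be extended at either end
    on_path = set(path)
    moves = []
    for v, w in edges:
        # one endpoint must be an end of the path, the other a vertex not yet on it
        if v in ends and w not in on_path:
            moves.append((v, w))
        elif w in ends and v not in on_path:
            moves.append((v, w))
    return moves

edges = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"),
         ("b", "c"), ("b", "e"), ("c", "d"), ("d", "f"), ("e", "f")]
print(legal_moves(edges, ["d", "c", "a", "e", "f"]))   # []: the game is over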
Here is an example play of the game, on a graph with six vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓 (shown here).
First player’s first move: {𝑎, 𝑒}. Second player’s first move: {𝑒, 𝑓}.
First player’s second move: {𝑎, 𝑐}. Second player’s second move: {𝑐, 𝑑}.
In this play of the game, the game stops at this point because no further legal
move is possible. The path cannot be extended at either end without destroying the
path property. The player to move (the first player, in this case) cannot choose {𝑑, 𝑓},
because that creates a cycle of chosen edges: 𝑎, 𝑒, 𝑓, 𝑑, 𝑐, 𝑎. Similarly, neither {𝑎, 𝑓} nor
{𝑎, 𝑑} can be chosen, because they also create cycles of chosen edges (triangles, in fact).
The edges {𝑏, 𝑐}, {𝑎, 𝑏} and {𝑏, 𝑒} cannot be chosen because they are not incident with
an endpoint (𝑑 or 𝑓) of the chosen path.
Could the first player have done better? Does the first player have a winning strat-
egy when the game is played on this graph?
This game must end in a win for one or the other of the two players. For each graph, we have one of two possibilities: either the first player has a winning strategy, or the second player has a winning strategy.
Try the game on a few small graphs, and classify them as to whether you think the
first or second player has a winning strategy. This time, you should find that there are
graphs of each type: some for which the first player has a winning strategy, and others
for which the second player has a winning strategy.
Then try to investigate how to determine, just from the graph itself and without
doing an exhaustive search of all possible plays of the game, who has a winning strategy
on a given graph.
One important difference between this game and the Shannon Switching Game is
that, in this game, the two players have the same role, since their effect on an edge,
when they choose it, is the same. (This contrasts with the Shannon Switching Game,
where one player thickens edges and the other player crosses them out.) Of course, the
details of the games are very different, too. But this point about the different roles of
the players is a very fundamental one in the theory of games. A game is said to be
impartial if, in any given configuration, the options available to each player would be
the same if it were their turn to move. A game is partisan if it is not impartial. So,
with this terminology, the Shannon Switching Game is a partisan game, while Slither is
an impartial game.
12.7 EXERCISES
1. Draw all trees that have six vertices. Your set of drawings should be comprehensive,
so that every tree on six vertices can be seen to be identical to one of those in your set
(potentially after some relabelling and redrawing).
2. The diameter of a graph is the maximum distance between any two of its vertices. That is, it is the maximum, over all pairs of vertices 𝑣, 𝑤, of the length of a shortest path between 𝑣 and 𝑤.
Prove that every tree of diameter ≥ 3 has a vertex 𝑣 such that
2 ≤ deg(𝑣) ≤ 𝑛 − 2,
where 𝑛 is the number of vertices in the tree.
3. If a tree has 𝑘 vertices of degree 3 and all other vertices are leaves, how many
leaves does it have?
4.
(a) Prove that every tree with at least three vertices has a vertex of degree at least 2.
(b) Is there a number 𝑁 such that every tree with at least 𝑁 vertices has a vertex of
degree at least 3?
5. Prove that a graph 𝐺 is a tree if and only if for every pair 𝑢, 𝑣 of vertices, there is
a unique path between them in 𝐺.
(A graph on vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓 appears here.)
(A graph on vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, with edge costs 1–9, appears here.)
10. Check that Euler’s Theorem holds for the following graphs:
(a) the cycle 𝐶𝑛 ;
(b) the wheel graph 𝑊𝑛 , which is obtained from 𝐶𝑛 by adding one new vertex (the
“hub”) that is adjacent (via “spokes”) to every vertex in 𝐶𝑛 (the “rim”);
(c) 𝐾4 ;
11.
(a) Give an upper bound on the average degree of a planar graph.
(b) Give an upper bound on the average degree of a planar graph with no triangles.
(c) Give an upper bound on the average degree of a bipartite planar graph.
13. Let 𝑑 ∈ ℕ. A graph is 𝑑-regular if every vertex has degree 𝑑.
(a) Find the 3-regular planar graph with the fewest vertices.
(b) Find the 3-regular bipartite planar graph with the fewest vertices.
(c) Find the 3-regular nonplanar graph with the fewest vertices.
(e) What is the minimum number of vertices that a 5-regular planar graph can have?
Can you find such a graph?