
FIT1058 FOUNDATIONS OF COMPUTING

COURSE NOTES

Graham Farr, Alexey Ignatiev, and Rebecca Robinson

Faculty of Information Technology


Monash University

June 2, 2025
PREFACE

Welcome to FIT1058 Foundations of Computing!


Computation uses abstract formal models of real objects and systems. This unit
lays the theoretical foundations for working with the most fundamental abstract models
used in computer science. It will develop skills in abstract modelling, logical reasoning,
rigorous proof, and mathematical analysis of computational methods and structures.
These skills will be used throughout your degree in both theoretical and practical settings.
The material we cover is considered core knowledge for computer science students and
graduates by the major national and international professional associations.
This document contains the course notes for the unit. It will be released in instalments through the semester.

PREREQUISITES

This unit assumes successful prior study in mathematics at least to the standard of
Mathematical Methods units 3 & 4 in the Victorian Certificate of Education (VCE).
That VCE subject in turn builds on mathematics studied earlier in high school.
The specific topics from school mathematics that we make most use of in this unit
are:

• sets (including Venn diagrams, subsets, supersets, complement, union, intersection, counting subsets);

• functions (incl. their domain, codomain, rule), inverse functions, relations;

• sequences and series (arithmetic and geometric, finite and infinite);

• numbers (natural numbers, integers, rational numbers, real numbers, representations in a base, divisors, factorisation, composite numbers, primes, arithmetic, remainders);

• counting (incl. using addition and multiplication, permutations, combinations, binomial coefficients);

• probability (incl. using counting, combinations of events, conditional probability, random variables, probability distributions, binomial distribution, normal distribution, probability density functions);

• exponentiation and logarithms (which are ubiquitous in computer science);


• calculus (not a lot, but we make some use of limits and integrals, and derivatives
can give useful insight even in situations where we don’t make specific use of them).

We also make pervasive use of standard high-school algebra when working with polynomials and other functions.
We review parts of some of these topics in the pre-reading (see next section), but
the pace and level still assume prior knowledge. So it is important that you work to
fill in any gaps or hazy areas in your knowledge of these topics.

PRE-READING

Sections whose numbers have superscript 𝛼 contain pre-reading. For example:

§ 2.5𝛼 Functions with multiple arguments and values

These pre-reading sections should be read and studied before the seminars on that topic.
Some of them review work you did in school, but you should still read them, for several
reasons: they establish the notation, terminology and other conventions we will use,
which sometimes differ from those used in schools; they give some important computer
science context to the concepts discussed, which you may not be aware of even if you
have studied the concepts themselves before; and our experience is that most students
benefit from some reminders and revision of this school material anyway. Other sections
marked 𝛼 may cover new material that is so fundamental to the topic to be discussed
that reading about it before the seminar will significantly increase your ability to learn
the material and master it.
The amount of pre-reading varies, depending on the topic. Some chapters spend a lot
of their early sections reviewing concepts and topics covered in school (e.g., the chapters
on sets and functions). These have more pages of pre-reading, but reading them should
not take as long as reading completely new material. Other chapters contain material
that is almost entirely new. These have fewer pages of pre-reading, but those pages will
need to be read more slowly and carefully.

EXTRA MATERIAL

Sections whose numbers have superscript 𝜔 contain extra material that is beyond the
FIT1058 curriculum. For example:

§ 2.11𝜔 Loose composition

This may include discussion of alternative approaches that we don’t study, or more
advanced aspects of the topic, or some historical background. The specific content
introduced in these extra sections won’t be on tests or exams in this unit. But reading
them may still be of indirect benefit, by consolidating the material studied in the other
sections.

ACKNOWLEDGEMENTS

Thanks very much to those who have given feedback on earlier versions of these Course
Notes, including: David Albrecht, Mathew Baker, Annalisa Calvi, Nathan Companez,
Michael Gill, Thomas Hendrey, Alexey Ignatiev, Roger Lim, Rebecca Robinson, James
Sherwood, Alejandro Stuckey de la Banda, Tham Weng Kee, Nelly Tucker, Joel Wills,
and some anonymous student reviewers.
Thanks also to the FIT1058 students, including Tiancheng Cai, Shshank Jha, Ian
Ko, Cody Lincoln, Zijing Song, Timothy Tong, Jing Yap, and Michael Zeng, who have
pointed out some errors in earlier versions, enabling us to fix them. Further error-spotting is very welcome. We look forward to acknowledging more students in future.
CONTENTS

Preface iii

1 Sets 1
1.1𝛼 Sets and elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2𝛼 Specifying sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3𝛼 Cardinality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Sets of numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.5 Sets of strings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.6 Subsets and supersets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.7 Minimal, minimum, maximal, maximum . . . . . . . . . . . . . . . . . . 9
1.8 Counting all subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.9 The power set of a set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.10 Counting subsets by size using binomial coefficients . . . . . . . . . . . . 12
1.11 Complement and set difference . . . . . . . . . . . . . . . . . . . . . . . . 17
1.12 Union and intersection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.13 Symmetric difference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
1.14 Cartesian product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.15 Partitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.16 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2 Functions 37
2.1𝛼 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.1.1𝛼 Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.1.2𝛼 Codomain & co. . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.1.3𝛼 Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.2𝛼 Functions in computer science . . . . . . . . . . . . . . . . . . . . . . . 44
2.2.1𝛼 Functions from analysis . . . . . . . . . . . . . . . . . . . . . . . 44
2.2.2𝛼 Functions in mathematics and programming . . . . . . . . . . . . 44
2.3𝛼 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.4𝛼 Some special functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.5𝛼 Functions with multiple arguments and values . . . . . . . . . . . . . . . 46
2.6 Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.7 Injections, surjections, bijections . . . . . . . . . . . . . . . . . . . . . . . 49
2.8 Inverse functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.9 Composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.10 Cryptosystems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.11𝜔 Loose composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62


2.12 Counting functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63


2.13 Binary relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.14 Properties of binary relations . . . . . . . . . . . . . . . . . . . . . . . . 68
2.15 Combining binary relations . . . . . . . . . . . . . . . . . . . . . . . . . . 69
2.16 Equivalence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
2.17 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
2.18 Counting relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
2.19 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

3 Proofs 85
3.1𝛼 Theorems and proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
3.2 Logical deduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3 Proofs of existential and universal statements . . . . . . . . . . . . . . . 92
3.4 Finding proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
3.5 Types of proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
3.6 Proof by symbolic manipulation . . . . . . . . . . . . . . . . . . . . . . . 95
3.7 Proof by construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
3.8 Proof by cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.9 Proof by contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
3.10 Proof by mathematical induction . . . . . . . . . . . . . . . . . . . . . . 97
3.11 Induction: more examples . . . . . . . . . . . . . . . . . . . . . . . . . . 100
3.12𝜔 Induction: extended example . . . . . . . . . . . . . . . . . . . . . . . . . 104
3.13 Mathematical induction and statistical induction . . . . . . . . . . . . . 108
3.14 Programs and proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.15 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

4 Propositional Logic 121


4.1𝛼 Truth values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.2𝛼 Boolean variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.3𝛼 Propositions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
4.4𝛼 Logical operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.5 Negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.6 Conjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
4.7 Disjunction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
4.8 De Morgan’s Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
4.9 Implication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
4.10 Equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
4.11 Exclusive-or . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
4.12 Tautologies and logical equivalence . . . . . . . . . . . . . . . . . . . . . 132
4.13𝜔 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.14 Distributive Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.15 Laws of Boolean algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.16 Disjunctive Normal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

4.17 Conjunctive Normal Form . . . . . . . . . . . . . . . . . . . . . . . . . . 138


4.18 Representing logical statements . . . . . . . . . . . . . . . . . . . . . . . 139
4.19 Statements about how many variables are true . . . . . . . . . . . . . . . 142
4.20 Universal sets of operations . . . . . . . . . . . . . . . . . . . . . . . . . 143
4.21 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

5 Predicate Logic 155


5.1𝛼 Relations, predicates, and truth-valued functions . . . . . . . . . . . . . 155
5.2𝛼 Variables and constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
5.3𝛼 Predicates and variables . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
5.4 Arguments of predicates . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
5.5 Building logical expressions with predicates . . . . . . . . . . . . . . . . 163
5.6 Existential quantifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
5.7 Restricting existentially quantified variables . . . . . . . . . . . . . . . . 167
5.8 Universal quantifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5.9 Restricting universally quantified variables . . . . . . . . . . . . . . . . . 169
5.10 Multiple quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
5.11 Predicate logic expressions . . . . . . . . . . . . . . . . . . . . . . . . . . 172
5.12 Doing logic with quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . 173
5.13 Duality between quantifiers . . . . . . . . . . . . . . . . . . . . . . . . . 175
5.14 Summary of rules for logic with quantifiers . . . . . . . . . . . . . . . . . 175
5.15 Some examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
5.16 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

6 Sequences & Series 183


6.1𝛼 Definitions and notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6.2𝛼 Recursive definitions of sequences . . . . . . . . . . . . . . . . . . . . . . 185
6.3𝛼 Arithmetic sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
6.4𝛼 Geometric sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
6.5 Harmonic sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
6.6 From recursive definitions to formulas . . . . . . . . . . . . . . . . . . . . 190
6.7 The Fibonacci sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
6.7.1 Upper bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193
6.7.2 Lower bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
6.7.3 Asymptotic behaviour . . . . . . . . . . . . . . . . . . . . . . . . 199
6.7.4 An exact formula . . . . . . . . . . . . . . . . . . . . . . . . . . . 200
6.8 Limits of infinite sequences . . . . . . . . . . . . . . . . . . . . . . . . . . 204
6.9 Big-O notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
6.10 Sums and summation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
6.11 Finite series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217
6.12 Finite arithmetic series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
6.13 Finite geometric series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
6.14 Infinite geometric series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221

6.15 Harmonic numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221


6.16 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

7 Number Theory 233


7.1𝛼 Multiples and divisors . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.2𝛼 Prime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
7.3 Remainders and the mod operation . . . . . . . . . . . . . . . . . . . . . 236
7.4 Parity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.5 The greatest common divisor . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.6 The Euclidean algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.7 The gcd and integer linear combinations . . . . . . . . . . . . . . . . . . 243
7.8 The extended Euclidean algorithm . . . . . . . . . . . . . . . . . . . . . 247
7.9 Coprimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.10 Modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.11 Modular inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
7.12 The Euler totient function . . . . . . . . . . . . . . . . . . . . . . . . . . 260
7.13 Fast exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7.14 Modular exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
7.15 Primitive roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
7.16 One-way functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
7.17 Modular exponentiation with fixed base . . . . . . . . . . . . . . . . . . 269
7.18 Diffie-Hellman key agreement scheme . . . . . . . . . . . . . . . . . . . . 270
7.19 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274

8 Counting & Combinatorics 281


8.1𝛼 Counting by addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
8.2𝛼 Counting by multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . 282
8.3 Inclusion-exclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283
8.4 Inclusion-exclusion: derangements . . . . . . . . . . . . . . . . . . . . . . 289
8.5 Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
8.6 Ordered selection with replacement . . . . . . . . . . . . . . . . . . . . . 294
8.7 Ordered selection without replacement . . . . . . . . . . . . . . . . . . . 295
8.8 Unordered selection without replacement . . . . . . . . . . . . . . . . . . 295
8.9 Unordered selection with replacement . . . . . . . . . . . . . . . . . . . . 296
8.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299

9 Discrete Probability I 305


9.1𝛼 The nature of randomness . . . . . . . . . . . . . . . . . . . . . . . . . . 306
9.2𝛼 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
9.3𝛼 Choice of sample space . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
9.4𝛼 Mutually exclusive events . . . . . . . . . . . . . . . . . . . . . . . . . . 317
9.5 Operations on events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
9.6 Inclusion-Exclusion for probabilities . . . . . . . . . . . . . . . . . . . . . 325

9.7 Independent events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327


9.8 Conditional probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 333
9.9 Bayes’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
9.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342

10 Discrete Probability II 347


10.1𝛼 Random variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
10.2𝛼 Probability distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
10.3𝛼 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
10.4 Median . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
10.5 Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
10.6 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
10.7 Uniform distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
10.8 Binomial distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
10.9 Poisson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
10.10 Geometric distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374
10.11 The coupon collector’s problem . . . . . . . . . . . . . . . . . . . . . . . 380
10.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385

11 Graph Theory I 389


11.1𝛼 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
11.2𝛼 Types of graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
11.3𝜔 Graphs and relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
11.4𝛼 Representing graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
11.4.1 Edge list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
11.4.2 Adjacency matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
11.4.3 Adjacency list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
11.4.4 Incidence matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
11.5 Subgraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
11.6 Some special graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
11.7 Degree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 399
11.8 Moving around . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
11.9 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
11.10 Bipartite graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 408
11.11 Euler tours . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411
11.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416

12 Graph Theory II 419


12.1𝛼 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
12.2 Properties of trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
12.3 Forests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
12.4 Spanning trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
12.5 Planarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429

12.6𝜔 Games on graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436


12.6.1𝜔 Shannon’s Switching Game . . . . . . . . . . . . . . . . . . . . . 436
12.6.2𝜔 Slither . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 442
12.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 445
1 SETS

The raw material of computation and communication is information. This takes many
different forms, due to the great variety of things we might want to represent and the
many different ways of representing them.
In this unit and in other units in your degree, you will learn about many different
structures that are used to represent information in order to store it, communicate it or
compute with it.
We will start with sets because these are among the simplest possible information
structures. Most other structures can be defined in terms of sets, so sets are a foundational topic.
Sets are used extensively to define the types of objects that we compute with. When
we work with a whole number, we say it is of Integer type because it belongs to the set
of integers and satisfies the various laws and properties of integers. When we work with
a string of characters, we might say that it is of String type because it belongs to the
set of all strings over a suitable alphabet and satisfies the properties expected of strings.
Many programming languages take careful account of the types of the objects they work
with, and sets always underlie any notion of type.

1.1𝛼 SETS AND ELEMENTS

A set is just a collection of objects, without order or repetition. The objects in a set are called its elements or members. To specify a set, we can just give a comma-separated list of them between curly braces. So the following are all sets:

{Harry, Ginny, Hermione, Ron, Hagrid}


{42, −273.15, 1729, 10¹⁰⁰}
{CSIRAC, Manchester Baby, EDSAC}
{}

In the last example, the set is empty. This set, simple as it is, is so fundamental that it
has its own symbol, ∅, not to be confused with zero, 0 (which, in the early days of the
computer industry, was often written with a slash through it to distinguish it from the
letter O).


When we write a set by listing its elements in the above way, we will inevitably
list the elements in some specific order. But different orders of writing do not affect
the identity of the set. Our third set above contains three of the earliest computers
ever built, but they are not listed there in chronological order. If we wrote them in
chronological order, the set would be written

{Manchester Baby, EDSAC, CSIRAC}.

But it would still be the same set:

{CSIRAC, Manchester Baby, EDSAC} = {Manchester Baby, EDSAC, CSIRAC}.

We state that an object is an element of a set using the symbol ∈:

object ∈ set.

For example,

CSIRAC ∈ {Manchester Baby, EDSAC, CSIRAC}.

To state that an object does not belong to a set, we use ∉. For example,

SILLIAC ∉ {Manchester Baby, EDSAC, CSIRAC}.
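These properties carry over directly to the set types built into many programming languages. As a brief illustration in Python (a sketch, using the same example elements as above), order of listing does not affect a set's identity, membership corresponds to `in` and `not in`, and the empty set needs the `set()` constructor because `{}` denotes an empty dictionary:

```python
# Order and repetition do not affect a set's identity.
a = {"CSIRAC", "Manchester Baby", "EDSAC"}
b = {"Manchester Baby", "EDSAC", "CSIRAC"}
print(a == b)                 # True: same elements, different listing order

# Membership tests correspond to ∈ (in) and ∉ (not in).
print("CSIRAC" in a)          # True
print("SILLIAC" not in a)     # True

# The empty set ∅: note that {} is an empty dict in Python, not a set.
empty = set()
print(len(empty))             # 0
```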

1.2𝛼 SPECIFYING SETS

We will be working with many sets that are far larger than these examples, and many
will be infinite. So it is often not practical to write out all the elements. So we need a
succinct way of specifying precisely the elements of a set. One way is to give a condition
that, in general, is either true or false, with the members of the set being precisely those
objects for which the condition is true. For example,

{𝑥 ∶ 𝑥 is even}

is the set of all even numbers. The variable 𝑥 here is simply a name for elements of
this set, so that we can talk about them. The colon, “:”, separates the name 𝑥 from the
condition on 𝑥 that must be satisfied in order for it to be an element of this set. We read
this as “the set of all 𝑥 such that 𝑥 is even”. The choice of name, 𝑥, is not important;
we could equally well write the set as

{𝑛 ∶ 𝑛 is even}

In this definition, the reader will naturally infer that the variable (𝑥 or 𝑛) represents a
whole number, since the concept of a number being even or not only applies to whole

numbers; it makes no sense, in general, for rational numbers or real numbers. But it
is often preferable to spell out the kind of numbers we are talking about, so that the
reader does not have to fill in any gaps in our description. In this example, we might
also want to remove any doubt in the reader’s mind as to whether we are working with
integers in general or just natural numbers. So we might rewrite our definition as

{𝑥 ∶ 𝑥 ∈ ℤ and 𝑥 is even}

Set definitions of this type have the general form

{name ∶ condition},

where name is a convenient name for an arbitrary member of the set and condition is a
statement involving name which is true precisely when the object called name belongs
to the set and is false otherwise.
It is a common convention to include, in our statement of the name (before the
colon), a specification of a larger set that the object must belong to. For example, the
set of even integers could be written

{𝑥 ∈ ℤ ∶ 𝑥 is even}.

This can be read as “the set of 𝑥 in ℤ such that 𝑥 is even” or “the set of integers 𝑥 such
that 𝑥 is even”. In general we can write

{name ∈ larger set ∶ condition}.

It is necessary that the condition be precise and clear. To ensure this, it will often be
specified in a formal symbolic way. It is ok to use English text in the condition provided
it is used clearly and precisely. It is also important for the text to be succinct, subject
to ensuring precision and clarity.
Another way to specify a set is to give a rule by which each member is constructed.
For example, the set of even integers could be written

{2𝑛 ∶ 𝑛 ∈ ℤ}

We read this as “the set of 2𝑛 such that 𝑛 belongs to ℤ” or “the set of 2𝑛 such that 𝑛 is
an integer”. The rule is a formula for converting a named object into a member of the
set, and after the colon we give a condition that the named object must satisfy in order
for the formula to be used. Taking all objects that satisfy this condition, and applying
the formula to each one of them, must give all members of the set. In general, we can
write
{rule expressed in terms of name ∶ condition on name}.
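Both styles of set definition have a direct analogue in Python's set comprehensions. The sketch below (our own illustration; Python sets must be finite, so we stand in for ℤ with the range −10 to 10) builds the even integers first by the condition form {𝑥 ∈ ℤ ∶ 𝑥 is even} and then by the rule form {2𝑛 ∶ 𝑛 ∈ ℤ}:

```python
# A finite stand-in for ℤ, since Python sets cannot be infinite.
Z = range(-10, 11)

# Condition form: {x ∈ ℤ : x is even} — name a member, then test it.
evens_by_condition = {x for x in Z if x % 2 == 0}

# Rule form: {2n : n ∈ ℤ} — apply a construction rule to each n.
evens_by_rule = {2 * n for n in range(-5, 6)}

# Two different definitions, one and the same set.
print(evens_by_condition == evens_by_rule)   # True
```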
Since the curly braces are read as “the set of”, it’s ok to write, for example, {even
integers} for the set of even integers or {people on Earth} for the set of all people on

Earth. This way of defining sets — using just English text between the braces — is
fine when the English is completely precise and not too long. But it should be used
with care, because of the risk of imprecision, and only works well for sets that can be
described very simply.
People sometimes describe large sets by listing a few of their elements and expecting
readers to spot the pattern and infer what the entire set is. For example, the set of even
integers might sometimes be written as

{0, 2, −2, 4, −4, 6, −6, …}

or
{… , −6, −4, −2, 0, 2, 4, 6, …}.
While this sort of description might help communicate ideas in an informal conversation,
it is not a definition of the set, since it does not precisely specify which elements are
in the set, but rather turns that task over to the reader by the use of “…”. Informal
descriptions have their place, and we will use them sometimes, but they are not formal
definitions.

1.3𝛼 CARDINALITY

The size or cardinality of a set is just the number of elements it has. If 𝐴 is a set, then
its size is denoted by |𝐴| or sometimes #𝐴. When a set is specified by just listing its
elements, we can determine its size by just counting those elements, which can be done
manually if the set is small enough. For the above examples, we have

|{Harry, Ginny, Hermione, Ron, Hagrid}| = 5,


|{42, −273.15, 1729, 10¹⁰⁰}| = 4,
|{CSIRAC, Manchester Baby, EDSAC}| = 3,
|∅| = |{}| = 0.
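In Python, cardinality corresponds to the built-in `len` function applied to a set; the sketch below redoes the four counts above (with `10**100` for 10¹⁰⁰):

```python
# |A| corresponds to len(A) for Python's set type.
wizards = {"Harry", "Ginny", "Hermione", "Ron", "Hagrid"}
numbers = {42, -273.15, 1729, 10**100}
computers = {"CSIRAC", "Manchester Baby", "EDSAC"}

print(len(wizards))    # 5
print(len(numbers))    # 4
print(len(computers))  # 3
print(len(set()))      # 0: the empty set has cardinality 0
```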

Determining the cardinality of a set is a fundamental skill in computer science. For example, if a set represents all the objects that an algorithm must examine in order to
find one that is best in some sense, then determining the size of that set helps determine
how long the algorithm will take. If another set represents all the data items that must
be stored in some memory in a device, then determining its size helps determine how
much storage will be used by the data. If yet another set represents all the possible
outcomes of some computation, then its size is an indicator of how uncertain you are
about that outcome before you do the computation.
We will often have to deal with very large sets, due to the huge amounts of data
that computers work with. It is sometimes useful to focus on the logarithm of the size
of a set.
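These ideas are easy to experiment with on a computer. The following Python sketch (Python is used here purely for illustration; the example sets are the ones above) computes cardinalities with the built-in len function, and also the logarithm of a size:

```python
import math

wizards = {"Harry", "Ginny", "Hermione", "Ron", "Hagrid"}
numbers = {42, -273.15, 1729, 10**100}
computers = {"CSIRAC", "Manchester Baby", "EDSAC"}
empty = set()

# len gives the cardinality |A| of a finite set A.
print(len(wizards))    # 5
print(len(numbers))    # 4
print(len(computers))  # 3
print(len(empty))      # 0

# For a very large set, the logarithm of its size is often more informative:
# a set with 2**40 elements has log2-size just 40.
print(math.log2(2**40))  # 40.0
```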

1.4 SETS OF NUMBERS

Some sets are so commonly used that they have special names. We have already met ∅
which denotes the empty set. There are names for some fundamental sets of numbers:
ℕ the set of positive integers
ℕ0 the set of nonnegative integers
ℤ the set of all integers
ℚ the set of rational numbers
ℝ the set of real numbers
Usually, when we work with these fundamental number sets, we are not only interested
in them as plain sets: we may also be interested in the natural order they have (with
≤), and in some operations we can do with their elements (like +, −, × and more). So,
the symbol ℤ stands for the set of integers (as above), but it is also used to represent
that same set together with some selection of operations that we are interested in at
the time. We will not dwell on this point further; it would be too fussy to start using
different names for a number set depending on what operations on it were being used
at the time.
To restrict any of these sets to only its positive or negative members, we can use
superscript + or −. So ℤ+ is another way of denoting ℕ, and ℝ− is the set of negative
real numbers. To denote the set of nonnegative members of one of these sets of numbers, we combine superscript + with subscript 0, as in ℝ⁺₀ for the set of nonnegative real numbers (since the nonnegative real numbers are just the positive real numbers together with zero). Similarly, ℚ⁻₀ is the set of nonpositive rational numbers.

For intervals of real numbers, there is some standard notation to indicate which, if
any, of the two endpoints of the interval are included:
notation   definition                  terminology
[𝑎, 𝑏]     {𝑥 ∈ ℝ ∶ 𝑎 ≤ 𝑥 ≤ 𝑏}        closed interval
[𝑎, 𝑏)     {𝑥 ∈ ℝ ∶ 𝑎 ≤ 𝑥 < 𝑏}        half-open (or half-closed) interval
(𝑎, 𝑏]     {𝑥 ∈ ℝ ∶ 𝑎 < 𝑥 ≤ 𝑏}        half-open (or half-closed) interval
(𝑎, 𝑏)     {𝑥 ∈ ℝ ∶ 𝑎 < 𝑥 < 𝑏}        open interval
Sometimes we want to restrict the contents of the interval to one of our other special
sets of numbers. We will indicate this using a subscript on the interval notation. For
example, if we only want integers within the interval [𝑎, 𝑏], we write [𝑎, 𝑏]ℤ , which is an
abbreviation for [𝑎, 𝑏] ∩ ℤ.
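As an illustrative sketch (the helper names below are ours, not standard notation), the four interval types and the restriction [𝑎, 𝑏]ℤ = [𝑎, 𝑏] ∩ ℤ can be expressed in Python:

```python
import math

def in_interval(x, a, b, closed_left=True, closed_right=True):
    """Membership test covering all four interval types on the real line."""
    left_ok = (a <= x) if closed_left else (a < x)
    right_ok = (x <= b) if closed_right else (x < b)
    return left_ok and right_ok

def integer_interval(a, b):
    """[a, b] ∩ Z as a list: all integers x with a <= x <= b."""
    return list(range(math.ceil(a), math.floor(b) + 1))

print(in_interval(1.0, 1, 3))                     # True:  1 ∈ [1, 3]
print(in_interval(1.0, 1, 3, closed_left=False))  # False: 1 ∉ (1, 3]
print(integer_interval(0.5, 3.7))                 # [1, 2, 3]
```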

1.5 SETS OF STRiNGS

For data to be stored, processed, or communicated, it first needs to be encoded in some symbolic form. So we start by specifying the symbols we are allowed to use, which we often call characters or letters. The set of allowed characters is called the alphabet. We only consider finite alphabets.
For example, depending on the context, we could use the 26-letter English alphabet {a, b, c, … , y, z} (restricting here to words consisting entirely of lower-case letters, and ignoring accents and apostrophes), or the alphabet of the ten decimal digits {0, 1, 2, … , 9}, or the set of two bits {0, 1}.
To represent data symbolically, we use strings of characters, where the characters
all belong to some alphabet. If 𝐴 is an alphabet, a string over 𝐴 is a finite sequence
of characters, each of which is drawn from that alphabet. In other words, a string is
anything you can get by taking a character from your alphabet, then taking another
(which might be the same one, or might not be) and putting it after the first, and then
taking a third (which may or may not be one of the characters you have already used)
and putting it after the first two, and so on, for as long as you like (but it must be finite).
The length of a string is its number of characters. So, for example, the length of the
string “babbage” is 7.
We allow the empty string, which has no characters. Because the empty string is,
by its nature, impossible to see, there is a special symbol for it to help us write about
it: this is 𝜀, the Greek letter epsilon. The length of the empty string is 0.
So, every English word (written in lower case and without accents) is a string over the
26-letter alphabet {a, b, c, … , y, z}. Every positive integer can be represented in decimal
notation as a string over the ten-digit alphabet {0, 1, 2, … , 9}, or in binary notation as a
string over the two-bit alphabet {0, 1}. Every file on your computer may be regarded
as a string over an appropriate alphabet, which might be the latest Unicode alphabet
of about 155,000 characters. Strands of DNA are modelled as strings over the alphabet
{C, G, A, T}.
If 𝐴 is an alphabet and 𝑘 ∈ ℕ0, then 𝐴ᵏ denotes the set of all strings of exactly 𝑘 characters in which each character belongs to 𝐴. So 𝐴¹ is just the alphabet itself, and 𝐴² is the set of two-character strings from that alphabet, and so on. For example, if 𝐴 = {0, 1} then

𝐴³ = {000, 001, 010, 011, 100, 101, 110, 111}.

For any alphabet 𝐴, the set 𝐴⁰ is the set containing just the empty string: 𝐴⁰ = {𝜀}. This is not to be confused with the empty set!

How many strings of length 𝑘 over the alphabet 𝐴 are there? In other words, what is |𝐴ᵏ|? If 𝑘 = 1, then we are just counting strings of length 1, which is the same as the number of letters in the alphabet, so |𝐴¹| = |𝐴|. If 𝑘 = 2, then we are counting strings of length 2. For each possible first letter, we have |𝐴| choices for the second letter, since there is no restriction on the second letter. Since each choice of first letter gives the same number of choices for the second letter, and since there are |𝐴| choices for the first letter, we find that the number of strings of length 2 is |𝐴| × |𝐴| = |𝐴|². If 𝑘 = 3, then we have |𝐴|² choices for the first two letters (as we just saw), with each such choice followed by |𝐴| choices for the third letter, with this number being independent of the choice we made for the first two letters. So the total number of strings of length 3 is |𝐴|² × |𝐴| = |𝐴|³. This reasoning extends to any value of 𝑘. So the number of strings over 𝐴 of length 𝑘 is given by

|𝐴ᵏ| = |𝐴|ᵏ.

We also write 𝐴∗ for the set of all finite strings (of all possible lengths) over the alphabet 𝐴. This is always an infinite set (provided 𝐴 ≠ ∅). For 𝐴 = {0, 1}, we give a few of its smallest members:

𝐴∗ = {𝜀, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, …}.
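As a quick illustration, we can enumerate 𝐴ᵏ in Python with itertools.product and confirm the count |𝐴ᵏ| = |𝐴|ᵏ for small 𝑘 (a numerical check, not a proof):

```python
from itertools import product

A = {"0", "1"}

def strings_of_length(alphabet, k):
    """All strings of length k over the given alphabet (the set A^k)."""
    return {"".join(chars) for chars in product(sorted(alphabet), repeat=k)}

A3 = strings_of_length(A, 3)
print(sorted(A3))
# ['000', '001', '010', '011', '100', '101', '110', '111']

# |A^k| = |A|^k, checked for a few values of k:
for k in range(5):
    assert len(strings_of_length(A, k)) == len(A) ** k

# A^0 contains exactly one string: the empty string ε.
print(strings_of_length(A, 0))  # {''}
```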

1.6 SUBSETS AND SUPERSETS

A subset 𝐴 of a set 𝐵 is a set 𝐴 with the property that every element of 𝐴 is also an
element of 𝐵. We write 𝐴 ⊆ 𝐵. For example,

{Manchester Baby, CSIRAC} ⊆ {Manchester Baby, EDSAC, CSIRAC},


ℕ ⊆ ℤ,
ℚ ⊆ ℚ,
{0, 1}³ ⊆ {0, 1}∗,
∅ ⊆ [1, 3).

We can also write 𝐴 ⊈ 𝐵 to mean that 𝐴 is not a subset of 𝐵, as in

{𝜋, 𝑖, 𝑒} ⊈ {𝜋, 𝑖, 𝜙},


[0, 1) ⊈ (0, 1],
ℚ+ ⊈ ℤ,
{0, 1, 2} ⊈ {0, 1}∗,
{∅} ⊈ ∅.

On a Venn diagram, we illustrate 𝐴 ⊆ 𝐵 by drawing the set 𝐴 entirely within the set 𝐵. See Figure 1.1.
We can think of the subset relation as specifying a logical implication: 𝐴 ⊆ 𝐵 means
that membership of 𝐴 implies membership of 𝐵. We can write this using the implication
symbol ⇒:
membership of 𝐴 ⇒ membership of 𝐵
In other words, for all 𝑥, if 𝑥 ∈ 𝐴 then 𝑥 ∈ 𝐵:

𝑥 ∈ 𝐴 ⇒ 𝑥 ∈ 𝐵.

Figure 1.1: 𝐴 ⊆ 𝐵.

This is sometimes read as “𝑥 ∈ 𝐴 only if 𝑥 ∈ 𝐵”.


Suppose we need to prove that 𝐴 ⊆ 𝐵. This means we must prove that every member
of 𝐴 is also a member of 𝐵. In other words, we prove that every object that satisfies the
condition for membership of 𝐴 must also satisfy the condition for membership of 𝐵.
Every set is a subset of itself. Sometimes, we want to talk about subsets that are
not the entire set. We say 𝐴 is a proper subset of 𝐵 if 𝐴 ⊆ 𝐵 and 𝐴 ≠ 𝐵. So, if 𝐴
is a proper subset of 𝐵, then there exists at least one element of 𝐵 that is not also an
element of 𝐴. We write 𝐴 ⊂ 𝐵 to mean that 𝐴 is a proper subset of 𝐵.
The empty set is a subset of every set (including itself). So, if 𝐵 is any set, we can
write ∅ ⊆ 𝐵, and if 𝐵 is nonempty then ∅ is a proper subset of 𝐵 and we can write
∅ ⊂ 𝐵.
If 𝐴 and 𝐵 are finite, then 𝐴 ⊆ 𝐵 implies |𝐴| ≤ |𝐵|. This explains the form of the
subset symbol. But note that the converse does not hold in general: if |𝐴| ≤ |𝐵|, it does
not follow that 𝐴 ⊆ 𝐵.
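Python’s set type has these relations built in, which makes them easy to experiment with (this is just an illustration; the first two sets reuse an example from above):

```python
A = {"Manchester Baby", "CSIRAC"}
B = {"Manchester Baby", "EDSAC", "CSIRAC"}

print(A <= B)      # True: A ⊆ B (also available as A.issubset(B))
print(A < B)       # True: A ⊂ B, a proper subset, since A != B
print(B >= A)      # True: B ⊇ A
print(B <= B)      # True: every set is a subset of itself...
print(B < B)       # False: ...but not a proper subset of itself
print(set() <= B)  # True: ∅ is a subset of every set

# |C| <= |D| does not imply C ⊆ D:
C = {1, 2}
D = {3, 4, 5}
print(len(C) <= len(D), C <= D)  # True False
```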

We say 𝐴 is a superset of 𝐵, and write 𝐴 ⊇ 𝐵, to mean that 𝐵 is a subset of 𝐴. If in addition 𝐴 ≠ 𝐵 then 𝐴 is a proper superset of 𝐵, written 𝐴 ⊃ 𝐵.
All we are doing here is writing the subset relation in reverse. Similarly, we can
All we are doing here is writing the subset relation in reverse. Similarly, we can
write the implication in reverse too. So 𝐴 ⊇ 𝐵 means that membership of 𝐴 is implied
by membership of 𝐵, which we can write using the reversed implication symbol:

membership of 𝐴 ⇐ membership of 𝐵

In other words, for all 𝑥,


𝑥 ∈ 𝐴 ⇐ 𝑥 ∈ 𝐵.
This is sometimes read as “𝑥 ∈ 𝐴 if 𝑥 ∈ 𝐵”.
If we need to prove that 𝐴 ⊇ 𝐵, then we just need to prove that 𝐵 ⊆ 𝐴. We discussed
proving subset relations above.

If we have both 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵, then the two sets must actually be identical:
𝐴 = 𝐵. The converse certainly holds too: if 𝐴 = 𝐵 then 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵. This
suggests a way of proving that two sets 𝐴 and 𝐵 are equal: prove that each is a subset
of the other. So the task of proving set equality is broken down into two subtasks, each
requiring proof of a subset relationship, which is usually easier to prove than equality.
In fact, this is a very common strategy for proving set equality.
We can think of 𝐴 ⊆ 𝐵 and 𝐴 ⊇ 𝐵 as giving logical implication in both directions:
membership of 𝐴 implies membership of 𝐵 (because 𝐴 ⊆ 𝐵), and is implied by mem-
bership of 𝐵 (because 𝐴 ⊇ 𝐵). More succinctly: membership of 𝐴 is equivalent to
membership of 𝐵, just as we would expect because 𝐴 = 𝐵 here. Because we have impli-
cation in both directions, ⇒ and ⇐, it is convenient to put them together in a single
symbol, ⇔, which means that implication goes in both directions:

membership of 𝐴 ⇔ membership of 𝐵.

In other words, for all 𝑥,


𝑥 ∈ 𝐴 ⇔ 𝑥 ∈ 𝐵.
This is often read as “𝑥 ∈ 𝐴 if and only if 𝑥 ∈ 𝐵”.
We will come across this “if and only if” wording often, so it is worth reflecting on
its meaning. We think of it as stating two conditions that either both hold or both do
not hold. In other words, either they both hold or neither of them holds. So the two conditions mean the same thing; they are equivalent.
In the case of “𝑥 ∈ 𝐴 if and only if 𝑥 ∈ 𝐵”, we have:

• the “if part”, saying that “𝑥 ∈ 𝐴 if 𝑥 ∈ 𝐵”, which means (writing the membership statements the other way round) that “if 𝑥 ∈ 𝐵 then 𝑥 ∈ 𝐴”, or equivalently, “𝑥 ∈ 𝐵 ⇒ 𝑥 ∈ 𝐴”, or equivalently, “𝑥 ∈ 𝐴 ⇐ 𝑥 ∈ 𝐵”;

• the “only if part”, saying that “𝑥 ∈ 𝐴 only if 𝑥 ∈ 𝐵”, which means that “if 𝑥 ∈ 𝐴 then 𝑥 ∈ 𝐵”, or equivalently, “𝑥 ∈ 𝐴 ⇒ 𝑥 ∈ 𝐵”.

1.7 MiNiMAL, MiNiMUM, MAXiMAL, MAXiMUM

Suppose we have a set and we are interested in those subsets of it that have some specific
property. For example, let 𝐵 be a set of people. A clique in 𝐵 is a set of people who
all know each other. In other words, it’s a subset 𝐴 ⊆ 𝐵 such that, for each 𝑥, 𝑦 ∈ 𝐴,
person 𝑥 and person 𝑦 know each other. Given a set of people and their social links, we
may wonder how “cliquey” they can be. To help us describe “peak cliques”, we make a
precise distinction between the adjectives “maximum” and “maximal”.

• A maximum clique is a clique of maximum size; there is no larger clique.

• A maximal clique is a clique that is not a proper subset of any other clique.

Observe that these are different concepts, although they are related. Consider carefully
the second of these, the concept of a maximal clique. Such a clique is not necessarily
as large as the largest possible clique in 𝐵 (although it might be). If 𝐴 is a “maximal
clique”, then it’s a clique with the extra property that, if we add any other person in
𝐵 to the set, it’s no longer a clique: that new person will be a stranger to at least one
person already in 𝐴. So 𝐴 cannot be enlarged while preserving the clique property. But
that does not mean it is as large as any clique in 𝐵 can be. There may be other quite
different cliques that are even larger than 𝐴. So a maximal clique may be smaller in size
than a maximum clique.
On the other hand, a maximum clique is also a maximal clique. A clique that is
largest, in size, among all possible cliques in 𝐵 cannot possibly be enlarged; it cannot
possibly be a proper subset of another clique, because then the latter clique would be
larger in size than the former one.
So,
maximum ⟹ maximal.
The reverse implication does not hold in general. (Typically, there are maximal cliques
that are not maximum cliques. See if you can construct an example social network
where this happens. But there do exist unusual situations where every maximal clique
is maximum; can you construct one?)
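A brute-force sketch in Python makes the distinction concrete. The people and the “knows” relation below are made up for illustration, and the method examines every subset, so it is only feasible for a very small 𝐵:

```python
from itertools import combinations

# A hypothetical symmetric "knows" relation on five people.
people = {"Ava", "Ben", "Cal", "Dee", "Eli"}
knows = {frozenset(p) for p in [("Ava", "Ben"), ("Ava", "Cal"),
                                ("Ben", "Cal"), ("Dee", "Eli")]}

def is_clique(group):
    """A clique: every two distinct members know each other."""
    return all(frozenset(pair) in knows for pair in combinations(group, 2))

# Examine every subset of `people` (all 2**5 of them).
cliques = [set(c) for k in range(len(people) + 1)
           for c in combinations(sorted(people), k) if is_clique(c)]

# Maximal: cannot be enlarged (not a proper subset of another clique).
maximal = [c for c in cliques if not any(c < d for d in cliques)]

# Maximum: largest size among all cliques.
largest = max(len(c) for c in cliques)
maximum = [c for c in cliques if len(c) == largest]

print(maximum)  # the unique maximum clique here: {'Ava', 'Ben', 'Cal'}
print(maximal)  # also includes {'Dee', 'Eli'}: maximal but not maximum
```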
We make this distinction between the meanings of “maximum” and “maximal” when-
ever we are talking about subsets with some property.

• A maximum subset with the property has largest size among all subsets with
the property.

• A maximal subset is a subset with the property that is not a proper subset of
any other subset with the property. In other words, it cannot be enlarged while
still maintaining the property.

We make a similar distinction between “minimum” and “minimal”.

• A minimum subset with some property has smallest size among all subsets with
the property.

• A minimal subset with some property is a subset with the property that is not
a proper superset of any other subset with the property. So, no proper subset
has the property. In other words, if we remove anything from it, the property no
longer holds.

In many situations in life, and especially if we are just talking about real numbers
(rather than sets), this distinction between “maximum” and “maximal” is unnecessary
(and likewise for “minimum” and “minimal”), and the terms are often treated as synonyms.
What is the maximum numerical score you have ever made in your favourite game? You
could replace “maximum” by “maximal” in this sentence, with no ambiguity (though it

would be less common wording in practice).1 This is because real numbers are totally
ordered; for every pair 𝑥, 𝑦 ∈ ℝ, if 𝑥 ≠ 𝑦 then either 𝑥 < 𝑦 or 𝑦 < 𝑥. So, if a number
has some property and cannot be increased while maintaining that property (i.e., it’s
maximal), then it’s also the largest number with that property (i.e., it’s maximum).
But the subset relation is different to the kind of order relation we are used to for
real numbers. The subset relation does not give a total ordering; you can have two
different sets 𝐴 and 𝐵 that are incomparable in the sense that 𝐴 ⊈ 𝐵 and 𝐵 ⊈ 𝐴, i.e.,
neither is a subset of the other. Such incomparability cannot occur among real numbers.
But now that we are working with subsets, the terms “maximal” and “minimal” must
be used with care, both in reading and writing. Unfortunately, they are often confused,
even in technical publications in situations where the distinction matters.
From now on, we will mostly drop the underlining when using “maximum”, “maximal”,
“minimum” and “minimal”. But be observant about which suffix, -um or -al, is being
used, and what the usage implies.

1.8 COUNTiNG ALL SUBSETS

If 𝐵 is a finite set, with |𝐵| = 𝑛 say, how many subsets does it have? A subset of 𝐵 is
determined by a choice, for each element of 𝐵, of whether or not to include it in the
subset. Now, 𝐵 has 𝑛 elements, and for each of these we have two choices. These choices
are independent, in the sense that making a choice for one element puts no restrictions
whatsoever on the choices we may make for other elements. So the total number of
choices we make is
2×2×2×⋯⋯⋯×2×2

for each element of 𝐵,
choose between two options

which is just 2|𝐵| = 2𝑛 . This tells us that the number of subsets of a set grows very
quickly — in fact, grows exponentially — as the size of the set increases.
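A small Python sketch can make this concrete: each subset corresponds to one combination of in/out choices, which we can encode as the bits of a number from 0 to 2ⁿ − 1 (a standard trick, used here purely for illustration):

```python
def all_subsets(elements):
    """Generate every subset by making an in/out choice for each element."""
    elements = list(elements)
    n = len(elements)
    for mask in range(2 ** n):  # one mask per combination of choices
        yield {elements[i] for i in range(n) if mask & (1 << i)}

B = {"a", "b", "c"}
subsets = list(all_subsets(B))
print(len(subsets))      # 8, i.e. 2**3
print(set() in subsets)  # True: the empty set is always among them
```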

1.9 THE POWER SET OF A SET

The power set of a set 𝐵 is the set of all subsets of 𝐵. We denote it by 𝒫(𝐵). The
observations of the previous paragraph tell us that

|𝒫(𝐵)| = 2ⁿ. (1.1)

1 So, although a maximal clique is not necessarily a maximum clique, a maximal size clique is in-
deed just a maximum size clique. This is because, in “maximal/maximum size clique”, the adjective
“maximal/maximum” is applied to the size, which is a number (and therefore part of a total order), rather
than to the set itself. Nonetheless, we will avoid applying the term “maximal” to sizes and other numbers,
since there we can use “maximum” which is more common.

This is true even if 𝐵 is empty, when 𝑛 = 0 and 2ⁿ = 2⁰ = 1, in keeping with the fact that
∅ has one subset, namely itself. This expression for |𝒫(𝐵)| explains the term “power
set”.
In algorithm design, we often need to find the “best” among all subsets of a set.
Consider, for example, some social network analysis tasks, where we have a set of people
and a set of pairs that know each other. Questions we might ask include: What is
the largest clique, i.e., the largest set of people who all know each other? What is the
largest set of mutual strangers? What is the smallest set of people who collectively
know everyone? We could, in principle, solve these problems by examining all subsets
of the set of people, or in other words, all members of its power set, provided we can
easily determine, for each subset, whether or not it has the property we are interested
in (being a clique, etc.). However, for reasonably large 𝑛, the number of sets to examine
is prohibitive and the search would take too long. So we need to find smarter methods
where we use the properties of networks and of the structures we are interested in to
solve the problem without examining every single subset.
The power set of 𝐵 is also often denoted by 2^𝐵.

1.10 COUNTiNG SUBSETS BY SiZE USiNG BiNOMiAL COEFFiCiENTS

Sometimes we are focused on subsets of a specific size 𝑘. How many subsets of size 𝑘 does a set 𝐵 of size 𝑛 have? This quantity is denoted by a binomial coefficient, written

⒧𝑛𝑘⒭
and read as “𝑛 choose 𝑘” because we are interested in choosing 𝑘 elements from 𝑛
available elements. Between them, the binomial coefficients (taken over the full range of
subset sizes, 𝑘 = 0, 1, 2, … , 𝑛) count every subset of 𝐵 exactly once, so we already have

⒧𝑛0⒭ + ⒧𝑛1⒭ + ⋯ + ⒧𝑛 𝑛−1⒭ + ⒧𝑛𝑛⒭ = 2ⁿ.

This is an important and useful fact, but it does not yet give us a method for working
out ⒧𝑛𝑘⒭. We now consider how to work this out.
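Before doing so, it is worth noting that the identity above is easy to check numerically (a check, not a proof): Python’s standard library computes binomial coefficients directly via math.comb.

```python
from math import comb

# The sum of C(n, k) over k = 0..n equals 2**n.
for n in range(10):
    assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n

print([comb(5, k) for k in range(6)])     # [1, 5, 10, 10, 5, 1]
print(sum(comb(5, k) for k in range(6)))  # 32 = 2**5
```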
We start with some simple cases. If 𝑘 = 0, then we are choosing no elements at all,
and this can be done in just one way, by doing nothing. (In this context, there’s only
one way to do nothing!) So, for all 𝑛,

⒧𝑛0⒭ = 1.

At the other extreme, if 𝑘 = 𝑛, then we choose all elements. Again, this can be done in
only one way, because for each element of our set, we have no choice but to take it. So

⒧𝑛𝑛⒭ = 1.

Now suppose 𝑘 = 1. We choose just one element from 𝑛 elements, so we have 𝑛 options:

⒧𝑛1⒭ = 𝑛.

What about 𝑘 = 𝑛 − 1? This time, we are choosing one element not to include in our
subset; once that choice is made, everything else is determined. So, again, we have 𝑛
options:
⒧𝑛 𝑛−1⒭ = 𝑛.
The symmetry we have seen here — firstly between 𝑘 = 0 and 𝑘 = 𝑛, and then between
𝑘 = 1 and 𝑘 = 𝑛−1 — is more general. To see this, observe that deciding which elements
are included also determines which elements are excluded, and vice versa. The number
of ways of choosing 𝑘 elements to include in our subset is the same as the number of
ways of choosing 𝑘 elements to exclude from our subset, which in turn is just the number
of ways of choosing 𝑛 − 𝑘 elements to include. Therefore we have

⒧𝑛𝑘⒭ = ⒧𝑛 𝑛−𝑘⒭. (1.2)

We now turn to methods for counting 𝑘-element subsets in general. Although order does not matter within a set, it is sometimes easier to count selections as if order did matter, and then correct for the overcounting. It’s a bit like counting pairs of socks in your drawer by counting all the socks and then dividing by two.
So let’s first count all ways of choosing 𝑘 different elements, in order, from a set of 𝑛
elements. Our first element can be any one of the 𝑛 elements, so we have 𝑛 choices. For
our second element, we must not choose the first, but our choice is otherwise unrestricted,
so we have 𝑛−1 choices. Our third element can be anything except the first two already-
chosen ones, so we have 𝑛 − 2 choices. This process continues, so that when we come to
choose the 𝑖-th element (where 1 ≤ 𝑖 ≤ 𝑘), we have 𝑛 − 𝑖 + 1 choices. At the very end,
for our 𝑘-th element, we have 𝑛 − 𝑘 + 1 choices. So, altogether we have

# ways to choose 𝑘 elements in order = 𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ (𝑛 − 𝑘 + 1). (1.3)

When 𝑘 = 𝑛 we are just asking for the number of ways in which 𝑛 elements can all be
chosen in order, and that is just the factorial of 𝑛, written 𝑛! and defined by

𝑛! = 𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ 3 ⋅ 2 ⋅ 1.

In fact, factorials allow a different way of writing (1.3):

# ways to choose 𝑘 elements in order = 𝑛! / (𝑛 − 𝑘)!. (1.4)

Compare the number of arithmetic operations in each of these expressions, (1.3) and
(1.4). It will be evident that the first expression, (1.3), is more efficient. The second is
still important in understanding and using these counting problems, though.
We now return to our main aim of counting the unordered choices of 𝑘 elements
from 𝑛 elements. Our ordered counting above will count each subset of size 𝑘 some
number of times. In fact, our sequence of choices was designed to count every possible
ordering of the 𝑘 elements exactly once. How many orderings are there? Since we drew
these elements from a set (namely 𝐵), and each element of 𝐵 is chosen at most once, all
these chosen elements must be distinct. So there is no possibility of any of them looking
identical to each other. So there are 𝑘! ways to order the 𝑘 elements, and therefore
each subset of 𝑘 elements gets counted 𝑘! times by this process. Since this overcounting
factor 𝑘! is the same for all subsets of size 𝑘, we have

# ways to choose 𝑘 elements in order = 𝑘!⋅(# ways to choose a subset of 𝑘 elements).

It follows that

# ways to choose a subset of 𝑘 elements
   = (1/𝑘!) ⋅ (# ways to choose 𝑘 elements in order)
   = (𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ (𝑛 − 𝑘 + 1)) / 𝑘!    (using (1.3))    (1.5)
   = 𝑛! / ((𝑛 − 𝑘)! 𝑘!)    (using (1.4)).    (1.6)

Again, compare the number of arithmetic operations in the expressions (1.5) and
(1.6), and consider which would be more efficient for computation. It is also worth
thinking about the order in which the various multiplications and divisions are done.
It makes no difference mathematically, but on a computer the order of operations can
affect the accuracy of the result, because of limitations on the sizes and precision of
numbers stored in the computer. In particular, the calculation works better, in general,
if intermediate numbers used during the computation are not too large or small in
magnitude. So, how can the computation be organised to best keep the sizes of those
intermediate numbers under control?
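One common organisation, sketched below in Python for illustration, interleaves the multiplications and divisions of (1.5): after step 𝑖 the running value equals the binomial coefficient ⒧𝑛−𝑘+𝑖 𝑖⒭, so every intermediate value is an integer no larger than the final answer.

```python
def binomial(n, k):
    """C(n, k) via (1.5), interleaving multiplications and divisions.

    After i steps the running value equals C(n - k + i, i), so it is
    always an integer and never exceeds the final answer.
    """
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # use the symmetry (1.2) to shorten the loop
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i  # exact: divisions leave no remainder
    return result

print(binomial(5, 3))   # 10
print(binomial(52, 5))  # 2598960
```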

A couple of special cases deserve special treatment because of their ubiquity in the
analysis of algorithms and data structures.

⒧𝑛2⒭ = 𝑛(𝑛 − 1)/2,
⒧𝑛3⒭ = 𝑛(𝑛 − 1)(𝑛 − 2)/6.

Counting subsets of a given size can also be done recursively. A recursive method
for doing a task is one based on breaking the task down into simpler tasks of the same
type. In this case, our task is to count the subsets, of a given size, in a given set. How
can we reduce this to simpler subset-counting tasks?
Consider again our set 𝐵 of size 𝑛 and suppose we want to determine the number
⒧𝑛𝑘⒭ of 𝑘-element subsets of 𝐵. Let 𝑏 ∈ 𝐵. We divide the 𝑘-element subsets of 𝐵 into
those that include 𝑏 and those that do not. How many of each kind do we have?
Let’s work through an example. Suppose 𝐵 = {1, 2, 3, 4, 5}, so 𝑛 = 5, and 𝑘 = 3. So
we want the number ⒧53⒭ of 3-element subsets of 𝐵. (This example is small enough that
you can just list these by hand, so please do so! It will be a handy check on what we
are about to do.) Pick 𝑏 ∈ 𝐵, say 𝑏 = 1. Some 3-element subsets of 𝐵 include 1, others
do not. The point is that

total # 3-element subsets = # 3-element subsets that include 1
                          + # 3-element subsets that do not include 1.

So we have expressed the answer to our subset-counting problem in terms of answers to simpler subset-counting problems. Furthermore, these simpler subset-counting problems are of the same type:

• Observe that choosing a 3-element subset that includes 1 is really just choosing
the rest of the subset that isn’t 1, and we need exactly two of those non-1 elements
to make up three elements altogether. So, counting 3-element subsets that include
1 is the same as counting 2-element subsets of the four-element set {2, 3, 4, 5}. So

# 3-element subsets that include 1 = ⒧42⒭.

• Observe that choosing a 3-element subset that does not include 1 is really just
choosing three elements from among the non-1 elements. So, counting 3-element
subsets that don’t include 1 is the same as counting 3-element subsets of the four-
element set {2, 3, 4, 5}. So

# 3-element subsets that don’t include 1 = ⒧43⒭.

So
⒧53⒭ = total # 3-element subsets of 𝐵
     = # 3-element subsets that include 1
       + # 3-element subsets that do not include 1
     = ⒧42⒭ + ⒧43⒭.

Now let’s look at how it works in general.

• For those 𝑘-element subsets that include 𝑏, we choose any 𝑘 − 1 elements from among all elements of 𝐵 other than 𝑏. So we must choose 𝑘 − 1 elements from 𝑛 − 1 available elements. This can be done in ⒧𝑛−1 𝑘−1⒭ ways. (We also choose 𝑏, to complete our 𝑘-element subset, but there’s only one way to do that!)

• For those 𝑘-element subsets that do not include 𝑏, we choose all 𝑘 elements for our subset from among all elements of 𝐵 other than 𝑏. So we now choose 𝑘 elements from 𝑛 − 1 available elements. This can be done in ⒧𝑛−1 𝑘⒭ ways.

The total number of 𝑘-element subsets is obtained by adding these two quantities to-
gether. So we have
⒧𝑛𝑘⒭ = ⒧𝑛−1 𝑘−1⒭ + ⒧𝑛−1 𝑘⒭. (1.7)
So we can compute ⒧𝑛𝑘⒭ by doing two simpler computations of the same type (each with
𝑛−1 instead of 𝑛) and adding the results. Those two simpler computations can, in turn,
be done in terms of other even simpler computations (with 𝑛−2), and so on. Eventually,
the numbers get so small that we can use the simple cases 𝑘 = 0 and 𝑘 = 𝑛, which are
so simple that they can be solved without reducing them any further. We call these
the base cases:
cases they sit at the “base” of the whole reduction process, ensuring that the
process does stop eventually, instead of just “descending forever”.
It is worth comparing this method of computing ⒧𝑛𝑘⒭ with direct computation using
(1.5) or (1.6).
This recursive method is especially useful when you want to compute ⒧𝑛𝑘⒭ for all 𝑛 and 𝑘 up to some limits. We start with the base cases ⒧𝑛0⒭ = ⒧𝑛𝑛⒭ = 1. The simplest case not covered by these is ⒧21⒭, and applying (1.7) gives ⒧21⒭ = ⒧10⒭ + ⒧11⒭ = 1 + 1 = 2. The next simplest cases are ⒧31⒭ and ⒧32⒭. For the first of these, (1.7) gives ⒧31⒭ = ⒧20⒭ + ⒧21⒭ = 1 + 2 = 3; the second can be computed similarly, or even better, we can use symmetry: ⒧32⒭ = ⒧3 3−2⒭ = ⒧31⒭, by (1.2), which we just calculated to be 3. Similar calculations for 𝑛 = 4, using the values we have just worked out, give ⒧41⒭ = 4, ⒧42⒭ = 6, and ⒧43⒭ = 4. And so on.
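The recurrence (1.7) with its base cases translates directly into a short recursive program. A Python sketch (with a cache so that each ⒧𝑛𝑘⒭ is computed only once; it assumes 0 ≤ 𝑘 ≤ 𝑛):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def choose(n, k):
    """C(n, k) via the recurrence (1.7); assumes 0 <= k <= n."""
    if k == 0 or k == n:
        return 1  # base cases
    return choose(n - 1, k - 1) + choose(n - 1, k)

print(choose(2, 1))                                # 2
print(choose(3, 1), choose(3, 2))                  # 3 3
print(choose(4, 1), choose(4, 2), choose(4, 3))    # 4 6 4
```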
We can visualise the relation (1.7) using Pascal’s triangle, shown symbolically in Figure 1.2a and with some actual values in Figure 1.2b. The binomial coefficients ⒧𝑛𝑘⒭ are arranged so that each one is the sum of the two immediately above it. In general, ⒧𝑛𝑘⒭ has ⒧𝑛−1 𝑘−1⒭ and ⒧𝑛−1 𝑘⒭ just above it, in that order, with ⒧𝑛−1 𝑘−1⒭ to its upper left and ⒧𝑛−1 𝑘⒭ to its upper right, and we saw in (1.7) that adding these two gives ⒧𝑛𝑘⒭. To take a specific example, consider ⒧52⒭ in Figure 1.2a. We know from (1.7) that ⒧52⒭ = ⒧41⒭ + ⒧42⒭, and we see in the triangular array in Figure 1.2a that ⒧41⒭ and ⒧42⒭ sit just above ⒧52⒭. The actual values ⒧52⒭ = 10, ⒧41⒭ = 4 and ⒧42⒭ = 6 are shown in the corresponding positions in the triangular array in Figure 1.2b. The equation ⒧52⒭ = ⒧41⒭ + ⒧42⒭ becomes 10 = 4 + 6.

⒧00⒭                                      1
⒧10⒭ ⒧11⒭                                1 1
⒧20⒭ ⒧21⒭ ⒧22⒭                          1 2 1
⒧30⒭ ⒧31⒭ ⒧32⒭ ⒧33⒭                    1 3 3 1
⒧40⒭ ⒧41⒭ ⒧42⒭ ⒧43⒭ ⒧44⒭              1 4 6 4 1
⒧50⒭ ⒧51⒭ ⒧52⒭ ⒧53⒭ ⒧54⒭ ⒧55⒭        1 5 10 10 5 1

         (a)                              (b)

Figure 1.2: Pascal’s triangle.
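The same recurrence generates Pascal’s triangle row by row; a Python sketch, for illustration:

```python
def pascal_rows(num_rows):
    """The first num_rows rows of Pascal's triangle."""
    rows = []
    for n in range(num_rows):
        row = [1] * (n + 1)  # the two ends are the base cases C(n,0) = C(n,n) = 1
        for k in range(1, n):
            # Entry C(n, k) is the sum of the two entries above it, as in (1.7).
            row[k] = rows[n - 1][k - 1] + rows[n - 1][k]
        rows.append(row)
    return rows

for row in pascal_rows(6):
    print(row)
# [1]
# [1, 1]
# [1, 2, 1]
# [1, 3, 3, 1]
# [1, 4, 6, 4, 1]
# [1, 5, 10, 10, 5, 1]
```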

1.11 COMPLEMENT AND SET DiFFERENCE

Often, the sets we are discussing may all be subsets of some universal set, also called the universe of discourse or simply the universe.
For example, if we are working with various sets of integers (such as the even integers,
or the odd integers, or the negative integers, or the primes), then the set ℤ of all integers
can be used as the universal set. If we are working with sets of strings over the English
alphabet 𝐴 (such as the set of nouns, or the set of three-letter strings, or the set of
names in the FIT1058 class list), then the set 𝐴 ∗ of all strings over that alphabet may
be a suitable universal set.
Suppose 𝐴 is any set and 𝑈 is some universal set, so that 𝐴 ⊆ 𝑈. Then the complement of 𝐴, denoted by 𝐴̅, is the set of all elements of 𝑈 that are not in 𝐴. See Figure 1.3.
The notation 𝐴̅ has the shortcoming that it does not include the universal set 𝑈, even though the definition depends on 𝑈. This is OK if the universal set has been clearly stated earlier or is clear from the context. But there is alternative notation that makes the dependence on 𝑈 clear. We write 𝑈 ∖ 𝐴 for everything (in the universal set) that is not in 𝐴. So 𝐴̅ = 𝑈 ∖ 𝐴.

Figure 1.3: The complement 𝐴̅, shaded.


Figure 1.4: The set difference 𝐵 ∖ 𝐴, shaded.

When 𝐴 and 𝑈 are finite sets, the size of 𝐴̅ is given by

|𝐴̅| = |𝑈| − |𝐴|. (1.8)

Taking the complement of the complement gives the original set: the complement of 𝐴̅ is 𝐴 itself. In symbols,

𝑈 ∖ (𝑈 ∖ 𝐴) = 𝐴.

The operation ∖ is called set difference and can be used between any two sets. So,
if 𝐴 and 𝐵 are any sets, then 𝐵 ∖ 𝐴 is the set of elements of 𝐵 that are not in 𝐴:

𝐵 ∖ 𝐴 = {𝑥 ∈ 𝐵 ∶ 𝑥 ∉ 𝐴}.

See Figure 1.4.


If 𝐴 ⊆ 𝐵, then
|𝐵 ∖ 𝐴| = |𝐵| − |𝐴|. (1.9)


Figure 1.5: The union 𝐴 ∪ 𝐵, shaded.

In the special case when 𝐵 is the universal set 𝑈, this equation is just (1.8).
The size of the set difference does not satisfy (1.9) unless 𝐴 ⊆ 𝐵. Why is this? How would you modify (1.9) so that it covers any set difference 𝐵 ∖ 𝐴? What extra information about the sets would you need, in order to determine |𝐵 ∖ 𝐴|?

The subset and superset relations are complementary in a precise sense:

𝐴 ⊆ 𝐵 ⟺ 𝐴̅ ⊇ 𝐵̅. (1.10)

This gives us another approach to proving that 𝐴 ⊆ 𝐵 (as well as the approach described
on p. 8 in § 1.6). Instead of taking a general member 𝑥 of 𝐴 and proving that it also
belongs to 𝐵, we could take a general nonmember of 𝐵 and prove that it also does not
belong to 𝐴. In other words, we show that, every time the condition for membership of
𝐵 is violated, then the condition for membership of 𝐴 must be violated too.

1.12 UNiON AND iNTERSECTiON

The union 𝐴 ∪ 𝐵 of two sets 𝐴 and 𝐵 is the set of all elements that belong to at least
one of the two sets:
𝐴 ∪ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵}. (1.11)
The “or” here is inclusive in the sense that it includes the possibility that 𝑥 ∈ 𝐴 and
𝑥 ∈ 𝐵 are both true. This is how we will use the word “or” in set definitions and logical
statements, unless stated otherwise at the time.
The union is illustrated in Figure 1.5.
The intersection 𝐴 ∩ 𝐵 of two sets 𝐴 and 𝐵 is the set of all elements that belong to both of the sets:

𝐴 ∩ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵}. (1.12)
See Figure 1.6.


Figure 1.6: The intersection 𝐴 ∩ 𝐵, shaded.

Intersection gives us another way of writing set difference:

𝐵 ∖ 𝐴 = 𝐴̅ ∩ 𝐵.

When we count all the elements of 𝐴 and all the elements of 𝐵, we are counting
everything in either set except that everything in both sets is counted twice. Therefore

|𝐴| + |𝐵| = |𝐴 ∪ 𝐵| + |𝐴 ∩ 𝐵|. (1.13)

This means that, if we know |𝐴| and |𝐵|, then knowing either one of |𝐴 ∪ 𝐵| and |𝐴 ∩ 𝐵|
will enable us to determine the other.
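Python's built-in sets make (1.13) easy to check on small examples (a quick sketch; the particular sets are arbitrary, and | and & are Python's union and intersection operators):

```python
# Verify |A| + |B| = |A ∪ B| + |A ∩ B| on a small example.
A = {1, 2, 3}
B = {3, 4}
assert len(A) + len(B) == len(A | B) + len(A & B)  # 3 + 2 == 4 + 1
```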

An important special case is when two sets 𝐴 and 𝐵 are disjoint, meaning that
𝐴 ∩ 𝐵 = ∅. In that case, the size of the union is just the sum of the sizes of the two sets.
The notation ⊔ is sometimes used for the disjoint union of 𝐴 and 𝐵, which is the
ordinary union when the sets are disjoint, but is undefined otherwise:

𝐴 ⊔ 𝐵 = 𝐴 ∪ 𝐵,     if 𝐴 ∩ 𝐵 = ∅;
        undefined,  otherwise.

See Figure 1.7. There are some alternative symbols for disjoint union, the most common
being obtained from the ordinary union symbol by placing a dot over it or + inside it: ∪̇
and ⊎.
When the disjoint union is defined, its size is just the sum of the sizes of the sets:

|𝐴 ⊔ 𝐵| = |𝐴| + |𝐵|. (1.14)

We will use disjoint union occasionally, but mostly will focus on the normal, and more
general, union.
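Because ⊔ is a partial operation, a natural way to mirror it in code is to raise an error when it is undefined; a minimal sketch (the function name is ours):

```python
def disjoint_union(A, B):
    """Return A ⊔ B, which is only defined when A and B are disjoint."""
    if A & B:
        raise ValueError("disjoint union undefined: the sets intersect")
    return A | B

# |A ⊔ B| = |A| + |B| whenever the result is defined, as in (1.14).
C = disjoint_union({1, 2}, {3, 4})
assert len(C) == 2 + 2
```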

The complement of the union of two sets is the intersection of their complements.

Figure 1.7: The disjoint union 𝐴 ⊔ 𝐵, shaded. It is only defined when 𝐴 and 𝐵 are disjoint.

Theorem 1.
\overline{𝐴 ∪ 𝐵} = 𝐴̅ ∩ 𝐵̅.

Proof.

𝑥 ∈ \overline{𝐴 ∪ 𝐵} ⟺ 𝑥 ∉ 𝐴 ∪ 𝐵
⟺ 𝑥 ∉ 𝐴 and 𝑥 ∉ 𝐵
⟺ 𝑥 ∈ 𝐴̅ and 𝑥 ∈ 𝐵̅
⟺ 𝑥 ∈ 𝐴̅ ∩ 𝐵̅

Similarly, the complement of the intersection of two sets is the union of their com-
plements. We could prove this in a similar way, but we can prove it even more easily
using Theorem 1.

Corollary 2.
\overline{𝐴 ∩ 𝐵} = 𝐴̅ ∪ 𝐵̅.

Proof.

\overline{𝐴 ∩ 𝐵} = \overline{\overline{𝐴̅} ∩ \overline{𝐵̅}}
= \overline{\overline{𝐴̅ ∪ 𝐵̅}}   (by Theorem 1)
= 𝐴̅ ∪ 𝐵̅.

Theorem 1 and Corollary 2 are known as De Morgan’s Laws for Sets. They describe
a duality between union and intersection. We will meet a similar duality later,
when studying logic.
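Both laws are easy to check experimentally with Python sets; complements are taken relative to a small universal set chosen just for this sketch (a check on one example, not a proof):

```python
U = set(range(10))  # universal set for this example
A = {1, 2, 3}
B = {3, 4, 5}

def comp(S):
    """Complement of S relative to U."""
    return U - S

# Theorem 1: the complement of a union is the intersection of the complements.
assert comp(A | B) == comp(A) & comp(B)
# Corollary 2: the complement of an intersection is the union of the complements.
assert comp(A & B) == comp(A) | comp(B)
```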

Figure 1.8: 𝐴 ∩ (𝐵 ∪ 𝐶) (top); compare with 𝐴 ∩ 𝐵 (left) and 𝐴 ∩ 𝐶 (right), and observe that
𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶).

How do union and intersection interact with each other? If we take the union of
two sets, and then the intersection with a third, what happens? What about taking an
intersection first, then a union?
Consider 𝐴 ∩ (𝐵 ∪ 𝐶), shown in Figure 1.8. It is evident from the Venn diagrams
that this is the same as (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶).
Now consider 𝐴 ∪ (𝐵 ∩ 𝐶). It is a good exercise to draw Venn diagrams to show how
this relates to (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶).
In summary, we have the following.

Theorem 3. For any sets 𝐴, 𝐵 and 𝐶,

𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶), (1.15)
𝐴 ∪ (𝐵 ∩ 𝐶) = (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). (1.16)



Figure 1.9: The symmetric difference 𝐴△𝐵, shaded.

Equations (1.15) and (1.16) are known as the Distributive Laws for sets. The
first law, (1.15), is sometimes described as saying that “intersection distributes over
union”. This means that, when taking the intersection of 𝐴 with a union of several
other sets, we can “distribute” the intersection among those other sets, taking all the
intersections separately, and then take the union. Similarly, the second law, (1.16), is
sometimes described as saying that “union distributes over intersection”. We will meet
very similar Distributive Laws later, in logic. As for De Morgan’s Laws, the algebra of
sets will be seen to mirror the algebra of logic.
Although this may be a new Distributive Law for you, the notion of a Distributive
Law should be familiar. You already know a Distributive Law for numbers. For any
real numbers 𝑎, 𝑏, 𝑐, we have

𝑎 × (𝑏 + 𝑐) = (𝑎 × 𝑏) + (𝑎 × 𝑐).

So multiplication distributes over addition. But, for numbers, addition does not dis-
tribute over multiplication: in general,

𝑎 + (𝑏 × 𝑐) ≠ (𝑎 + 𝑏) × (𝑎 + 𝑐).

(There are some cases where equality just happens to hold here, but they are atypical
and very rare.) So it is refreshing to work with sets, where the two operations are
distributive in all possible ways!
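Again, both Distributive Laws can be spot-checked with Python sets (one arbitrary example; a check, not a proof):

```python
A, B, C = {1, 2}, {2, 3}, {3, 4}
# (1.15): intersection distributes over union
assert A & (B | C) == (A & B) | (A & C)
# (1.16): union distributes over intersection
assert A | (B & C) == (A | B) & (A | C)
```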

1.13 SYMMETRIC DIFFERENCE

The symmetric difference 𝐴△𝐵 of 𝐴 and 𝐵 is the set of elements that are in exactly
one of 𝐴 and 𝐵.
𝐴△𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 or 𝑥 ∈ 𝐵 but not both}.
The “or” here is now exclusive in the sense that the possibility of belonging to both sets
is excluded. See Figure 1.9.

There are other ways of writing the symmetric difference in terms of our other
operations.

𝐴△𝐵 = (𝐴 ∖ 𝐵) ∪ (𝐵 ∖ 𝐴) (1.17)
= (𝐴 ∩ 𝐵̅) ∪ (𝐴̅ ∩ 𝐵),
𝐴△𝐵 = (𝐴 ∪ 𝐵) ∖ (𝐴 ∩ 𝐵). (1.18)

The symmetric difference of a set with itself is the empty set,

𝐴△𝐴 = ∅.

This is the only situation where the symmetric difference of two sets is empty. So the
symmetric difference enables a neat characterisation of when two sets are identical.
Theorem 4. For any two sets 𝐴 and 𝐵, they are identical if and only if their symmetric
difference is empty.
Proof.

𝐴=𝐵 ⟺ 𝐴 ⊆ 𝐵 and 𝐵 ⊆ 𝐴
⟺ 𝐴 ∖ 𝐵 = ∅ and 𝐵 ∖ 𝐴 = ∅
⟺ (𝐴 ∖ 𝐵) ∪ (𝐵 ∖ 𝐴) = ∅
⟺ 𝐴△𝐵 = ∅

The symmetric difference of two sets is the same as the symmetric difference of the
complements.
Theorem 5.
𝐴̅△𝐵̅ = 𝐴△𝐵.
Proof.

𝐴̅△𝐵̅ = (𝐴̅ ∩ \overline{𝐵̅}) ∪ (\overline{𝐴̅} ∩ 𝐵̅)
= (𝐴̅ ∩ 𝐵) ∪ (𝐴 ∩ 𝐵̅)
= 𝐴△𝐵
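Python writes the symmetric difference as ^, and the identities of this section can be spot-checked directly (complements are taken relative to a universal set chosen for the sketch):

```python
A, B = {1, 2, 3}, {3, 4}
assert A ^ B == (A - B) | (B - A)   # (1.17)
assert A ^ B == (A | B) - (A & B)   # (1.18)
assert A ^ A == set()               # A △ A = ∅
U = set(range(10))                  # universal set for complements
assert (U - A) ^ (U - B) == A ^ B   # Theorem 5
```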

1.14 CARTESIAN PRODUCT

If we have two objects 𝑎 and 𝑏, then the ordered pair (𝑎, 𝑏) consists of both of them
together, in that order. You have used ordered pairs many times, for example as co-
ordinates of points in the 𝑥, 𝑦-plane, or as rows in a table with two columns.

The Cartesian product 𝐴 × 𝐵 of two sets 𝐴 and 𝐵 is the set of all ordered pairs
consisting of an element of 𝐴 followed by an element of 𝐵:

𝐴 × 𝐵 = {(𝑎, 𝑏) ∶ 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵}.

So, if 𝐴 is the set of all possible values of 𝑥, and 𝐵 is the set of all possible values of
𝑦, then 𝐴 × 𝐵 is the set of all possible ordered pairs (𝑥, 𝑦) of these values.
For example, if 𝐴 = {King, Queen, Jack} and 𝐵 = {♣, ♡}, then

𝐴 × 𝐵 = { (King, ♣), (King, ♡), (Queen, ♣), (Queen, ♡), (Jack, ♣), (Jack, ♡) }.

The Cartesian product ℝ × ℝ is the set of all coordinates of points in the plane. If, for a
given community of people, 𝑃 is the set of all first (or personal) names and 𝐹 is the set
of all family names, then 𝑃 × 𝐹 is the set of all pairs (first name, family name). This
would cover all pairings of names actually used by people in that community, but would
typically include many unused pairings of names too.
If 𝐴 and 𝐵 are both finite sets, then the size of the Cartesian product is just the
product of the sizes of the two sets:

|𝐴 × 𝐵| = |𝐴| ⋅ |𝐵|. (1.19)

This is because we have |𝐴| possibilities for the first member of a pair, and |𝐵| possibili-
ties for the second member, and these choices are made independently of each other. In
more detail, each possibility for the first member gives |𝐵| possibilities for the second
member, so the total number of pairs is

|𝐵| + |𝐵| + ⋯ + |𝐵|    (|𝐴| copies),

which is just |𝐴| × |𝐵|.
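In Python, itertools.product enumerates a Cartesian product, and the count matches (1.19); a sketch using the card example above:

```python
from itertools import product

A = ["King", "Queen", "Jack"]
B = ["♣", "♡"]
pairs = list(product(A, B))           # all ordered pairs (a, b)
assert len(pairs) == len(A) * len(B)  # |A × B| = |A| · |B| = 6
assert ("Queen", "♡") in pairs
```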


The Cartesian product of three sets gives the set of all triples of the appropriate
kind:
𝐴 × 𝐵 × 𝐶 = {(𝑎, 𝑏, 𝑐) ∶ 𝑎 ∈ 𝐴, 𝑏 ∈ 𝐵, 𝑐 ∈ 𝐶}
For example, ℝ × ℝ × ℝ is the set of all coordinates of points in three-dimensional space.
More generally, if we have sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , then their Cartesian product is the
set of 𝑛-tuples, or sequences of length 𝑛, in which the first member (or co-ordinate) is
in 𝐴1 , the second is in 𝐴2 , and so on, with the 𝑖-th member belonging to 𝐴𝑖 for all
𝑖 ∈ {1, 2, … , 𝑛}:

𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 = {(𝑎1 , 𝑎2 , … , 𝑎𝑛 ) ∶ 𝑎1 ∈ 𝐴1 , 𝑎2 ∈ 𝐴2 , … , 𝑎𝑛 ∈ 𝐴𝑛 }.

Again, if all the sets are finite then the size of the Cartesian product is the product of
the sizes of all the sets:

|𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 | = |𝐴1 | ⋅ |𝐴2 | ⋅ ⋯ ⋅ |𝐴𝑛 |.

If the sets 𝐴1 , … , 𝐴𝑛 are all the same, then we can use an exponent to indicate how
many of them are in the product:

𝐴𝑛 = 𝐴 × 𝐴 × ⋯ × 𝐴    (𝑛 factors)
= {(𝑎1 , 𝑎2 , … , 𝑎𝑛 ) ∶ 𝑎𝑖 ∈ 𝐴 for all 𝑖 ∈ {1, 2, … , 𝑛} }.

For example,

{0, 1}3 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.

In the special case when 𝐴 is an alphabet (i.e., a finite set of characters), we often write
𝑛-tuples in 𝐴 𝑛 as strings of length 𝑛. So, for the binary alphabet {0,1}, we can write

{0, 1}3 = {000, 001, 010, 011, 100, 101, 110, 111},

as we did in § 1.5.
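The same tool handles repeated factors: product(…, repeat=n) enumerates 𝐴𝑛, and joining the tuples gives the strings of § 1.5 (a quick sketch):

```python
from itertools import product

triples = list(product([0, 1], repeat=3))   # the 8 elements of {0,1}^3
assert len(triples) == 2 ** 3
strings = ["".join(str(b) for b in t) for t in triples]
assert strings == ["000", "001", "010", "011", "100", "101", "110", "111"]
```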
For another example, the sets of coordinates of points in two- and three-dimensional
space are ℝ2 = ℝ × ℝ and ℝ3 = ℝ × ℝ × ℝ, respectively. These are used extensively to
model physical spaces, since the space around us is three-dimensional and we often deal
with surfaces (terrain, paper, screens) that are two-dimensional. Higher-dimensional
spaces are also useful. The set of coordinates in 𝑛-dimensional space is

ℝ𝑛 = ℝ × ℝ × ⋯ × ℝ    (𝑛 axes).

Spaces of more than three dimensions are hard to visualise, since the physical space we
live in is only three-dimensional. But they are very useful and powerful. Models devel-
oped by machine learning programs can require millions or even billions of dimensions.
For the two smallest exponents, we have, for any set 𝐴,

𝐴 0 = {()}, 𝐴 1 = 𝐴,
where () denotes the empty tuple (for an alphabet, the empty string), so 𝐴 0 has exactly one element.

If 𝐴 is finite, the size of 𝐴 𝑛 is just |𝐴|𝑛 .

1.15 PARTITIONS

A partition of a set 𝐴 is a set of pairwise disjoint nonempty subsets of 𝐴 whose union is 𝐴.
These subsets are called the parts (or blocks or cells) of the partition.

Equivalently, a set of nonempty subsets of 𝐴 is a partition of 𝐴 if and only if every
member of 𝐴 belongs to exactly one of the parts.
Suppose 𝐴 = {𝑎, 𝑏, 𝑐}. One possible partition of 𝐴 is

{ {𝑎, 𝑐}, {𝑏} }.

This partition has two parts: {𝑎, 𝑐} and {𝑏}. These parts are each nonempty, and
disjoint, and their union is 𝐴, so the definition is satisfied. Another partition of 𝐴 is

{ {𝑎}, {𝑏}, {𝑐} }.

This partition has three parts, namely the three sets {𝑎}, {𝑏}, {𝑐}. At the other extreme,
we have a partition of 𝐴 with just one part:

{ {𝑎, 𝑏, 𝑐} }.

Our set 𝐴 is small enough that we can list all its five partitions:

partition # parts
{ {a, b, c} } 1
{ {a,b}, {c} } 2
{ {a,c}, {b} } 2
{ {b,c}, {a} } 2
{ {a}, {b}, {c} } 3

There are several ways in which a collection of subsets of a set can fail to be a
partition. For our set 𝐴 = {𝑎, 𝑏, 𝑐}, the collection { {𝑎, 𝑏}, {𝑐}, ∅ } fails because one of
its members is empty. The collection { {𝑎, 𝑏}, {𝑏, 𝑐} } fails because its members are not
all disjoint, in particular {𝑎, 𝑏} ∩ {𝑏, 𝑐} = {𝑏} ≠ ∅, so 𝑏 belongs to two members of the
collection instead of just one. The collection { {𝑎}, {𝑏} } fails because the union of the
collection’s members is not the entire set 𝐴, i.e., {𝑎} ∪ {𝑏} = {𝑎, 𝑏} ≠ 𝐴; in particular, 𝑐
does not belong to any members of this collection.
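These conditions translate directly into code; a sketch of a checker (the name and the size-counting approach are ours, and elements are assumed hashable):

```python
def is_partition(parts, A):
    """Return True if `parts` is a partition of the set A."""
    parts = list(parts)
    if any(not p for p in parts):               # no part may be empty
        return False
    union = set().union(*parts) if parts else set()
    # The parts are pairwise disjoint exactly when their sizes add up to |union|.
    return union == set(A) and sum(len(p) for p in parts) == len(union)

A = {"a", "b", "c"}
assert is_partition([{"a", "c"}, {"b"}], A)             # a genuine partition
assert not is_partition([{"a", "b"}, {"c"}, set()], A)  # an empty part
assert not is_partition([{"a", "b"}, {"b", "c"}], A)    # b is in two parts
assert not is_partition([{"a"}, {"b"}], A)              # c is not covered
```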
Partitions have many applications. Consider, for example, classification. Suppose
that 𝐴 is a collection of plant specimens. We would like to classify the specimens
according to their species: specimens from the same species are grouped together, while
those from different species are kept in separate groups. (Some groups may have just
one specimen, if there is no other specimen of the same species.) These groups form the
parts of a partition of 𝐴, with each part corresponding to one of the species represented
in the collection. Finding such classifications, from data obtained from specimens, is a
major topic in machine learning.
The number of partitions of a finite set grows rapidly as the size of the set increases.

set size, 𝑛 1 2 3 4 5 6 7 8 9 10
# partitions 1 2 5 15 52 203 877 4140 21147 115975

We can also talk about partitions of infinite sets. For example, { {even numbers}, {odd
numbers} } is a partition of the set of nonnegative integers; this partition has two parts.
Consider also the following partition of the set 𝐴 ∗ of all strings over a finite alphabet 𝐴:

{𝐴 𝑛 ∶ 𝑛 ∈ ℕ0 }.

This partition has infinitely many parts, one for each 𝑛 ∈ ℕ0 . The parts are the sets of
all strings of a given length.
Every set 𝐴 has two partitions that might be thought of as “extreme”, but in opposite
directions.
• The coarsest partition of 𝐴 is the partition { 𝐴 } which has just one part, namely
the entire set 𝐴 itself. In effect, everything in 𝐴 is “lumped together”.
• The finest partition of 𝐴 is the partition { {𝑎} ∶ 𝑎 ∈ 𝐴 } which has one part for
each element of 𝐴, and each part contains only that element. If 𝐴 is finite, then
this partition has |𝐴| parts; if 𝐴 is infinite, then this partition has infinitely many
parts. Every part of this partition is as small as a part of a partition can be. In
effect, all the elements of 𝐴 are “kept apart” from each other.
If 𝐴 has just one element, then the coarsest and finest partitions are the same, but if
𝐴 is larger then they are different. If 𝐴 has just two elements, then these are the only
partitions of 𝐴, but if 𝐴 is larger, then it has many other partitions too, with all the
others being in a sense intermediate between these two extreme partitions.

1.16 EXERCISES

1. Why does the set difference only satisfy (1.9) when 𝐴 ⊆ 𝐵? How would you modify
(1.9) so that it covers any set difference 𝐵 ∖ 𝐴? What extra information about the sets
would you need, in order to determine |𝐵 ∖ 𝐴|?

2. We mentioned at the start of this chapter that sets are used to define types of
objects in many programming languages. For example, in C, the statement
int monthNumber;

declares that the variable monthNumber has type int. The declaration also assigns a
piece of memory to the variable, to contain the values that the variable has during the
computation. Similarly, the statement
char monthName[10];

is C’s way of declaring that the variable monthName is a string of at most 9 characters;
again, a piece of memory is allocated to it as well.
Let Int be the set of possible values for a variable of type int. Similarly, let String
be the set of possible values for a variable that is declared to be a string of at most 9
letters.

In C, we can combine declarations in order to create more complex types.

(a) The following statement creates a new type, called aNewType, for representing objects
consisting of any int followed by any string; it also sets aside consecutive pieces of
memory, so that the int is followed by the string in memory. It also declares the variable
monthBothWays to be of this type.

struct aNewType {
    int monthNumber;
    char monthName[10];
} monthBothWays;

Using the sets Int and String, together with a standard set operation, what set is repre-
sented by the type aNewType?

(b) The following statement creates another new type, called anotherNewType, for rep-
resenting objects that can be either an int or a string. It sets aside a piece of memory
that is large enough to contain either an int or a string; at any one time, it will contain
just one of these. It also declares the variable monthEitherWay to be of this type.

union anotherNewType {
    int monthNumber;
    char monthName[10];
} monthEitherWay;

Using the sets Int and String, together with a standard set operation, what set is repre-
sented by the type anotherNewType?

3. Suppose 𝐴 and 𝐵 are subsets of some universal set 𝑈.

(a) If 𝐴 ⊆ 𝐵, what is 𝐴 ∪ 𝐵?

(b) If 𝐴 ⊈ 𝐵, what can you say about 𝐴 ∪ 𝐵?

(c) Complete the following: 𝐴 ⊆ 𝐵 if and only if 𝐴 ∪ 𝐵 = .

(d) Devise another equivalent condition for 𝐴 ⊆ 𝐵 involving intersection instead of
union.

(e) Devise another equivalent condition for 𝐴 ⊆ 𝐵 involving set difference.

4. Consider the following diagrams. The one on the left shows the set {𝑎} and its
sole subset, ∅. The one on the right shows {𝑎, 𝑏} and all its subsets.

        {𝑎}                     {𝑎, 𝑏}
         ↑                     ↗      ↖
         ∅                  {𝑎}        {𝑏}
                               ↖      ↗
                                  ∅

The requirements of this diagram are:


• Sets are just represented by writing them (in their usual textual form, with ele-
ments listed between curly braces). These are not Venn diagrams.

• Smaller sets are lower, on your page, than larger sets.

• Sets of the same size are shown on the same horizontal level.

• The arrows indicate when a lower set is a subset of another set that has just one
extra element. If 𝑋 ⊂ 𝑌 and |𝑌| = |𝑋 | + 1 then there is an arrow from 𝑋 to 𝑌.

• The diagram is as neat and symmetric as possible.


(a) Draw a diagram of this type that shows all subsets of {𝑎, 𝑏, 𝑐} and the subset relation
among them.

(b) For every pair of sets 𝑋 , 𝑌 such that 𝑋 ⊂ 𝑌 and |𝑌| = |𝑋 | + 1, label the correspond-
ing arrow in your diagram by the sole member of 𝑌 ∖ 𝑋 .

(c) Suppose you are now liberated from the requirement to draw your diagram on a
medium of only two dimensions such as paper or a computer screen. How could you
draw this diagram in three dimensions in a natural way?

(d) For a set of 𝑛 elements, how many sets and how many arrows does a diagram of this
type have?

(e) For each element of the 𝑛-element set considered in (d), how many arrows are la-
belled by that element (if we label them as in (b))?

(f) In such a diagram, suppose we have two sets 𝑋 , 𝑌 that satisfy 𝑋 ⊆ 𝑌. How many
directed paths are there from 𝑋 to 𝑌? Give an expression for this.
• A directed path is a path along arrows in which all arrows are directed forwards;
you can’t go backwards along an arrow. Paths are counted as different as long as
they are not identical; they are allowed to have some overlap.

(g) With 𝑋 , 𝑌 as in (f), what is the maximum number of mutually internally disjoint
paths from 𝑋 to 𝑌? Describe such a collection of paths.

• An internal set on a path is a set on the path that is not the start or end of
the path, i.e., it’s not 𝑋 or 𝑌. Two paths are internally disjoint if no inter-
nal set on either path appears anywhere on the other path. If we have a col-
lection of some number of paths (possibly more than two), then the paths are
mutually internally disjoint if every pair of paths in the collection are inter-
nally disjoint.

(h) Explain how to use the diagram to find, for any two sets in it, their union and
intersection.

5. Given 𝑈 = {𝑒1 , 𝑒2 , … , 𝑒𝑛 }, the characteristic string of a subset 𝐴 of 𝑈 is the
string of 𝑛 bits 𝑏1 𝑏2 ⋯ 𝑏𝑛 where the 𝑖-th bit indicates whether or not 𝑒𝑖 belongs to 𝐴:

𝑏𝑖 = 1, if 𝑒𝑖 ∈ 𝐴;
     0, if 𝑒𝑖 ∉ 𝐴.
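The definition of 𝑏𝑖 translates directly into code (a sketch; here 𝑈 is given as an ordered list, so that the positions of the elements are fixed):

```python
def characteristic_string(A, U):
    """Bit string b1 b2 ... bn with bi = 1 exactly when the i-th element of U is in A."""
    return "".join("1" if e in A else "0" for e in U)

assert characteristic_string({"a", "c"}, ["a", "b", "c"]) == "101"
assert characteristic_string(set(), ["a", "b", "c"]) == "000"
```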

Write down the characteristic string of each of the subsets of a set of three elements.
List them, one above the other, so that each differs from the one above it in just one
bit.
See if you can extend this to subsets of a set of 𝑛 elements.
This has algorithmic applications. Suppose we want to search through all subsets
of a set, by looking at each of their characteristic strings in turn. If each characteristic
string differs from its predecessor in only one bit, then moving from one characteristic
string to the next requires fewer changes than may be required otherwise, which saves
time.

6. In a Venn diagram, the closed curves representing the sets together divide the
plane into regions. A single set divides the plane — or the portion of the plane within
the rectangular box representing the universal set, if that is shown in the diagram —
into two regions, its interior and exterior. See Figure 1.3, where the regions correspond
to 𝐴 and 𝐴̅ (with the latter shaded in that particular diagram, as it was being used to
explain the complement, but that does not have to be done in general). Two intersecting
sets, represented by two closed curves, divide the plane into four basic regions. (See
Figure 1.4, Figure 1.5, and Figure 1.6.) If the sets are 𝐴 and 𝐵, then the basic regions
correspond to 𝐴 ∪ 𝐵, 𝐴 ∩ 𝐵, 𝐴 ∖ 𝐵 and 𝐵 ∖ 𝐴. Three sets can be drawn to divide the
plane into eight regions.

(a) Label each basic region of a Venn diagram for three sets 𝐴, 𝐵, 𝐶 with appropriate
intersections of sets.

A general Venn diagram for 𝑛 sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 is one in which every possible
intersection of the form

(𝐴1 or 𝐴̅1 ) ∩ (𝐴2 or 𝐴̅2 ) ∩ ⋯ ∩ (𝐴𝑛 or 𝐴̅𝑛 )

corresponds to one of the basic regions of the diagram. This is intended to be an
intersection of 𝑛 sets, the 𝑖-th of which is either 𝐴𝑖 or 𝐴̅𝑖 . For each 𝑖, there is a choice
of two options, so the total number of basic regions is 2𝑛 .
For example, the Venn diagrams in each of these figures are general Venn diagrams:
Figure 1.3, Figure 1.4, Figure 1.5, Figure 1.6, Figure 1.8, Figure 1.9.
But the Venn diagram in Figure 1.1 is not a general Venn diagram, because there
is no region for 𝐴 ∩ 𝐵̅ (which is fine in that case, because it is illustrating 𝐴 ⊆ 𝐵). The
Venn diagram in Figure 1.7 is not a general Venn diagram, because there is no region
for 𝐴 ∩ 𝐵 (which is fine in that case, because it is illustrating the disjoint union 𝐴 ⊔ 𝐵).

(b) What is the maximum number of sets for which a general Venn diagram can be
drawn in which all the sets are circles of the same size?
(c) Draw a general Venn diagram in the plane for four sets. You can use closed curves
other than circles.
(d) How could you draw a three-dimensional general Venn diagram using four sets,
each represented as a sphere?

Drawing Venn diagrams (not necessarily general ones) to illustrate relationships
among a collection of sets is one of many topics studied in information visualisation.
There are many different criteria to aim for, including using only simple curves, not
having too much variability in sizes of the basic regions, and symmetry. These criteria
can conflict with each other. Algorithms have been developed and theorems have been
proved.

7. Prove that if 𝐴 ⊆ 𝐵 then 𝒫(𝐴) ⊆ 𝒫(𝐵).

8. Consider the sequence of binomial coefficients \binom{𝑛}{𝑟} with 𝑛 fixed and 𝑟 going from
0 to 𝑛:

\binom{𝑛}{0}, \binom{𝑛}{1}, \binom{𝑛}{2}, … , \binom{𝑛}{𝑛−1}, \binom{𝑛}{𝑛}.

(a) Using one of the formulas for \binom{𝑛}{𝑟}, prove that these binomial coefficients increase as
𝑟 goes from 0 to ⌊𝑛/2⌋ and then decrease as 𝑟 goes from ⌈𝑛/2⌉ to 𝑛.
• Here, ⌊𝑛/2⌋ is the “floor” of 𝑛/2, which is the greatest integer ≤ 𝑛/2. If 𝑛 is
even, this is just 𝑛/2 itself, but if 𝑛 is odd (which means 𝑛/2 is not an integer),
its floor is the integer (𝑛 − 1)/2.
Similarly, ⌈𝑛/2⌉ is the “ceiling” of 𝑛/2, which is the least integer ≥ 𝑛/2. If 𝑛 is
even, this is just 𝑛/2 again, but if 𝑛 is odd, its ceiling is the integer (𝑛 + 1)/2.

• So, what we are saying here is that:


– If 𝑛 is even, \binom{𝑛}{𝑟} increases as 𝑟 goes from 0 to 𝑛/2, then it decreases as 𝑟
goes from 𝑛/2 to 𝑛.
– If 𝑛 is odd, \binom{𝑛}{𝑟} increases as 𝑟 goes from 0 to (𝑛 − 1)/2, then it decreases as
𝑟 goes from (𝑛 + 1)/2 to 𝑛.
– In the odd case, we haven’t yet said anything about what happens when 𝑟
goes from (𝑛 − 1)/2 to (𝑛 + 1)/2. See if you can work that out too.

(b) Now prove that, for every positive integer 𝑟 in the range 1 ≤ 𝑟 ≤ 𝑛 − 1,

\binom{𝑛}{𝑟}^2 > \binom{𝑛}{𝑟−1} \binom{𝑛}{𝑟+1}.

This is an important property and means that the sequence is said to be strictly
log-concave. (If the inequality is just ≥ instead of >, then it’s log-concave.)

(c) Suppose you and a friend are considering a fixed set of size 𝑛. Suppose you get to
choose an ordered pair of subsets of size 𝑟 of the set of size 𝑛, with no restriction
at all (so your two sets are allowed to overlap, or be disjoint, or be identical, or
whatever; the only rule is that they both have to have size 𝑟). Suppose also that
your friend gets to choose one subset of size 𝑟 − 1 and another of size 𝑟 + 1, with
again no restriction on the sets apart from these size requirements. Who has more
options, you or your friend? Does the answer depend on 𝑛 and 𝑟 in any way? If so,
how? If not, why not? Is there any situation where the numbers of choices that you
and your friend have are the same?

9. Express |𝐴 ∪ 𝐵| in terms of |𝐴|, |𝐵|, |𝐴 ∩ 𝐵|.

10. Express |𝐴 ∩ 𝐵| in terms of |𝐴|, |𝐵|, |𝐴 ∪ 𝐵|.

11. Express |𝐴 ∪ 𝐵 ∪ 𝐶| in terms of |𝐴|, |𝐵|, |𝐶|, |𝐴 ∩ 𝐵|, |𝐴 ∩ 𝐶|, |𝐵 ∩ 𝐶|, |𝐴 ∩ 𝐵 ∩ 𝐶|.

12. Express |𝐴 ∩ 𝐵 ∩ 𝐶| in terms of |𝐴|, |𝐵|, |𝐶|, |𝐴 ∪ 𝐵|, |𝐴 ∪ 𝐶|, |𝐵 ∪ 𝐶|, |𝐴 ∪ 𝐵 ∪ 𝐶|.

13. Suppose we have sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 . For each 𝑘, let

𝑖𝑘 ∶= sum of all sizes of intersections of 𝑘 of the sets.

So 𝑖1 gives just the sum of the sizes of the 𝑛 sets, and 𝑖2 gives the sum of the sizes of
each intersection of two of the 𝑛 sets, and so on.

For example, if 𝑛 = 3, our sets are 𝐴1 , 𝐴2 , 𝐴3 , and

𝑖1 = |𝐴1 | + |𝐴2 | + |𝐴3 |,
𝑖2 = |𝐴1 ∩ 𝐴2 | + |𝐴1 ∩ 𝐴3 | + |𝐴2 ∩ 𝐴3 |,
𝑖3 = |𝐴1 ∩ 𝐴2 ∩ 𝐴3 |.

Rewrite your expressions for |𝐴 ∪ 𝐵| (for which 𝑛 = 2) and |𝐴 ∪ 𝐵 ∪ 𝐶| (for which 𝑛 = 3)
in terms of 𝑖1 , 𝑖2 , ….
Then write out a general expression for |𝐴1 ∪𝐴2 ∪𝐴3 ∪⋯∪𝐴𝑛 |. Pay careful attention
to the sign of the coefficient of the last term in the sum.
For this exercise, you don’t need to prove that your expression is correct. You will
learn techniques for doing that later, in Chapter 3. But, if you do try to do a proof
(even if only briefly), you will be even better prepared for that later material.

14. Now let


𝑢𝑘 ∶= sum of all sizes of unions of 𝑘 of the sets.
So, for example, if 𝑛 = 3, with the same sets as before, we have

𝑢1 = |𝐴1 | + |𝐴2 | + |𝐴3 |,
𝑢2 = |𝐴1 ∪ 𝐴2 | + |𝐴1 ∪ 𝐴3 | + |𝐴2 ∪ 𝐴3 |,
𝑢3 = |𝐴1 ∪ 𝐴2 ∪ 𝐴3 |.

Rewrite your expressions for |𝐴 ∩ 𝐵| and |𝐴 ∩ 𝐵 ∩ 𝐶| in terms of 𝑢1 , 𝑢2 , ….
Then write out a general expression for |𝐴1 ∩ 𝐴2 ∩ 𝐴3 ∩ ⋯ ∩ 𝐴𝑛 |. Pay careful attention
to the sign of the coefficient of the last term in the sum.
For this exercise, you don’t need to prove that your expression is correct.

15. Express |𝐴△𝐵| in terms of the sizes of some other sets in three different ways.

16. Draw a Venn diagram for general sets 𝐴, 𝐵, 𝐶 and shade the region(s) that form
𝐴△𝐵△𝐶.

17. Prove that, for any sets 𝐴 and 𝐵,

(𝐴 ∪ 𝐵)△(𝐴 ∩ 𝐵) = 𝐴△𝐵.

18. Suppose 𝐴1 , 𝐴2 , … , 𝐴𝑛 are sets. How would you characterise 𝐴1 △𝐴2 △ ⋯ △𝐴𝑛 ?
More specifically, the members of 𝐴1 △𝐴2 △ ⋯ △𝐴𝑛 are those that satisfy some specific
condition on how many of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 they belong to. What is that condition?

For this exercise, you don’t need to construct a formal proof that your condition is
correct.

19. If 𝐴 and 𝐵 are sets, does \overline{𝐴 × 𝐵} equal 𝐴̅ × 𝐵̅? If so, prove it. If not: give an
example when it does not hold; characterise those cases when it does hold; and determine
what can be said about the relationship between them.

20. Write down all partitions of {1,2,3,4} into three parts.

21. Suppose 𝐴, 𝐵, 𝐶 ⊆ 𝑈 are nonempty sets such that all basic regions in their general
Venn diagram are nonempty (see Exercise 6). Then the three sets 𝐴, 𝐵, 𝐶 define a partition of
𝑈 into eight parts. What are those parts? Express each part in terms of one or more of
𝐴, 𝐵, 𝐶 using set operations.

22. Does ℝ have a partition in which all parts are

(a) open intervals?

(b) half-open half-closed intervals?

(c) closed intervals?


2
FUNCTIONS

Computation helps get things done. By “things”, we mean tasks where we take some
information and transform it somehow. To specify such a task, we must specify:

• what we start with;

• what we end up with;

• what needs to be done.

For example, suppose we want to sort a list of names into alphabetical order. We
first specify exactly what kinds of lists of names we are dealing with (which alphabet?
any requirements or restrictions on the names to be considered? how long can the names
be? how many names are we allowed to have in the list? etc.). Then we specify that
we’ll end up with another list. Then we specify what we want to do to our first list,
which is to sort it. So the list we end up with has all the same names as the list we
started with, but now they are in alphabetical order.
Or suppose you want to determine the most goals kicked by any team in any season of
your favourite football competition, from records of games. We first specify exactly what
kinds of records we are working with: the information in them and how it is arranged.
Then we specify that we’ll end up with a number (to be precise, a non-negative integer).
Then we specify that this number must be the maximum, over all seasons and all teams,
of the number of goals kicked by that team in that season.
To model such tasks precisely, we use functions.

2.1𝛼 DEFINITIONS

A function 𝑓 consists of

• a set called the domain,

• a set called the codomain,

• a specification, for each element 𝑥 of the domain, of a unique member 𝑓(𝑥) of the
codomain. This member, 𝑓(𝑥), is the value of the function for the argument 𝑥.


Informally, the argument 𝑥 “goes in” and the value 𝑓(𝑥) “comes out”. These terms
have some common synonyms.

• The argument 𝑥 is also called the parameter or the input.

• The value 𝑓(𝑥) is also called the output.

But some care is needed with the use of “input” and “output” in this context, since
in programming “input” is often used for extra information that a program reads from
another source (such as a file), while “output” is often used for information a program
writes to a file or screen, and these may be quite different to the argument and value.
Functions are ubiquitous in computer science, as well as in science, engineering,
economics, finance, and any other field where symbolic objects of some type need to be
transformed into something else in a precisely specified way.
We need to make some key points about the three parts of the definition of a func-
tion 𝑓.

2.1.1𝛼 Domain

The domain of a function 𝑓 can be any set, finite or infinite, provided that:

• Every member of the domain is considered to be a valid argument for the function.
So, for every member 𝑥 of the domain, its corresponding value 𝑓(𝑥) must be
properly defined and “make sense”.

• For everything that does not belong to the domain, the function is considered to
be undefined and 𝑓(𝑥) has no meaning.

For our Sorting example, the domain is the set of all possible lists of names of the
required type.
For the Maximum Goals example, it is the set of all possible records of all the games
in a single season. Note that the domain is not just the set of all past seasons’ records,
since we want our function to be able to determine the required information for any
possible future season as well.
Suppose now that we have a function called SumOfFourIntCubes which takes any
four integers and finds the sum of their cubes, according to the rule

SumOfFourIntCubes(𝑤, 𝑥, 𝑦, 𝑧) = 𝑤 3 + 𝑥3 + 𝑦 3 + 𝑧 3 .

Then its domain is the set of all quadruples of integers, which we can write as ℤ×ℤ×ℤ×ℤ.
We could also envisage a function SumOfFourRealCubes that takes any four real numbers
and finds the sum of their cubes. The rule looks the same:

SumOfFourRealCubes(𝑤, 𝑥, 𝑦, 𝑧) = 𝑤 3 + 𝑥3 + 𝑦 3 + 𝑧 3 .

But its domain is now ℝ × ℝ × ℝ × ℝ. Since this function has a different domain, it is
considered to be a different function, even though the rule looks the same.
Why are functions with the same rule considered to be different if their domains are
different? Why would we even need the function SumOfFourIntCubes when we can use
the function SumOfFourRealCubes to do everything it does and more?
There are several reasons for this.
• Firstly, rules may look the same without actually being the same, because of details
of how the operations in the rule depend on the types of objects being used. A
multiplication symbol might be used to multiply numbers and also to multiply
matrices, but they are very different operations.

• Secondly, the details of implementing functions in code will depend crucially on
the specific types of objects they work with. In computing, real numbers are
represented very differently to integers, even though we think of integers mathematically
as a special case of real numbers. So some of the technical details of writing a pro-
gram will be different for SumOfFourIntCubes and SumOfFourRealCubes. So, when
we think of functions as specifications of tasks to be programmed, the specification
of the domain is a crucial part of the definition.

• Thirdly, the purpose of the domain is not merely to help explain the rule; it is also
a promise to the function’s user that the function will work for every member of
the domain. Sometimes, you may want a larger domain so that the function works
for as many cases as possible. But bigger promises require more work to keep!
Sometimes, it is better to use a more modest domain, provided it still captures
everything that your function is supposed to work with; a more modest promise
is easier to keep!
If 𝑓 is a function, then we write dom(𝑓) for the domain of 𝑓. So

dom(SumOfFourIntCubes) = ℤ × ℤ × ℤ × ℤ,
dom(SumOfFourRealCubes) = ℝ × ℝ × ℝ × ℝ.

2.1.2𝛼 Codomain & co.

The codomain of a function 𝑓 must include every possible value of 𝑓(𝑥) for every
member 𝑥 of the domain. But the codomain is allowed to be “loose”, in the sense that
it is allowed to include other stuff too. We do not need to ensure that the only things
in the codomain are things we can get by applying our function 𝑓 to some member of
its domain.
So, for the codomain, instead of specifying the possible function values exactly, we
specify some superset of the set of possible function values.
The exact set of possible values of a function is called the image of the function.
This is a subset of the codomain, but not necessarily a proper subset.
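When the domain is small and finite, the image can be computed exhaustively. A Python sketch (the names here are ours):

```python
def image(f, domain):
    """The image of f: the exact set of values f(x) as x ranges over the domain."""
    return {f(x) for x in domain}

def square(x):
    return x * x

img = image(square, range(-3, 4))   # {0, 1, 4, 9}, a subset of the codomain ℤ
assert img == {0, 1, 4, 9}
```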

You may wonder at this point, why do we allow such “looseness” in the codomain
when we were so insistent that the domain be the exact set of allowed arguments of the
function? Why shouldn’t we use the image, instead of the codomain, when defining a
function?
The reason for this is practical. It is often harder to know the image than it is to
specify a natural codomain. Sometimes it’s impossible.
In our Maximum Goals example above, the codomain is the set ℕ ∪ {0}. This does
not mean that every nonnegative integer must arise as a possible maximum number of
goals kicked by any team in a season. Indeed, some numbers (e.g., 10¹⁰⁰) could never
arise in this way in practice. But it may be hard or impossible to know exactly which
numbers are feasible values and which are not. So it is more practical to give a codomain,
in the form of a simple, easily-described set which we know includes all possible function
values (even if it has other things too). The set ℕ ∪ {0} works well.
In some cases, the difficulty may be computational. For example, consider the
notorious Pompous Python function which takes any positive integer 𝑛 and gives,
as its value, the length (in characters) of the longest output that can be printed by
any Python program which reads no input, eventually stops, and whose source code file
has at most 𝑛 characters.1 On the face of it, this seems painful to compute, because
there are so many programs that could be considered (once 𝑛 is large enough to allow
interesting programs of at most 𝑛 characters to be written). In fact, it’s worse than that;
it can be shown that this function is impossible to compute perfectly. (This fact is not
obvious, but is a consequence of some famous results on uncomputability from the 1930s.
Uncomputability is covered in detail in the unit FIT2014 Theory of Computation.) So it
is impossible, in a precise sense, to know exactly which numbers can be possible values
of this function. Therefore, in specifying the function, we would prefer not to have
to specify, in advance, which numbers to allow for its possible values! So, instead, we
specify a suitable codomain, and in this case ℕ ∪ {0} will do fine, even though only very
few numbers can actually be values of the Pompous Python function.
In other cases, the difficulty may be the limits of our current knowledge. Consider
again SumOfFourIntCubes. It is not yet known whether or not every integer can be
written as the sum of four cubes, so we do not know if the values taken by this function
are all integers or only some subset of them. But we can specify the codomain to be
ℤ and know that this is a superset (but not necessarily a proper superset, as far as we
currently know) of the actual set of possible values.
The codomain, then, represents a promise to users of the function that all the
values they get from it will be in that set, but it gives no guarantee that every one of
its members is an actual function value.

1 Here we envisage an ideal computing environment where arbitrarily long programs can be run, arbitrarily
long outputs can be printed, and programs can take an arbitrarily long time to run before stopping. If
a program crashes or prints no output, then we define the length of its output to be 0. If a program
runs forever, in an infinite loop, then we exclude it from consideration, regardless of how much output it
produces.

It is good to know the image too, when that’s possible. But we do not want our
definition of the function concept to be hampered by the difficulties that are often
associated with specifying the image. So we do not include a specification of the image
in our specification of the function. Rather, we use the codomain as the “next best
thing”.
Of course, if we do know the image, there is nothing to stop us from stating it as
our codomain. But, even then, it is often neater to specify a simple codomain than
it is to specify the image. For example, a function with domain ℕ whose value is the
𝑛-th Fibonacci number has, as its image, the set of all Fibonacci numbers, and it’s easy
enough to write that down.2 But it’s even easier to specify ℕ as a codomain.
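The Fibonacci function just mentioned can be sketched in Python (code and naming ours); its image, the set of all Fibonacci numbers, is harder to write down than the simple codomain ℕ:

```python
def fib(n):
    """The n-th Fibonacci number; domain ℕ (with fib(1) = fib(2) = 1), codomain ℕ."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

assert [fib(n) for n in range(1, 8)] == [1, 1, 2, 3, 5, 8, 13]
```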
A function in which the codomain is the same as the image of the function is said
to map its domain onto its codomain, and may be said to be onto. Such a function is
also said to be surjective and is called a surjection.
Finally, a word of warning! We have studiously avoided using the word “range”. This
is because the term is, confusingly, used in two different ways: sometimes, it means the
image, while at other times, it means the codomain. We will dodge this issue by not
using “range” at all.

2.1.3𝛼 Rule

The rule of a function must specify the relationship between each member of its do-
main and the corresponding value of the function. But it does not, in general, give an
algorithm for computing the function.
In other words, the rule specifies what must be done, but it does not need to specify
how it is to be done.
In our Sorting example, the rule is that the function’s value is the sorted list of
names. This rule does not specify how the sorting is to be done. As computer science
students, you will meet many different sorting algorithms, including Bucket Sort, Merge
Sort, Insertion Sort, and Quick Sort. They all have their strengths and all could be
used to compute our sorting function. But the function itself does not include a choice
of which algorithm we will use to compute it; that choice is a separate issue to the
specification of the function.
Sometimes, a function’s rule does give some information on how to compute it. For
example, consider a function that squares integers. It has domain ℤ, and we’ll use the
(loose) codomain ℤ too. Its rule is just that any integer argument is squared. This may
be thought of as a small algorithm: it tells you what the value is in such a way that
you also know how to work it out. Or do you? There is more than one algorithm for
squaring an integer! To specify the rule, we don’t need to specify which algorithm is to

2 The Fibonacci numbers are the numbers you get by starting with two consecutive 1s and then repeatedly
adding the two most recent numbers together to get the next number. So the Fibonacci sequence is:
1,1,2,3,5,8,13,21,34,55,89,144,233,….

be used; we only need to give enough information so that the reader can know, for each
argument, what its corresponding value is.
A function’s rule associates, to each argument in its domain, a unique value in its
codomain. One way to specify this information is to give the set of all possible ordered
pairs (𝑥, 𝑓(𝑥)). For example, consider a function Employer, defined as follows. Its domain
is the set

{Annie Jump Cannon, Henrietta Swan Leavitt, Muriel Heagney, Winsome Bellamy},

which consists of four human computers who worked at various astronomical observato-
ries. For its codomain, we use the set of all astronomical observatories over the last two
centuries. The rule gives values to arguments as follows.

Employer(Annie Jump Cannon) = Harvard College Observatory,
Employer(Henrietta Swan Leavitt) = Harvard College Observatory,
Employer(Muriel Heagney) = Melbourne Observatory,
Employer(Winsome Bellamy) = Sydney Observatory.

We can specify this rule by giving the following set of ordered pairs (computer, Employer(computer)).

{ (Annie Jump Cannon, Harvard College Observatory),
  (Henrietta Swan Leavitt, Harvard College Observatory),
  (Muriel Heagney, Melbourne Observatory),
  (Winsome Bellamy, Sydney Observatory) }

Note that this function is not a surjection, since its image is only

{ Harvard College Observatory, Melbourne Observatory, Sydney Observatory }

which is a (very) proper subset of the codomain.


To do the same with our squaring function would require an infinite set of ordered
pairs,
{(0, 0), (1, 1), (−1, 1), (2, 4), (−2, 4), …},
which we can write succinctly, using our standard conventions for writing sets, as

{(𝑥, 𝑥2 ) ∶ 𝑥 ∈ ℤ}.

The set of all ordered pairs (𝑥, 𝑓(𝑥)) of a function 𝑓 is called the graph of the
function. This term reminds us that we often illustrate a function by drawing a plot of
all points (𝑥, 𝑓(𝑥)), with the horizontal axis containing the domain and the vertical axis
containing the codomain. We are used to referring to such a plot as a “graph” of the
function, but the term graph as just defined is more abstract: it just refers to the set of
pairs, without regard to how they might be displayed to a reader.
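In code, the graph of a finite function can be stored directly. In Python, for instance, a dict whose key-value pairs are exactly the pairs (𝑥, 𝑓(𝑥)) serves this purpose (a sketch):

```python
# The Employer function stored as its graph.
employer = {
    "Annie Jump Cannon": "Harvard College Observatory",
    "Henrietta Swan Leavitt": "Harvard College Observatory",
    "Muriel Heagney": "Melbourne Observatory",
    "Winsome Bellamy": "Sydney Observatory",
}

domain = set(employer)           # the four human computers
image = set(employer.values())   # only three observatories, so not a surjection
assert len(domain) == 4 and len(image) == 3
```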


[Figure 2.1 here: each human computer in the domain is joined by an arrow to their
observatory in the codomain; observatories such as Greenwich Observatory and Jantar
Mantar receive no arrows.]

Figure 2.1: The Employer function.

Plots of functions, using horizontal and vertical axes, are convenient visual ways
to display information about the function, but they also have their limitations. Many
domains and codomains do not have an inherently one-dimensional structure, and in-
creasing the number of dimensions — say, by using a 3D plot — does not always help.
Some domains and codomains are not geometric in character at all. For example, the
domain of Employer is a set of four people, and the domain of our Sorting function is the
set of all possible lists of names, neither of which are defined in numerical or geometric
terms.
There are other ways to depict functions. For example, we could start with a Venn
diagram of the domain and codomain, draw points within the domain representing its
members, and then, for every pair (𝑥, 𝑓(𝑥)), draw an arrow from 𝑥 to 𝑓(𝑥) to indicate
that the function sends 𝑥 to 𝑓(𝑥). Because 𝑓 is a function, every point in the domain
has exactly one arrow going out of it. Our Employer function is depicted in this way in
Figure 2.1.

2.2𝛼 FUNCTiONS iN COMPUTER SCiENCE

2.2.1𝛼 Functions from analysis

In software development, the first task is to work out what must be done. This process
is traditionally called analysis and usually involves extensive communication with the
owner of a problem (e.g., a client) in order to come up with a precise description of the
task at hand. One possible outcome of this analysis process is a function.
At this stage, we have not yet worked out how to solve the problem at hand. But
at least we have a precise statement of the task to be done (the “what”). With this, we
can then try to design a method for doing this task (the “how”). If our task is specified
by a function, then we will design an algorithm for the function.
Once we have an algorithm, we proceed to implementation, or programming the
algorithm using a programming language such as Python.
Of course, this is a very simplified and incomplete view of software development.
The process is seldom purely linear; each step often involves going back and re-doing
parts of a previous stage. The design process might highlight problems, or gaps, in
the specification, which may require further communication with the problem owners
to sort out, leading to changes to the specification. Or the clients may simply change
their minds about some aspect of the task! The implementation process may bring to
light some problems with the design which must then be fixed. There are later stages
we haven’t mentioned, notably maintenance. And not all problem analysis leads to
a function specification. For example, some may lead to a specification of how the
various components of some system must interact; some may lead to a specification of
a database. But functions remain a very important product of analysis, partly because
more complicated systems often contain functions as components.

2.2.2𝛼 Functions in mathematics and programming

Our view of functions, as specifications of what rather than how, originated in mathe-
matics although is now widespread in computer science and other disciplines. But, as
you study programming, you will find that the term is used in another way too.
In many programming languages, and even in many pseudocode conventions for
writing algorithms, a definition of a “function” has — in addition to the parts discussed
here — some code in the programming language, or an algorithm, that specifies how
the function is to be computed. In fact, the very word “function” is a reserved word in
some programming languages and has this meaning, possibly with additional technical
details.
By default, we use the term “function” in the mathematical sense, where there is no
code or algorithm given. Occasionally we may use the term “mathematical function” to
emphasise this, but even without that adjective, the term “function” will be used in this
way.

If we wish to use the programming sense of the term, we will specify that explicitly,
as in “Python function” or “programming function” or “algorithmic function”.
In functional programming languages, algorithmic functions are the most funda-
mental objects used, and all computation is done by manipulating them. They can be
represented by variables and treated as both arguments and values of other functions.
We do not consider the functional programming paradigm in this unit or in FIT1045.
You can learn more about functional programming in FIT2102 Programming Paradigms.

2.3𝛼 NOTATiON

A function definition has the form

name ∶ domain ⟶ codomain,
name(𝑥) = precise description of the function value, in terms of 𝑥.

For example, a function 𝑓 that gives squares of real numbers can be defined by

𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥2 .

Here, the domain is ℝ, and the codomain is ℝ too. Since we stated that our function
would give the squares of real numbers, we really have no alternative but to specify ℝ as
the domain. But we have some more flexibility with the codomain. We could have used
ℝ⁺₀, since that is the image of this function: the squares of real numbers are precisely
the nonnegative real numbers. We could, instead, have used ℝ ∪ {0, −√2, −42} as the
codomain, which is perfectly valid mathematically, although this codomain has some
extra detail that is irrelevant, useless, distracting, and shows poor user interface design!
The first line of these function definitions, such as 𝑓 ∶ ℝ ⟶ ℝ, is like a declaration in
a program. It announces the name of the function and specifies the types of objects that
it can take as its arguments and give as its values. The second line, such as 𝑓(𝑥) = 𝑥2 ,
completes the definition by specifying the rule. A common form of wording is to say
something like, “The function 𝑓 ∶ ℝ → ℝ is defined by 𝑓(𝑥) = 𝑥2 .”
There is another common convention for specifying the rule of a function, where we
just write

𝑥 ↦ precise description of the function value, in terms of 𝑥.

Note how the “mapping arrow” ↦ in the rule differs from the ordinary arrow → going
from domain to codomain. It is important not to mix the two arrow types up.
If we use this second convention, then our squaring function would be defined by

𝑓 ∶ ℝ ⟶ ℝ,
𝑥 ↦ 𝑥2 .

2.4𝛼 SOME SPECiAL FUNCTiONS

The most trivial, vacuous, degenerate function of all is the empty function, denoted
∅. Its domain and codomain are each empty, and it has no rule because there is nothing
in the domain for any rule to apply to. It can be defined simply as ∅ ∶ ∅ → ∅. It’s pretty
useless; we may hope never to see it again! Let’s move on.
For any set 𝐴, the identity function 𝑖𝐴 on 𝐴 is defined by

𝑖𝐴 ∶ 𝐴 ⟶ 𝐴,
𝑖𝐴 (𝑥) = 𝑥.

This function just maps each member of 𝐴 to itself.3


For any domain 𝐷, we can define for every subset 𝐴 ⊆ 𝐷 the indicator function
𝜒𝐴 , which uses 1 and 0 to indicate, for each member of the domain, whether or not it is
in 𝐴:

𝜒𝐴 ∶ 𝐷 ⟶ {0, 1},
𝜒𝐴 (𝑥) = 1, if 𝑥 ∈ 𝐴;
𝜒𝐴 (𝑥) = 0, if 𝑥 ∉ 𝐴.

Although indicator function notation 𝜒𝐴 only mentions 𝐴, it must be kept in mind that
a function’s definition always includes a specification of its domain, which in this case
is 𝐷. Different domains give rise to different indicator functions.
We can express the indicator functions of sets obtained from set operations on 𝐴
and 𝐵 using the indicator functions of 𝐴 and 𝐵. For example, for all 𝑥 we have

𝜒𝐴∪𝐵 (𝑥) = max{𝜒𝐴 (𝑥), 𝜒𝐵 (𝑥)}.

See Exercise 2.
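The identity for 𝜒𝐴∪𝐵 is easy to check by exhaustion in Python (a sketch; Python sets stand in for the sets 𝐴, 𝐵 ⊆ 𝐷):

```python
def indicator(A):
    """The indicator function χ_A, returned as a Python function of x."""
    return lambda x: 1 if x in A else 0

A, B, D = {1, 2}, {2, 3}, range(5)
chi_A, chi_B, chi_union = indicator(A), indicator(B), indicator(A | B)

# χ_{A∪B}(x) = max{χ_A(x), χ_B(x)} for every x in the domain D
assert all(chi_union(x) == max(chi_A(x), chi_B(x)) for x in D)
```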
For any domain 𝐷 and any object 𝑎, the constant function 𝑐𝑎 just maps everything
to 𝑎:

𝑐𝑎 ∶ 𝐷 ⟶ {𝑎},
𝑐𝑎 (𝑥) = 𝑎.

2.5𝛼 FUNCTiONS WiTH MULTiPLE ARGUMENTS AND VALUES

Many functions you will meet have multiple arguments. If 𝑓 is a function of two argu-
ments 𝑥 and 𝑦, then we write its value as 𝑓(𝑥, 𝑦). Suppose 𝑥 ∈ 𝑋 and 𝑦 ∈ 𝑌, and that
the value of the function belongs to a codomain 𝐶. Then the function definition would
start by stating 𝑓 ∶ 𝑋 × 𝑌 → 𝐶.
3 So, in a sense, it does nothing. But at least it does nothing to something, whereas the empty function
does nothing to nothing!

It may seem that we are extending our definition of functions here, since we seem
to have two domains, 𝑋 and 𝑌, for the first and second argument respectively. And no
real harm can come from this view. But functions of two arguments may also be viewed
as functions of a single argument where that one argument happens to be a pair (𝑥, 𝑦) and
its domain is the Cartesian product 𝑋 ×𝑌. So, when we start a function definition with
𝑓 ∶ 𝑋 × 𝑌 → 𝐶, we are still just using our usual way of defining functions. When we
write 𝑓(𝑥, 𝑦), indicating two arguments, we are using a shorthand for 𝑓((𝑥, 𝑦)), where
the argument that we put inside 𝑓(⋯) is the ordered pair (𝑥, 𝑦). In accordance with
usual practice, we will drop the second pair of parentheses from 𝑓((𝑥, 𝑦)), writing 𝑓(𝑥, 𝑦)
instead, and we will happily speak of its first argument 𝑥, its second argument 𝑦, and
so on. But keep in the back of your mind that we can also view this as a function of
just one argument whose sole argument happens to be the ordered pair (𝑥, 𝑦).
When we write 𝑓(𝑥, 𝑦) for applying function 𝑓 to arguments 𝑥 and 𝑦, we are using
prefix notation, because we put the name of the function before the arguments. This
is common practice and is the one we use when defining new functions. But there are
also many well-known functions that use infix notation, where the function name is put
between its arguments. Familiar examples include ordinary arithmetic functions +, −,
×, / and many built-in operations in many programming languages. Far less common
is postfix notation, where the name is placed after the arguments.
All these remarks extend readily to functions of three or more arguments. For
example, a function of three arguments can also be regarded as a function of a single
argument which happens to be a triple.
We also sometimes want the value of a function to be a tuple. For example, suppose
the value of a function is to be a pair (𝑦, 𝑧) where 𝑦 ∈ 𝑌 and 𝑧 ∈ 𝑍. If the domain of
the function is 𝑋 , then we may write 𝑓 ∶ 𝑋 → 𝑌 × 𝑍. This function may be regarded as
giving two values, which we find it convenient to put in an ordered pair. We also view
it as giving a single value which happens to be the ordered pair (𝑦, 𝑧) ∈ 𝑌 × 𝑍.
Again, these comments extend readily to functions that return tuples of three or
more values.
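Python makes both views concrete: a function can take one argument that happens to be a tuple, and a function can return a tuple of values, as the built-in divmod does (helper name ours):

```python
def add_pair(pair):
    """One argument, which happens to be the ordered pair (x, y)."""
    x, y = pair
    return x + y

assert add_pair((2, 3)) == 5     # the single argument is the pair (2, 3)

q, r = divmod(17, 5)             # divmod returns the pair (quotient, remainder)
assert (q, r) == (3, 2)
```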

2.6 RESTRiCTiONS

Sometimes we want to restrict a function to some subset of its domain, and to treat
this restricted version of the function as a new function in its own right. For example,
consider a function that assigns ID numbers to Monash students. Its domain is the set
of all Monash students. If we want to use a function which only considers FIT1058
students, and assigns ID numbers to them, then this function is a restriction of the
previous function just to the set of all FIT1058 students.
There are several reasons why we might want to focus just on the restriction of some
function.
• If the domain is significantly smaller than the original function, then storing the
restricted function as a list of ordered pairs takes up less space.

• As we discussed on page 39 in § 2.1.1𝛼 , the domain of a function serves as a promise
to users of the function that anything in the domain is valid, for that function,
and is assigned a value by the function. As we noted, a bigger domain amounts to
a bigger promise which may require more work to keep. If a function is given an
algorithm and implemented as a program, then testing is required in order to be
sure that it works for all members of its domain. The bigger the domain is, the
more testing is required. So you may prefer to restrict the function to a subset
of its domain, focusing on the arguments you really care about and reducing the
work of testing and the size of the promise implied by the domain.

• Sometimes a function is simpler to describe, analyse or compute on some particular
subset of its domain that we most care about.

• Sometimes a restricted version of a function may have stronger properties that
enable it to be used in situations where the original function cannot be used.

Suppose we have a function 𝑓 ∶ 𝐴 → 𝐵 and that 𝑋 ⊆ 𝐴. The restriction of 𝑓 to 𝑋 ,
denoted by 𝑓|𝑋 or 𝑓 ↾𝑋 , is the function with domain 𝑋 , codomain 𝐵, and which agrees
with 𝑓 on 𝑋 :

𝑓|𝑋 ∶ 𝑋 → 𝐵,
𝑓|𝑋 (𝑥) = 𝑓(𝑥) for all 𝑥 ∈ 𝑋 .

For example, suppose 𝑓 is our squaring function from § 2.3𝛼 :

𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥2 .

Then its restriction 𝑓|ℝ⁺₀ to the nonnegative real numbers is given by

𝑓|ℝ⁺₀ ∶ ℝ⁺₀ ⟶ ℝ,
𝑓|ℝ⁺₀ (𝑥) = 𝑥².

This has some properties that the original function 𝑓 does not have. For example, 𝑓|ℝ⁺₀ is
continually increasing as 𝑥 increases across its domain, whereas 𝑓(𝑥) is decreasing along
some of its domain (specifically, along ℝ⁻₀) and increases elsewhere (along ℝ⁺₀), so its
behaviour is a bit more complicated. Each member 𝑦 in the image of 𝑓|ℝ⁺₀ comes from a
unique 𝑥 in the domain ℝ⁺₀, namely the positive square root √𝑦 of 𝑦 (or 0, in the case
𝑦 = 0). By contrast, each nonzero member 𝑦 of the image of 𝑓 comes from two different
values of 𝑥, namely the two square roots ±√𝑦 of 𝑦. This illustrates the point mentioned
above that a restriction can be simpler and have stronger properties, which can make it
more useful in some situations. (Later, in § 2.8, we discuss inverse functions. Then, we
can say that 𝑓|ℝ⁺₀ is invertible but 𝑓 is not.)
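For a finite subset 𝑋 of the domain, a restriction can be built directly from the original function. A Python sketch (names ours) stores 𝑓|𝑋 as its graph:

```python
def restrict(f, X):
    """The restriction f|X: the pairs (x, f(x)) for x in the finite set X."""
    return {x: f(x) for x in X}

def square(x):
    return x * x

square_on_small_nonnegatives = restrict(square, range(5))
assert square_on_small_nonnegatives == {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```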

2.7 iNjECTiONS, SURjECTiONS, BijECTiONS

As we have seen from some of our examples, it is perfectly ok for different function
arguments to give the same value. So it is ok for both Annie Jump Cannon and Henrietta
Swan Leavitt to have the value Harvard College Observatory under the Employer function
(p. 42 in § 2.1.3𝛼 ). There is no requirement for there to be a unique argument for
each value. This contrasts with the requirement that there be a unique value for each
argument, which is an essential property of any function. This specific Employer function
only assigns one employer to each human computer.
Although it’s ok in general for different arguments to be mapped to the same value,
there are situations where we do not want that to happen. For example, a function that
assigns an ID number to each student must ensure that different students get different
ID numbers. A function that encrypts files must ensure that different files are encrypted
differently, else the contents of a file cannot be recovered from its encrypted form.
A function with this property, that different arguments are always mapped to different
values, is said to be injective and is called an injection. Mathematically, this
property of a function 𝑓 ∶ 𝐴 ⟶ 𝐵 can be expressed as follows: for any two distinct
𝑥1 , 𝑥2 ∈ 𝐴, we have 𝑓(𝑥1 ) ≠ 𝑓(𝑥2 ). Such a function gives a one-to-one correspondence
between the domain and the image, but not between the domain and the codomain in
general.
Injections have the virtue of preserving information: for every member 𝑦 in the
image of an injection 𝑓, there is a unique 𝑥 in its domain such that 𝑓(𝑥) = 𝑦. In every
case, knowing 𝑦 is logically sufficient for determining 𝑥 (although we are not saying
anything here about how much work it might be to recover 𝑥; that depends on the
details of the function). If a function is not an injection, then there must be at least one
member 𝑦 of its image such that there are two or more members 𝑥1 , 𝑥2 of its domain
which map to that value: 𝑓(𝑥1 ) = 𝑓(𝑥2 ) = 𝑦. So, in that case, knowing 𝑦 still leaves you
in doubt as to how it could have been produced by 𝑓.
Functions that aren’t injections lose information. We might call them lossy. In
fact, this term is used for data compression functions that are not injections. By contrast,
an injective data compression function is called lossless.
Similarly, we can express mathematically the onto property of a function, which we
defined in § 2.1.2𝛼 . A function 𝑓 ∶ 𝐴 ⟶ 𝐵 is surjective, and is said to be a surjection,
if for every value 𝑦 ∈ 𝐵 there is an argument 𝑥 ∈ 𝐴 such that 𝑦 = 𝑓(𝑥).
A function that is both an injection and a surjection is said to be bijective and
is called a bijection. Such a function is a one-to-one correspondence between the
domain and the codomain (which is also the image in this case).
A bijection whose domain and codomain are the same set is also called a permutation,
though the latter term is usually used only for bijections on finite sets.
Bijections preserve information, since they are injections. Furthermore, since they
are also surjections, each member of the codomain may be thought of as encoding a

unique member of the domain. So a bijection establishes that the domain and codomain
contain the same information, although it may be represented in different ways.
If a function has finite domain and codomain of the same size, and is an injection,
then it must also be a surjection, and hence also a bijection. This is because there is no
room in the codomain for the injection to avoid mapping to all its members. Similarly, a
surjection with finite domain and codomain of the same size must also be an injection,
and hence a bijection, since the need to map to all members of the codomain prevents any
repetition of codomain elements. Both these assertions fail if the domain and codomain
are infinite. Can you find examples to illustrate the failure in each case?
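For finite domains and codomains, all three properties can be tested by exhaustive checking. A Python sketch (names ours):

```python
def is_injective(f, domain):
    values = [f(x) for x in domain]
    return len(values) == len(set(values))          # no value is repeated

def is_surjective(f, domain, codomain):
    return {f(x) for x in domain} == set(codomain)  # image equals codomain

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

# The successor function on {0, 1, 2} is a bijection onto {1, 2, 3} ...
assert is_bijective(lambda x: x + 1, range(3), {1, 2, 3})
# ... but squaring on {-2, ..., 2} is not even injective.
assert not is_injective(lambda x: x * x, range(-2, 3))
```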

2.8 iNVERSE FUNCTiONS

We think of a function as going from any member of its domain to its corresponding
value in the codomain. But there are times when we may want to go backwards: given a
value in the codomain, what argument in the domain does it come from? For example,
which computer worked at Harvard College Observatory? Which number, when squared,
gives 4? Which file corresponds to a particular encrypted file?
For functions in general, the answer may not be unique. We have just mentioned
some cases of this: our Employer function mapped two different computers to Harvard
College Observatory, and 22 = (−2)2 = 4.
This failure of uniqueness can happen in either of two different ways. Let 𝑓 ∶ 𝐴 → 𝐵
be a function. For a given value 𝑦 in the codomain:

• There may be more than one argument that gives the value 𝑦 under the function.
So we may have 𝑥1 , 𝑥2 ∈ 𝐴 such that 𝑥1 ≠ 𝑥2 but 𝑓(𝑥1 ) = 𝑓(𝑥2 ) = 𝑦.

• There may be no argument that gives the value 𝑦. This happens when the image is
a proper subset of the codomain and 𝑦 lies in the codomain but not in the image.

But if neither of these occurs, then every 𝑦 ∈ 𝐵 has a unique 𝑥 ∈ 𝐴 such that 𝑓(𝑥) = 𝑦.
This means that, in giving each 𝑦 ∈ 𝐵 a corresponding 𝑥 ∈ 𝐴, we are actually defining a
function from 𝐵 to 𝐴, with domain 𝐵 and codomain 𝐴. So the roles played by 𝐴 and
𝐵 are reversed, in keeping with the reversed “direction” of this new function. We call
this new function the inverse function of 𝑓 and denote it by 𝑓 −1 . We can write its
definition as follows.

𝑓 −1 ∶ 𝐵 ⟶ 𝐴,
𝑓 −1 (𝑦) = the unique 𝑥 such that 𝑓(𝑥) = 𝑦.

If we want to write the rule of 𝑓 −1 as a set of ordered pairs, then we just take all the
ordered pairs in 𝑓 and reverse them:

{ (𝑦, 𝑥) ∶ (𝑥, 𝑦) belongs to 𝑓 }



or, to put it slightly differently,

{ (𝑓(𝑥), 𝑥) ∶ 𝑥 ∈ 𝐴 }.

For a function to have an inverse function, it must be an injection (so that no value has
two corresponding arguments) and it must also be a surjection (so that every value in
the codomain is also in the image, i.e., has a corresponding argument). So, in fact, a
function has an inverse function if and only if it is a bijection.
Our Employer function is not a bijection and therefore does not have an inverse
function. The squaring function also does not have an inverse function, for the same
reason. But we would want an encryption function to have an inverse function, so that
an intended user of an encrypted file has no doubt about its contents. (In that context,
there is the separate issue of how easy or hard it should be to actually compute the
inverse. We would like that to be easy for intended users and hard for others. Achieving
these competing aims is the fundamental challenge of cryptography.)
The role of the inverse function is to “undo” the function and get back what you
started with. So, if you have 𝑥 ∈ 𝐴 and apply 𝑓 to get 𝑦 = 𝑓(𝑥), then it does not matter
much if you “lose” 𝑥, because you can recover it from 𝑦:

𝑥 = 𝑓 −1 (𝑦) = 𝑓 −1 (𝑓(𝑥)).
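For a finite function stored as its graph, reversing every pair gives the inverse. The sketch below (names ours) also detects when no inverse function exists; since the dict's values play the role of the codomain here, injectivity is the only condition to check:

```python
def inverse(f_graph):
    """The inverse of a function given as a dict of pairs (x, f(x))."""
    inv = {y: x for x, y in f_graph.items()}
    if len(inv) != len(f_graph):                 # a value was repeated
        raise ValueError("f is not injective, so it has no inverse function")
    return inv

f = {1: "a", 2: "b", 3: "c"}
assert inverse(f) == {"a": 1, "b": 2, "c": 3}
assert inverse(inverse(f)) == f                  # the inverse of the inverse is f
```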

Since the original function 𝑓 is a bijection, its inverse 𝑓 −1 is also a bijection. It
follows that 𝑓 −1 also has an inverse, which we could write as (𝑓 −1 )−1 . But the inverse
of the inverse is just the original function; if 𝑓 −1 “undoes” 𝑓, then 𝑓 also “undoes” 𝑓 −1 .
We have
𝑦 = 𝑓(𝑥) = 𝑓(𝑓 −1 (𝑦)).

2.9 COMPOSiTiON

It is common to use values obtained from one function as arguments to another function.
For example, consider the functions Father and Mother which each have, as their domain
and codomain, the set ℙ of all people who have ever lived.

Father ∶ ℙ ⟶ ℙ,
Father(𝑝) = the father of person 𝑝.
Mother ∶ ℙ ⟶ ℙ,
Mother(𝑝) = the mother of person 𝑝.

Starting with Alan Turing, the function Mother gives Alan’s mother, Sara Turing. Ap-
plying the function Father to her gives Sara’s father — Alan’s maternal grandfather —
Edward Stoney.

Mother(Alan Turing) = Sara Turing.
Father(Sara Turing) = Edward Stoney.

This “chaining” of the two functions is called composition, and is denoted by stating the
second function, followed by the composition symbol ∘, followed by the first function.
(Note the order there.) In this case, the function is Father ∘ Mother and it is defined as
follows.

Father ∘ Mother ∶ ℙ ⟶ ℙ,
Father ∘ Mother(𝑝) = Father(Mother(𝑝)).

Note that the order in which the functions are written in Father ∘ Mother is the same as
the order in which they are written when we write one function as an argument of the
other, i.e., in Father(Mother(𝑝)). But our usual order of reading and writing (left to
right) is the reverse of the order of application (right to left): we apply the function
Mother first, and then we apply the function Father.
In this example, the codomain of the first function applied (Mother) equals the
domain of the second function applied (Father); both are ℙ. This ensures that our
first function application, using Mother, always produces something (or someone) that
our second function, Father, can deal with. This is a general requirement for function
composition.
In general, the composition 𝑔 ∘𝑓 of two functions 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 is defined
by

𝑔 ∘ 𝑓 ∶ 𝐴 ⟶ 𝐶,
𝑔 ∘ 𝑓(𝑥) = 𝑔(𝑓(𝑥)).

Note that, in this composition, 𝑓 is applied first, and then the result is given to 𝑔. The
order of doing things does matter; in general, function composition is not commutative,
meaning that 𝑔 ∘ 𝑓 and 𝑓 ∘ 𝑔 are not the same. It is, however, associative: if 𝑓 ∶ 𝐴 → 𝐵,
𝑔 ∶ 𝐵 → 𝐶 and ℎ ∶ 𝐶 → 𝐷, then

ℎ ∘ (𝑔 ∘ 𝑓) = (ℎ ∘ 𝑔) ∘ 𝑓,

so we can write ℎ ∘ 𝑔 ∘ 𝑓 unambiguously.
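This definition translates directly into Python. The following sketch (the helper name compose is ours, not from the notes) checks non-commutativity and associativity on the functions 𝑓(𝑥) = 𝑥 − 1 and 𝑔(𝑥) = 𝑥², which reappear later in this section.

```python
# Compose two functions: (g o f)(x) = g(f(x)); note f is applied first.
def compose(g, f):
    return lambda x: g(f(x))

f = lambda x: x - 1   # f(x) = x - 1
g = lambda x: x ** 2  # g(x) = x^2

g_of_f = compose(g, f)  # x -> (x - 1)^2
f_of_g = compose(f, g)  # x -> x^2 - 1

print(g_of_f(4))  # 9
print(f_of_g(4))  # 15 -- composition is not commutative

# Associativity: h o (g o f) agrees with (h o g) o f on sample inputs.
h = lambda x: 2 * x
assert all(compose(h, compose(g, f))(x) == compose(compose(h, g), f)(x)
           for x in range(-10, 10))
```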


Function composition is undefined if the codomain of the first function applied does
not equal the domain of the second function appied. For example, suppose 𝕊 is the set

of all students, 𝕌 is the set of all Monash units, and the function FavouriteUnit is defined
as follows.

FavouriteUnit ∶ 𝕊 ⟶ 𝕌,
FavouriteUnit(𝑝) = the favourite Monash unit of person 𝑝.

If you want to know your mother’s favourite unit at Monash, you might be tempted to
use the function FavouriteUnit ∘ Mother. But not all mothers are students, and not all
students are mothers. Formally, the codomain of Mother is ℙ, which does not equal the
domain of FavouriteUnit, namely 𝕊. So this function composition is undefined.
The requirement that the codomain of the first function 𝑓 equals the domain of the
second function 𝑔 amounts to insisting that our guarantee about what 𝑓 can produce
(expressed in the form of its codomain) is the same as our guarantee about what 𝑔 can
handle (expressed in the form of its domain). So the two functions are compatible, in a
precise sense.
You have seen composition before for mathematical functions. For example, the
expression (𝑥 − 1)2 may be regarded as the rule for the composition 𝑔 ∘ 𝑓 of the two
functions

𝑓 ∶ ℝ ⟶ ℝ,
𝑓(𝑥) = 𝑥 − 1.
𝑔 ∶ ℝ ⟶ ℝ₀⁺,
𝑔(𝑥) = 𝑥².

We saw a very special case of composition in § 2.8. If 𝑓 ∶ 𝐴 → 𝐴 is a bijection, then

𝑓 ∘ 𝑓 −1 = 𝑖𝐴 ,
𝑓 −1 ∘ 𝑓 = 𝑖𝐴 .

If 𝑔 ∶ 𝐶 → 𝐷 is any function, then composing it with the identity function makes no


difference, provided we pick the correct identity:

𝑔 ∘ 𝑖𝐶 = 𝑔,
𝑖𝐷 ∘ 𝑔 = 𝑔.

We can compose a function with itself provided its domain and codomain are the
same. If 𝑓 ∶ 𝐴 → 𝐴 then the definition of composition tells us that 𝑓 ∘𝑓 ∶ 𝐴 → 𝐴 is defined
for all 𝑥 ∈ 𝐴 by 𝑓 ∘ 𝑓(𝑥) = 𝑓(𝑓(𝑥)). We can then do iterated composition of 𝑓 with itself,
if we wish. We write 𝑓 (𝑛) for the composition of 𝑛 copies of 𝑓:

𝑓 (𝑛) = 𝑓 ∘ 𝑓 ∘ ⋯ ∘ 𝑓 (𝑛 copies of 𝑓).
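Iterated composition 𝑓 (𝑛) can be sketched in the same way (the helper name iterate is ours):

```python
# f^(n): the composition of n copies of f, computed by applying f repeatedly.
def iterate(f, n):
    def f_n(x):
        for _ in range(n):
            x = f(x)
        return x
    return f_n

double = lambda x: 2 * x
print(iterate(double, 5)(1))  # 32, since doubling five times multiplies by 2^5
```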

We introduce three iterated compositions: one from a practical application, one from a party trick, and one from a profound unsolved theoretical question.

• Consider the function LCG ∶ [0, 2³¹ − 1]ℤ → [0, 2³¹ − 1]ℤ defined for all 31-bit nonnegative binary integers 𝑥 by

LCG(𝑥) = the last 31 bits of 1103515245𝑥 + 12345. (2.1)

We always keep only the last 31 bits, to ensure that the numbers generated stay
within our fixed interval. This function has been used to generate sequences of
numbers that are pseudorandom in the sense that, superficially, they look random
if you don’t look too closely. Starting with some initial “seed” number 𝑠, the
function LCG is applied repeatedly, and the successive numbers LCG(𝑛) (𝑠) should
behave in a way that looks statistically random in some sense. (We have used this
example as it is one of the simpler pseudorandom number generators that have
been used in practice, but its randomness properties are imperfect and it should
not be used by itself in this naive way. It is usually used in conjunction with other
methods in order to increase the randomness.) The name LCG comes from the
term Linear Congruential Generator, which is a type of pseudorandom number
generator of which this is one example.
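The function (2.1) is easy to implement: keeping the last 31 bits of a number is the same as reducing it modulo 2³¹. As the notes warn, this sketch is for illustration only and is not a good random number generator on its own.

```python
M = 2 ** 31  # keeping the last 31 bits = reducing modulo 2^31

def lcg(x):
    # LCG(x) = last 31 bits of 1103515245*x + 12345, as in (2.1)
    return (1103515245 * x + 12345) % M

# Generate a few pseudorandom numbers from a seed s by iterating LCG.
s = 1
seq = []
for _ in range(5):
    s = lcg(s)
    seq.append(s)
print(seq)
```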

• Consider the function 𝐾 ∶ [0, 9999]ℤ → [0, 9999]ℤ defined for any nonnegative integer
𝑥 with at most four (decimal) digits as follows. First, form a four-digit number
by writing the four digits of 𝑥 (using leading zeros if necessary) from smallest
to largest. Then reverse that number, so that the digits now go from largest to
smallest. Then 𝐾(𝑥) is defined to be the difference between these two numbers.
For example,
𝐾(1729) = 9721 − 1279 = 8442.
This function is not an injection (why?). It is clear that, if all the digits in 𝑥 are the
same, then 𝐾(𝑥) = 0, and therefore 𝐾 (𝑛) (𝑥) = 0 for all 𝑛 ≥ 1. More surprisingly, in
all other cases (i.e., when 𝑥 has at least two different digits), iterated composition
of this function with itself eventually reaches 6174, and does so after at most seven
iterations. For example, starting with 1729 as above, we have
1729 ⟼ 8442 ⟼ 5994 ⟼ 5355 ⟼ 1998 ⟼ 8082 ⟼ 8532 ⟼ 6174 ⟼ 6174,
where each arrow denotes one application of 𝐾.

So 𝐾 (7) (𝑥) is 0 if all digits are the same and 6174 otherwise. This was discovered
by the Indian mathematician D. R. Kaprekar in 1946 and published in 1955.4

4 D. R. Kaprekar, An interesting property of the number 6174, Scripta Mathematica 21 (1955) 304. Martin
Gardner, Mathematical Games, Scientific American (March 1975).
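Kaprekar's function, and the claim that every starting value with at least two distinct digits reaches 6174 within seven iterations, can be checked exhaustively (a sketch; the helper name kaprekar is ours):

```python
def kaprekar(x):
    # Write the four digits of x (with leading zeros) from smallest to
    # largest, reverse to get largest-to-smallest, and take the difference.
    asc = "".join(sorted(f"{x:04d}"))
    desc = asc[::-1]
    return int(desc) - int(asc)

print(kaprekar(1729))  # 8442, as in the worked example

# Every x with at least two different digits reaches 6174 within 7 iterations
# (and 6174 is a fixed point, so 7 applications always land on it).
for x in range(10000):
    if len(set(f"{x:04d}")) > 1:
        y = x
        for _ in range(7):
            y = kaprekar(y)
        assert y == 6174
```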

• Consider the function Collatz ∶ ℕ → ℕ defined for all 𝑥 ∈ ℕ by

Collatz(𝑥) = 3𝑥 + 1, if 𝑥 is odd;
Collatz(𝑥) = 𝑥/2, if 𝑥 is even.

For example, if we start with 7 and keep iterating, we obtain

7 ↦ 22 ↦ 11 ↦ 34 ↦ 17 ↦ 52 ↦ 26 ↦ 13 ↦ 40 ↦ 20 ↦ 10 ↦ 5 ↦ 16 ↦ 8 ↦ 4 ↦ 2 ↦ 1 ↦ 4 ↦ 2 ↦ 1 ↦ ⋯

Note how it eventually gets stuck in a loop, going from 4 to 2 to 1, then back to
4, and so on.
Iterated composition of this function is mysterious. It is conjectured that, for
every 𝑥, there exists 𝑛 such that Collatz(𝑛) (𝑥) = 1. This has become known as
Collatz’s Conjecture or the 3𝑥 + 1 problem. Currently it is unsolved. It is a
remarkable illustration that even simple questions about very simple algorithms
can be very deep and hard to answer.
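The Collatz function can be sketched and iterated as follows. The helper steps_to_one is ours, and it terminates only if the iteration reaches 1, which is exactly what the conjecture asserts.

```python
def collatz(x):
    # 3x + 1 if x is odd; x/2 if x is even.
    return 3 * x + 1 if x % 2 == 1 else x // 2

def steps_to_one(x):
    # Count applications of Collatz until we reach 1 (assuming we do).
    n = 0
    while x != 1:
        x = collatz(x)
        n += 1
    return n

print(steps_to_one(7))  # 16, matching the chain starting at 7 shown above
```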

We now consider how the injective, surjective and bijective properties are affected
by composition.
We start with injection.

Theorem 6. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are injections then 𝑔 ∘ 𝑓 is also an injection.

Proof. We prove this by proving the equivalent statement that

if 𝑔 ∘ 𝑓 is not an injection, then 𝑓 and 𝑔 are not both injections.

This is an example of the general logical principle that “𝐴 implies 𝐵” is equivalent to


“not-𝐵 implies not-𝐴”. For example, the statement that “every rectangle has four sides”
is equivalent to the statement that “everything that does not have four sides is not a
rectangle”. See (1.10).
Suppose that 𝑔 ∘𝑓 is not an injection. Then there must exist 𝑎, 𝑏 ∈ 𝐴 such that 𝑎 ≠ 𝑏
and 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏).

[Diagram: the distinct elements 𝑎 and 𝑏 are both mapped by 𝑔 ∘ 𝑓 to the same element 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏).]

Consider 𝑓(𝑎) and 𝑓(𝑏). Either they are equal or unequal.

• If 𝑓(𝑎) = 𝑓(𝑏), then we have 𝑎 ≠ 𝑏 and 𝑓(𝑎) = 𝑓(𝑏), so 𝑓 is not an injection.



[Diagram: 𝑓 maps the distinct elements 𝑎 and 𝑏 to the same element 𝑓(𝑎) = 𝑓(𝑏), which 𝑔 then maps to 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏).]

• If 𝑓(𝑎) ≠ 𝑓(𝑏), then we have 𝑓(𝑎) ≠ 𝑓(𝑏) and 𝑔(𝑓(𝑎)) = 𝑔(𝑓(𝑏)), so 𝑔 is not an
injection (since we have two distinct members of its domain, namely 𝑓(𝑎) and
𝑓(𝑏), that are mapped by 𝑔 to the same value).

[Diagram: 𝑓 maps 𝑎 and 𝑏 to the distinct elements 𝑓(𝑎) and 𝑓(𝑏), which 𝑔 then maps to the same element 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏).]

So we see that, whatever happens with 𝑓(𝑎) and 𝑓(𝑏), at least one of 𝑓 and 𝑔 is not an
injection.

The converse of this theorem does not hold: 𝑔 ∘ 𝑓 being an injection does not imply
that both 𝑓 and 𝑔 are injections. For example, define 𝑓 ∶ {1, 2} → {1, 2, 3} by

𝑓(1) = 1,
𝑓(2) = 2,

and define 𝑔 ∶ {1, 2, 3} → {1, 2} by

𝑔(1) = 1,
𝑔(2) = 2,
𝑔(3) = 2.

Then their composition 𝑔 ∘ 𝑓 ∶ {1, 2} → {1, 2} is defined by

𝑔 ∘ 𝑓(1) = 𝑔(𝑓(1)) = 𝑔(1) = 1,


𝑔 ∘ 𝑓(2) = 𝑔(𝑓(2)) = 𝑔(2) = 2.

So 𝑔 ∘ 𝑓 is an injection, but it is not the case that both 𝑓 and 𝑔 are injections. In fact,
𝑓 is an injection, but 𝑔 is not.
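This counterexample can be checked mechanically by storing each function as a dictionary from domain elements to their images (a sketch):

```python
f = {1: 1, 2: 2}        # f : {1,2} -> {1,2,3}
g = {1: 1, 2: 2, 3: 2}  # g : {1,2,3} -> {1,2}

g_of_f = {x: g[f[x]] for x in f}  # g o f : {1,2} -> {1,2}

def is_injection(func):
    # Injective iff no two domain elements share the same image.
    return len(set(func.values())) == len(func)

print(is_injection(g_of_f))  # True
print(is_injection(f))       # True
print(is_injection(g))       # False
```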
Now let’s consider surjections.

Theorem 7. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are surjections then 𝑔 ∘ 𝑓 is also a surjection.

Proof. Suppose 𝑓 and 𝑔 are surjections. Consider 𝑔 ∘ 𝑓 ∶ 𝐴 → 𝐶.


Let 𝑐 ∈ 𝐶 be any member of its codomain 𝐶.
Since 𝑔 is a surjection, there must exist 𝑏 ∈ 𝐵 such that 𝑔(𝑏) = 𝑐.
Since 𝑓 is a surjection, there must exist 𝑎 ∈ 𝐴 such that 𝑓(𝑎) = 𝑏.
But then we have
𝑔 ∘ 𝑓(𝑎) = 𝑔(𝑓(𝑎)) = 𝑔(𝑏) = 𝑐.
So, every member of the codomain of 𝑔 ∘ 𝑓 also belongs to its image. Therefore 𝑔 ∘ 𝑓
is a surjection.

For Theorem 7, too, the converse does not hold. In fact, the same 𝑓 and 𝑔 we gave
above, after the proof of Theorem 6 and before stating Theorem 7, shows this here too.
In that example, 𝑔 ∘𝑓 is a surjection, but 𝑓 is not a surjection (although 𝑔 is a surjection).
Theorem 6 and Theorem 7 together give a similar statement for bijections.

Theorem 8. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are bijections then 𝑔 ∘ 𝑓 is also a bijection.

Proof. Suppose 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are bijections. Then, by definition, they are


both injections and they are both surjections.
Since they are both injections, their composition is also an injection by Theorem 6.
Since 𝑓 and 𝑔 are both surjections, their composition is also a surjection by Theorem 7.
So 𝑔 ∘ 𝑓 is both an injection and a surjection. Therefore, by definition, it is a
bijection.

Once again, the converse does not hold in general, and once again our little functions
𝑓 and 𝑔 show this, since 𝑔 ∘ 𝑓 is a bijection but neither 𝑓 nor 𝑔 is a bijection.
There is one important situation where the converse does hold as well.

Theorem 9. Let 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 be functions where 𝐴, 𝐵, 𝐶 are finite sets of the same size. Then 𝑔 ∘ 𝑓 is a bijection if and only if both 𝑓 and 𝑔 are bijections.

Proof. The domains and codomains of 𝑓, 𝑔 and 𝑔 ∘ 𝑓 are finite and of the same size, as
stated. So each of them is a bijection if and only if it is an injection, by our remarks at
the end of § 2.7. So it is enough to prove that

𝑔 ∘ 𝑓 is an injection if and only if both 𝑓 and 𝑔 are injections.

We have already seen that, if 𝑓 and 𝑔 are injections, then so is 𝑔 ∘ 𝑓 (Theorem 6). So it
remains to prove that

if 𝑔 ∘ 𝑓 is an injection then both 𝑓 and 𝑔 are injections.

This we do now, by proving the equivalent statement that

if at least one of 𝑓 and 𝑔 is not an injection then 𝑔 ∘𝑓 is not an injection.



Our starting assumption here, that at least one of 𝑓 and 𝑔 is not an injection, divides
naturally into two cases: (i) 𝑓 is not an injection, and (ii) 𝑔 is not an injection. These
two cases overlap, which is ok.

Case (i):
If 𝑓 is not an injection, then by definition there exist distinct 𝑎, 𝑏 ∈ 𝐴 such that
𝑓(𝑎) = 𝑓(𝑏). Then 𝑔(𝑓(𝑎)) = 𝑔(𝑓(𝑏)). So in fact our distinct 𝑎, 𝑏 also give 𝑔∘𝑓(𝑎) = 𝑔∘𝑓(𝑏),
so 𝑔 ∘ 𝑓 is not an injection.

[Diagram: 𝑓 maps the distinct elements 𝑎 and 𝑏 to the same element 𝑓(𝑎) = 𝑓(𝑏), which 𝑔 then maps to 𝑔 ∘ 𝑓(𝑎) = 𝑔 ∘ 𝑓(𝑏).]

Case (ii):
It remains to consider the possibility that 𝑔 is not an injection. Within this case, we
can restrict to cases where 𝑓 is an injection, since we have just dealt with the possibility
that 𝑓 is not an injection. (Effectively, we are ignoring the overlap between the two
cases, since that overlap is covered by Case (i).)
Suppose then that 𝑓 is an injection. Since its domain and codomain are finite and
have the same size, this means it is also a bijection, and therefore has an inverse.
If 𝑔 is not an injection, then by definition there exist distinct 𝑐, 𝑑 ∈ 𝐵 such that
𝑔(𝑐) = 𝑔(𝑑). Now because 𝑓 is a bijection, its inverse 𝑓 −1 is defined, has the same
domain 𝐵, and is also a bijection. So 𝑓 −1 (𝑐) and 𝑓 −1 (𝑑) are both defined and must
be distinct since 𝑐 ≠ 𝑑. Furthermore, 𝑐 = 𝑓(𝑓 −1 (𝑐)) and 𝑑 = 𝑓(𝑓 −1 (𝑑)). So 𝑔(𝑐) = 𝑔(𝑑)
implies 𝑔(𝑓(𝑓 −1 (𝑐))) = 𝑔(𝑓(𝑓 −1 (𝑑))), which may be rewritten

𝑔 ∘ 𝑓(𝑓 −1 (𝑐)) = 𝑔 ∘ 𝑓(𝑓 −1 (𝑑)).

But 𝑓 −1 (𝑐) ≠ 𝑓 −1 (𝑑), so we have two distinct members of 𝐴 which are mapped to the
same thing by 𝑔 ∘ 𝑓. So 𝑔 ∘ 𝑓 is not an injection.

[Diagram: the distinct elements 𝑓 −1 (𝑐) and 𝑓 −1 (𝑑) are mapped by 𝑓 to 𝑐 and 𝑑 respectively, which 𝑔 then maps to the same element 𝑔(𝑐) = 𝑔(𝑑).]

Summarising, we have found that if either of 𝑓, 𝑔 is not an injection then 𝑔 ∘ 𝑓 is not


an injection either. Therefore 𝑔 ∘ 𝑓 is not a bijection.

The restriction to finite sets is essential. Consider the following example.



Define 𝑓 ∶ ℕ → ℕ by 𝑓(𝑛) = 2𝑛, and define 𝑔 ∶ ℕ → ℕ by 𝑔(𝑛) = ⌊𝑛/2⌋. Neither of


these is a bijection: 𝑓 is an injection but not a surjection, and 𝑔 is a surjection but not
an injection. Yet their composition 𝑔 ∘𝑓 is a bijection, and in fact is the identity function
on ℕ:
𝑔 ∘ 𝑓(𝑛) = 𝑔(𝑓(𝑛)) = ⌊(2𝑛)/2⌋ = ⌊𝑛⌋ = 𝑛.
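A finite check cannot cover all of ℕ, but we can spot-check this example on an initial segment (a sketch):

```python
f = lambda n: 2 * n   # an injection but not a surjection on the naturals
g = lambda n: n // 2  # a surjection but not an injection

# g o f is the identity on a sample of natural numbers ...
assert all(g(f(n)) == n for n in range(1000))

# ... but f o g is not the identity; it fails on every odd number:
print(f(g(3)))  # 2, not 3
```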

Now that we can compose two functions, it is natural to ask about the inverse of the
composition. This turns out to be the reverse composition of their inverses. This aligns
with everyday experience of doing and undoing sequences of tasks: if we wrap up a gift
in multiple layers of wrapping paper, then the recipient unwraps the layers in reverse
order.

Theorem 10. If 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 are bijections, then (𝑔 ∘ 𝑓)−1 = 𝑓 −1 ∘ 𝑔 −1 .

Proof. Let 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐵 → 𝐶 be bijections. By Theorem 8, their composition


𝑔 ∘ 𝑓 ∶ 𝐴 → 𝐶 is also a bijection, so its inverse (𝑔 ∘ 𝑓)−1 ∶ 𝐶 → 𝐴 exists. Now consider
𝑓 −1 ∘ 𝑔 −1 . This exists, since the inverses 𝑓 −1 and 𝑔 −1 exist, since 𝑓 and 𝑔 are bijections.
We check the two conditions that 𝑓 −1 ∘ 𝑔 −1 must satisfy in order to be the inverse of
𝑔 ∘ 𝑓.

(𝑔 ∘ 𝑓) ∘ (𝑓 −1 ∘ 𝑔 −1 ) = 𝑔 ∘ (𝑓 ∘ 𝑓 −1 ) ∘ 𝑔 −1 (since function composition is associative)
= 𝑔 ∘ 𝑖𝐵 ∘ 𝑔 −1
= 𝑔 ∘ 𝑔 −1
= 𝑖𝐶 .

Similarly,

(𝑓 −1 ∘ 𝑔 −1 ) ∘ (𝑔 ∘ 𝑓) = 𝑓 −1 ∘ (𝑔 −1 ∘ 𝑔) ∘ 𝑓 (again using associativity)
= 𝑓 −1 ∘ 𝑖𝐵 ∘ 𝑓
= 𝑓 −1 ∘ 𝑓
= 𝑖𝐴 .

So 𝑓 −1 ∘ 𝑔 −1 is indeed the inverse of 𝑔 ∘ 𝑓.

2.10 CRYPTOSYSTEMS

One important application of function composition is to cryptosystems.


Suppose we have

• a message space, which is a finite set 𝑀 of possible messages (in the form of strings over some alphabet),

• a cypher space, which is a finite set 𝐶 of strings, which we call cyphertexts, and

• a keyspace, which is a finite set 𝐾 whose members we call keys.

An encryption function is a function 𝑒 ∶ 𝑀 × 𝐾 → 𝐶, and a decryption function is


a function 𝑑 ∶ 𝐶 × 𝐾 → 𝑀 . These are functions of two arguments. Each also gives us a
family of single-argument functions. For each key 𝑘 ∈ 𝐾, define the function 𝑒𝑘 ∶ 𝑀 → 𝐶
by 𝑒𝑘 (𝑚) = 𝑒(𝑚, 𝑘), and define the function 𝑑𝑘 ∶ 𝐶 → 𝑀 by 𝑑𝑘 (𝑐) = 𝑑(𝑐, 𝑘). These are the
functions of one argument that you get from 𝑒 and 𝑑 by fixing their second argument.
So, for a given key 𝑘 ∈ 𝐾, the function 𝑒𝑘 encrypts messages just with that one key,
while the function 𝑑𝑘 decrypts just with that one key. We want decryption to undo
encryption, provided the same key is used by each.
We say that (𝑀 , 𝐶, 𝐾, 𝑒, 𝑑) is a cryptosystem if, for all 𝑘,

(i) 𝑒𝑘 and 𝑑𝑘 are bijections, and

(ii) 𝑑𝑘 = 𝑒𝑘−1 .

In practice we will also want some conditions on how easy or hard it is to compute these
functions or even to obtain partial information from them.
For convenience, we restrict ourselves to cryptosystems where the cypherspace and
message space are the same, i.e., 𝑀 = 𝐶. (Most real cryptosystems either have this
property or can easily be modified so that they do.)
Suppose we have two cryptosystems with the same message spaces but with keyspaces
and encryption/decryption maps that may be different. Call them 𝒞 = (𝑀 , 𝑀 , 𝐾, 𝑒, 𝑑)
and 𝒞 ′ = (𝑀 , 𝑀 , 𝐾 ′ , 𝑒′ , 𝑑 ′ ). We would like to compose them to make a more complex
cryptosystem. For encryption, we want to first encrypt with 𝑒 and then encrypt further
with 𝑒′ . This is shown in Figure 2.2. For decryption, we want to do the reverse: decrypt
using 𝑑 ′ , then decrypt further with 𝑑.
But our definition of function composition (§ 2.9) only applies to functions of one
argument. So we need to extend this definition for our encryption and decryption
functions.
The keyed composition of encryption functions 𝑒 ∶ 𝑀 ×𝐾 → 𝑀 and 𝑒′ ∶ 𝑀 ×𝐾 ′ → 𝑀
is the function 𝑒′ • 𝑒 ∶ 𝑀 × (𝐾 × 𝐾 ′ ) → 𝑀 defined for all 𝑚 ∈ 𝑀 and (𝑘, 𝑘 ′ ) ∈ 𝐾 × 𝐾 ′ by

(𝑒′ • 𝑒)(𝑚, (𝑘, 𝑘 ′ )) = 𝑒′ (𝑒(𝑚, 𝑘), 𝑘 ′ ). (2.2)

Similarly, the keyed composition of decryption functions 𝑑 ′ ∶ 𝑀 × 𝐾 ′ → 𝑀 and 𝑑 ∶


𝑀 × 𝐾 → 𝑀 is the function 𝑑 • 𝑑 ′ ∶ 𝑀 × (𝐾 × 𝐾 ′ ) → 𝑀 defined for all 𝑚 ∈ 𝑀 and
(𝑘, 𝑘 ′ ) ∈ 𝐾 × 𝐾 ′ by
(𝑑 • 𝑑 ′ )(𝑚, (𝑘, 𝑘 ′ )) = 𝑑(𝑑 ′ (𝑚, 𝑘 ′ ), 𝑘). (2.3)
Then the composition 𝒞 ′ ∘ 𝒞 of the cryptosystems 𝒞 and 𝒞 ′ is (𝑀 , 𝑀 , 𝐾 × 𝐾 ′ , 𝑒′ •
𝑒, 𝑑 • 𝑑 ′ ).

[Figure 2.2: Composition of encryption functions. The message is encrypted by 𝑒 under key 𝑘; the resulting cyphertext becomes the message for 𝑒′, which encrypts it under key 𝑘′.]

Observe that, in the composition 𝒞 ′ ∘ 𝒞, the encryption function 𝑒′ • 𝑒 gives rise to a


family of single-argument functions, as follows. For each key pair (𝑘, 𝑘 ′ ), the function
𝑒(𝑘,𝑘′ ) ∶ 𝑀 → 𝑀 is defined for each 𝑚 ∈ 𝑀 by

𝑒(𝑘,𝑘′ ) (𝑚) = (𝑒′ • 𝑒)(𝑚, (𝑘, 𝑘 ′ )).

But, by (2.2), this is just 𝑒′ (𝑒(𝑚, 𝑘), 𝑘 ′ ). And we can express this in terms of composition
of our single-argument encryption functions:

𝑒′ (𝑒(𝑚, 𝑘), 𝑘 ′ ) = 𝑒𝑘′ ′ (𝑒𝑘 (𝑚))

Therefore each single-argument encryption function 𝑒(𝑘,𝑘′ ) in the composition 𝒞 ′ ∘𝒞 of the


two cryptosystems is itself just the composition of the two single-argument encryption
functions:
𝑒(𝑘,𝑘′ ) = 𝑒′ ∘ 𝑒.
Similarly, the decryption function 𝑑 • 𝑑 ′ yields the family of single-argument functions
𝑑(𝑘,𝑘′ ) ∶ 𝑀 → 𝑀 , defined for each 𝑚 ∈ 𝑀 by

𝑑(𝑘,𝑘′ ) (𝑚) = (𝑑 • 𝑑 ′ )(𝑚, (𝑘, 𝑘 ′ )) = 𝑑(𝑑 ′ (𝑚, 𝑘 ′ ), 𝑘) = 𝑑𝑘 (𝑑𝑘′ ′ (𝑚)),



and
𝑑(𝑘,𝑘′ ) = 𝑑 ∘ 𝑑 ′ .

Theorem 11. The composition of two cryptosystems is also a cryptosystem.

Proof. We need to prove that, for each key pair (𝑘, 𝑘 ′ ), the encryption function 𝑒(𝑘,𝑘′ )
and the decryption function 𝑑(𝑘,𝑘′ ) are both bijections, and that the latter is the inverse
of the former.
The fact that they are bijections follows from the fact that 𝑒𝑘 , 𝑒𝑘′ ′ , 𝑑𝑘 , 𝑑𝑘′ ′ are all bijec-
tions for all 𝑘 and 𝑘′ (because 𝒞 and 𝒞 ′ are both cryptosystems) and Theorem 8.
The fact that 𝑑(𝑘,𝑘′ ) = (𝑒(𝑘,𝑘′ ) )−1 follows from Theorem 10.

In a good cryptosystem, 𝑒𝑘 and 𝑑𝑘 should be easy to compute if the key 𝑘 is known,


otherwise the intended users will find it hard to use. But, in order for it to be secure,
the encryption function should be hard to invert without the key: if all an eavesdropper
knows is the encrypted text 𝑐 ∈ 𝐶, it should be hard for them to recover either the
original message 𝑚 ∈ 𝑀 or the secret key 𝑘 ∈ 𝐾.
Composition can be used to make decryption even harder for an eavesdropper. If
each of 𝑒 and 𝑒′ is hard to invert, it is reasonable to hope that 𝑒′ ∘𝑒 might be even harder
to invert, since an eavesdropper now has to undo the work of both encryption functions,
not just one of them. There are many cryptosystems where this is indeed the case. But
care is needed, because there are also systems where composition does not give any extra
security.
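As a toy illustration (our own construction, not a system from the notes), take 𝑀 = 𝐶 = {0, 1, …, 25} and let both cryptosystems be simple shift cyphers. The composed system encrypts as in (2.2) and decrypts as in (2.3), applying the component decryptions in reverse order:

```python
M_SIZE = 26  # message space M = C = {0, ..., 25}

def e(m, k):  return (m + k) % M_SIZE       # encryption for system C
def d(c, k):  return (c - k) % M_SIZE       # decryption: d_k = e_k^{-1}
def e2(m, k): return (m + 3 * k) % M_SIZE   # a second shift-style system C'
def d2(c, k): return (c - 3 * k) % M_SIZE

def e_comp(m, key_pair):
    # (e' . e)(m, (k, k')) = e'(e(m, k), k'), as in (2.2)
    k, k2 = key_pair
    return e2(e(m, k), k2)

def d_comp(c, key_pair):
    # (d . d')(c, (k, k')) = d(d'(c, k'), k), as in (2.3): reverse order
    k, k2 = key_pair
    return d(d2(c, k2), k)

# Decryption undoes encryption for every message and every key pair.
assert all(d_comp(e_comp(m, (k, k2)), (k, k2)) == m
           for m in range(26) for k in range(26) for k2 in range(26))
```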

2.11𝜔 LOOSE COMPOSiTiON

We might call the above type of function composition tight composition, because it
requires an “exact fit” between the codomain of the first function to be applied and
the domain of the second. There is also a looser form of function composition, which
we’ll call “loose composition”, which is always defined, regardless of the domains and
codomains of the functions, although in the worst case it might turn out to be the empty
function.
Let us try again to define a function based on applying Mother followed by FavouriteUnit, which we considered in § 2.9. We’d like a function that equals FavouriteUnit(Mother(𝑝))
whenever this makes sense. We saw in § 2.9 that the composition FavouriteUnit ∘ Mother
is undefined. But we will define a new form of composition, called loose composition
and denoted by ∘ (a larger, “looser” version of the symbol ∘), under which FavouriteUnit ∘
Mother is defined.
To properly define this loose composition, we must specify its domain and codomain.
The codomain is easy, since any student’s favourite unit will always belong to 𝕌 (al-
though the image would be more complicated). But what should we use for this func-
tion’s domain? Not all mothers are students, so not every person 𝑝 ∈ ℙ gives a valid
FavouriteUnit(Mother(𝑝)).

The domain of a loose composition is the subset of the domain of the first function
(i.e., the one that is applied first) containing everything for which the successive function
applications are possible. In other words, it’s everything that the first function maps into
the domain of the second function. For FavouriteUnit ∘ Mother, this means every person
whose mother is a student, since for any such person, the function Mother produces a
member of 𝕊.
In general, the loose composition 𝑔 ∘ 𝑓 of two functions 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷
is defined by

𝑔 ∘ 𝑓 ∶ {𝑥 ∈ 𝐴 ∶ 𝑓(𝑥) ∈ 𝐶} ⟶ 𝐷,
𝑔 ∘ 𝑓(𝑥) = 𝑔(𝑓(𝑥)).

Note that, as usual for our composition notation, 𝑓 is applied first, and then the result
is given to 𝑔.
The loose composition of two functions is always defined. It might sometimes be
useless: if 𝐵 ∩ 𝐶 = ∅, then nothing that 𝑓 produces is in the domain of 𝑔, so the
composition 𝑔 ∘ 𝑓 is just the empty function.
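Loose composition can be sketched by computing the restricted domain explicitly; in the sketch below each function is carried around with an explicit finite domain, and all names are ours:

```python
def loose_compose(g, g_dom, f, f_dom):
    # Domain of the loose composition: everything in f's domain that
    # f maps into g's domain.
    dom = {x for x in f_dom if f(x) in g_dom}
    return dom, (lambda x: g(f(x)))

f_dom = {0, 1, 2, 3, 4}
f = lambda x: x - 2   # f : {0,...,4} -> integers
g_dom = {0, 1, 2}     # g is only defined on {0, 1, 2}
g = lambda x: x * x

dom, gf = loose_compose(g, g_dom, f, f_dom)
print(sorted(dom))  # [2, 3, 4]: exactly the x with f(x) in {0, 1, 2}
print(gf(3))        # g(f(3)) = g(1) = 1
```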
It will be seen from the definition of loose composition that it is harder to work out
the domain of loose composition than it is to work out the domain of tight composition.
This makes it a bit harder to use in practice. Tight composition has the advantage that
the question of whether the composition is defined can be answered solely by looking at
the appropriate codomain and domain; you do not need to study the rule at all. This
makes it much easier to work with, and from a computing perspective, much easier to
use as a specification of a task based on combining two tasks.
In this unit, we will use tight composition rather than loose composition.

2.12 COUNTiNG FUNCTiONS

We often want to count functions of various types. You might want to determine the
amount of time an algorithm takes, if the algorithm has to search through all functions
of some type. You might want to determine the amount of space that a collection of
data requires, if the data items correspond to functions. You might want to determine
the probability that a random function has some particular property.
Suppose 𝑓 ∶ 𝐴 → 𝐵, where the sets have sizes |𝐴| = 𝑚 and |𝐵| = 𝑛. How many
functions of this type are there? The domain has 𝑚 elements, and each of them is
mapped to one, and only one, member of the codomain 𝐵. So there are 𝑛 possibilities
for 𝑓(𝑥) for each element 𝑥 ∈ 𝐴. These choices are independent; there is no requirement
for the various values of 𝑥 to differ from each other or to be related in any other way. So
we have 𝑚 independent choices, each being among 𝑛 possibilities. This means we have
𝑛𝑚 functions.
Suppose now we require 𝑓 to be an injection. We still have 𝑚 elements in the
domain, and for each of these, we must still choose exactly one member of the codomain.

But now these choices are no longer independent, since as soon as one member of the
codomain is chosen, that member is no longer available for any other member of the
domain. Suppose we make these choices in order, and to help describe this, we suppose
that the elements of 𝐴 are enumerated as 𝑎1 , 𝑎2 , … , 𝑎𝑚 . Now, 𝑎1 can be mapped to any
of the 𝑛 elements of 𝐵. Then, 𝑎2 can be mapped to any element of 𝐵 except 𝑓(𝑎1 ), so it
has 𝑛 − 1 choices. Then, 𝑎3 can be mapped to any element of 𝐵 except 𝑓(𝑎1 ) and 𝑓(𝑎2 ),
so it has 𝑛 − 2 choices. And so on. Finally, 𝑎𝑚 can be mapped to any element of 𝐵
except 𝑓(𝑎1 ), 𝑓(𝑎2 ), … , 𝑓(𝑎𝑚−1 ), so it has 𝑛 − (𝑚 − 1) choices, which is 𝑛 − 𝑚 + 1 choices.
So the total number of injections is

𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑚 + 1).

This formula also copes nicely with the possibility that 𝑚 > 𝑛. In that case, we know
that the codomain is too small to allow any injections from the domain at all, so the
answer should be 0, and that is indeed what the formula gives, since one of the factors
will be 0.
Requiring 𝑓 to be a surjection needs some more care and will be considered later.
If we require 𝑓 to be a bijection, then that’s the same as requiring 𝑓 to be an injection
whose codomain and domain are the same size, since both sets are finite. (See the end
of § 2.7 on p. 50.) So this is the same as the injective case when 𝑚 = 𝑛. So the number
of bijections is just 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ (𝑛 − 𝑛 + 1), which is 𝑛!.
Recall that, when a bijection maps a finite set to itself, it is called a permutation of
that set (§ 2.7). So the number of permutations of an 𝑛-element set is 𝑛!.
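These formulas can be verified by brute force for small sets, representing each function 𝑓 ∶ 𝐴 → 𝐵 with |𝐴| = 𝑚 as an 𝑚-tuple of images (a sketch using Python's itertools):

```python
from itertools import product, permutations
from math import factorial

m, n = 2, 3  # |A| = 2, |B| = 3
B = range(n)

# All functions A -> B: one independent choice of image per domain element.
all_funcs = list(product(B, repeat=m))
assert len(all_funcs) == n ** m        # 3^2 = 9

# Injections: the chosen images must be pairwise distinct.
injections = [t for t in all_funcs if len(set(t)) == m]
assert len(injections) == n * (n - 1)  # 3 * 2 = 6

# Bijections of an n-element set (permutations): n! of them.
assert len(list(permutations(range(n)))) == factorial(n)  # 3! = 6
```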

2.13 BiNARY RELATiONS

We often want to know when objects of one type are related in some specific way to
objects of another type. For example, in considering sets of people, we might like to
know who is a parent of whom. In mobile communications, systems that manage calls
would use information on which smartphones contacted which cellphone towers. Some
ecologists monitor predator-prey relationships among species in a geographic area. In
timetabling Monash classes in a given semester, we want to know which pairs of units
have at least one student in common, so that we can try to schedule classes in those
units at different times.
These situations, and very many others, can be modelled by binary relations.
A binary relation consists of two sets 𝐴 and 𝐵 and a set of ordered pairs (𝑎, 𝑏)
where 𝑎 ∈ 𝐴 and 𝑏 ∈ 𝐵. The ordered pairs are used to state which members of 𝐴 and 𝐵
are related to each other in the required way.
Recall that the Cartesian product 𝐴 × 𝐵 is the set of all ordered pairs in which the
first and second members of the pair belong to 𝐴 and 𝐵 respectively. This gives us a
very succinct way to restate our definition. A binary relation consists of two sets 𝐴

and 𝐵 and a subset of 𝐴 × 𝐵. We sometimes say that the binary relation is from 𝐴 to
𝐵, and we may still call 𝐴 the domain and 𝐵 the codomain.
The two sets might be the same. A binary relation on a set 𝐴 is a binary relation
from 𝐴 to itself. Each of the two sets is 𝐴, so that the relation is a subset of 𝐴 × 𝐴.
A binary relation is also called a binary predicate or a predicate with two arguments.
If 𝑅 is the name of a binary relation, then we write 𝑥𝑅𝑦 or 𝑅(𝑥, 𝑦) or (𝑥, 𝑦) ∈ 𝑅 to
mean that (𝑥, 𝑦) is one of the ordered pairs in 𝑅. The notation 𝑥𝑅𝑦 is an example of
infix notation, where the name of the operation/function/relation is placed between the
two things it links. The notation 𝑅(𝑥, 𝑦) is a further example of prefix notation. (Recall
the discussion of prefix, infix and postfix notation on p. 47.)
For example, the Parent relation is a relation on the set ℙ of all people (so the two
sets are the same in this case), and the pair of people (𝑝, 𝑞) belongs to this relation if 𝑞
is a parent of 𝑝. Members of the Parent relation include:

( Ada Lovelace , George Gordon Byron ),


( Annie Jump Cannon , Wilson Cannon ),
( John Ferrier Turing , Sara Turing ),
( John Ferrier Turing , Julius Turing ),
( Alan Mathison Turing , Sara Turing ),
( Alan Mathison Turing , Julius Turing ),
⋮ ⋮

So we may write, for example,

(Ada Lovelace , George Gordon Byron) ∈ Parent,

or using infix notation,

Ada Lovelace Parent George Gordon Byron,

or using prefix notation,

Parent(Ada Lovelace, George Gordon Byron).

For the mobile communications example, the two sets are different, namely a set of
smartphones and a set of cellphone towers, and the relation includes pairs like

( Catherine Deneuve’s phone , the Eiffel Tower ).



It may be tempting to use binary relation names in the same way we use function
names. Recall the functions Mother and Father, which enable us to write statements like

Mother(Alan Turing) = Sara Turing,


Father(Alan Turing) = Julius Turing.

But we will not write “Parent(Alan Turing)”; such notation would treat Parent as a
function of people, when in fact “the parent” of a person is not, in general, uniquely
defined.
We saw in § 2.1.3𝛼 that one way to specify the rule of a function 𝑓 ∶ 𝐴 → 𝐵 is to give its
graph, i.e., to give all its ordered pairs (𝑥, 𝑓(𝑥)), which all belong to 𝐴 ×𝐵. So a function
is a type of binary relation. To be precise, a function with domain 𝐴 and codomain 𝐵
is a binary relation on 𝐴 and 𝐵 in which, for every 𝑎 ∈ 𝐴, there is a unique 𝑏 ∈ 𝐵 such
that (𝑎, 𝑏) belongs to the relation. To put it the other way round (and less formally),
a binary relation is a “function” where we drop the requirement that each member of
the domain gives exactly one member of the codomain. With this latter viewpoint, a
binary relation is sometimes called a “many-valued function” because a single member of
the domain can yield many different values in the codomain (whereas normal functions
are just single-valued). But we will not refer to “many-valued functions” because it is a
contradiction in terms: a function, by definition, cannot be many-valued in that sense.
Examples of binary relations on sets of numbers include =, ≤, ≥. Examples of
binary relations on sets of sets include =, ⊆, ⊇.
One common source of binary relations is network structures. Networks consist of
nodes with links between some pairs of nodes. For example:

• In a social network, the nodes are people and the links represent which people
know which other people.

• In a communications network, nodes represent the communicating devices and the


links may represent messages sent from one device to another.

• In an ecological network, nodes represent species and links represent predator-prey


relations among the species.

• In Monash timetabling, nodes represent the units running in a given semester, and
links represent pairs of units that have at least one student in common for that
semester. A small fragment of this network is shown in Figure 2.3. In this example,
MTH1030 and MTH1035 have no students in common (because it is prohibited to
enrol in both of them), so their classes may be at the same time (or at different
times), and we represent this lack of restriction by the absence of a link between
the corresponding nodes. But every other pair of these units has some students
that do both of them, so every other pair of nodes has a link between them.

[Figure 2.3: A fragment of the Monash timetabling network. The nodes are FIT1058, MTH1035, MTH1030 and FIT1045, with a link between every pair of nodes except MTH1030 and MTH1035.]

• In a software system, nodes represent software components of some kind (modules,


programs, …), and the links represent some mechanism for passing information
from one component to another.

In each of these cases, we have a set of nodes, and the set of links between them can be
represented by a binary relation.
Another common source of binary relations is tables of data. Whenever you write a
table with two columns, you are specifying a binary relation whose ordered pairs (𝑥, 𝑦)
correspond to the rows of the table, with 𝑥 and 𝑦 being the entries in the first and second
column respectively. If you have a table with more than two columns, then often taking
some (or perhaps any) pairs of columns will give you a binary relation in the same way.
Similar remarks apply to data stored in other ways that may be thought of as tables,
such as in spreadsheets and databases.
Important operations which you might want to do, when using a binary relation 𝑅
on sets 𝐴 and 𝐵, include:

• Determine the set of all members of 𝐴 that actually appear as the first member of
a pair in 𝑅. This is {𝑥 ∈ 𝐴 ∶ (𝑥, 𝑦) ∈ 𝑅 for some 𝑦 ∈ 𝐵}.

• Determine the set of all members of 𝐵 that actually appear as the second member
of a pair in 𝑅. This is {𝑦 ∈ 𝐵 ∶ (𝑥, 𝑦) ∈ 𝑅 for some 𝑥 ∈ 𝐴}.

• Given 𝑥, determine everything that is related to it by 𝑅. This is {𝑦 ∈ 𝐵 ∶ 𝑥𝑅𝑦}.


This set is often denoted by 𝑅(𝑥), but we must not confuse this with the notation
for the value of a function, since 𝑅(𝑥) is typically not just one single value in 𝐵,
but rather a set of values, and it might even be empty.

• Given 𝑦, determine everything that is related to it by 𝑅. This is {𝑥 ∈ 𝐴 ∶ 𝑥𝑅𝑦}.


This is often denoted by 𝑅 −1 (𝑦). Again, take care to avoid confusion with the
value of an inverse function, since 𝑅 −1 (𝑦) is in general some set of values that are
each related to 𝑦 rather than just one value.

• For each of these sets, we may want to determine its size.
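If a binary relation is stored as a set of ordered pairs, each of these operations is a short set comprehension. A sketch, using a small made-up relation:

```python
# A small made-up binary relation R from A to B, stored as ordered pairs.
A = {1, 2, 3, 4}
B = {"x", "y", "z"}
R = {(1, "x"), (1, "y"), (3, "y")}

def first_members(R):
    """Members of A that actually appear as the first member of a pair in R."""
    return {a for (a, b) in R}

def second_members(R):
    """Members of B that actually appear as the second member of a pair in R."""
    return {b for (a, b) in R}

def related_to(R, x):
    """R(x): everything x is related to (a set, possibly empty)."""
    return {b for (a, b) in R if a == x}

def related_from(R, y):
    """R^(-1)(y): everything related to y (again a set, not a single value)."""
    return {a for (a, b) in R if b == y}
```

Note that `related_to(R, 2)` returns the empty set here: as the text warns, R(x) is a set of values, not a single value.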



Let 𝑅 be any binary relation from 𝐴 to 𝐵. Its inverse 𝑅 −1 is the binary relation
from 𝐵 to 𝐴 defined by
𝑦𝑅 −1 𝑥 ⟺ 𝑥𝑅𝑦.
So the inverse is constructed by just swapping the roles of the domain and codomain
and reversing all the pairs. In the special case when 𝑅 is actually a function, this is just
the definition of the inverse of a function (see § 2.8). We can now talk of the inverse of
any function, not just of bijections, but we have to remember that, if 𝑓 is not a bijection,
then 𝑓 −1 is not a function (although it is a valid binary relation).

2.14 P R O P E RT i E S O F B i N A RY R E L AT i O N S

We now meet some important types of binary relations.


Let 𝑅 be a binary relation on a set 𝐴.
We say 𝑅 is reflexive if, for all 𝑎 ∈ 𝐴, we have 𝑎𝑅𝑎. In other words, everything is
related to itself.
Examples of reflexive binary relations include equality, ≤, ≥, ⊆, ⊇ (these latter two
because these versions of the subset relation allow improper subsets).
Using the set of all people again, let knows be the binary relation that holds when
one person knows another. It contains an ordered pair (𝑝, 𝑞) of people precisely when
person 𝑝 knows person 𝑞. It seems reasonable to describe knows as reflexive, as everyone
knows themselves (more or less).
Binary relations that are not reflexive include <, >, ⊂, ⊃. The relation Parent is
also not reflexive.
We say 𝑅 is symmetric if, for all 𝑥, 𝑦 ∈ 𝐴, 𝑥𝑅𝑦 implies 𝑦𝑅𝑥.
The relation = is symmetric, but the relations <, ≤, >, ≥, ⊆, ⊂, ⊃, ⊇ are not sym-
metric. The relation knows is symmetric, assuming that we are using the word “knows”
for relationships where each knows the other. The relation Parent is not symmetric.
The hyperlink relation on the set of all webpages — consisting of all pairs (𝑝, 𝑞) where
webpage 𝑝 has a hyperlink to webpage 𝑞 — is also not symmetric.
We say 𝑅 is antisymmetric if, for all 𝑥, 𝑦 ∈ 𝐴, 𝑥𝑅𝑦 and 𝑦𝑅𝑥 together imply 𝑥 = 𝑦.
An equivalent way to say this is: for all 𝑥, 𝑦 ∈ 𝐴, if 𝑥𝑅𝑦 and 𝑥 ≠ 𝑦 then we cannot have
𝑦𝑅𝑥. So, if two distinct elements (𝑥 ≠ 𝑦) are related one way (𝑥𝑅𝑦), then they cannot
be related the other way (𝑦𝑅𝑥).
Being antisymmetric is not the same as just being not symmetric. In other words,
“antisymmetric” and “asymmetric” are different terms meaning different things.
The relations =, ≤, ≥, ⊆, ⊇ are antisymmetric. For example, if two numbers 𝑥 and
𝑦 satisfy 𝑥 ≤ 𝑦 and 𝑦 ≤ 𝑥 then 𝑥 = 𝑦. Similarly, if two sets 𝑋 and 𝑌 satisfy 𝑋 ⊆ 𝑌 and
𝑌 ⊆ 𝑋 then 𝑋 = 𝑌. We often use this when proving that two sets are equal (see p. 9
in § 1.6): if we prove that 𝑋 ⊆ 𝑌 and also prove that 𝑌 ⊆ 𝑋 then we can conclude that
𝑋 = 𝑌.

It may seem slightly surprising that equality, =, is antisymmetric, since it is also
symmetric and the two terms sound opposite. But this example illustrates that it is
indeed possible for a binary relation to be both symmetric and antisymmetric. Are
there any other cases where this happens?
The hyperlink relation on webpages is not antisymmetric. This illustrates the fact
that it is possible for a binary relation to be neither symmetric nor antisymmetric.
The relation knows is also not antisymmetric.
We say 𝑅 is transitive if, for all 𝑥, 𝑦, 𝑧 ∈ 𝐴, 𝑥𝑅𝑦 and 𝑦𝑅𝑧 together imply 𝑥𝑅𝑧.
The relations =, <, ≤, >, ≥, ⊂, ⊆, ⊃, ⊇ are all transitive, but ≠ is not. The relation
knows is not transitive: a “friend of a friend” is not necessarily your friend! The relation
Parent is not transitive either.
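For a relation on a finite set, each of the four properties can be checked mechanically against its definition. A sketch in Python, using the divisibility relation on {1, …, 6} as a test case:

```python
def is_reflexive(A, R):
    """Every element is related to itself."""
    return all((a, a) in R for a in A)

def is_symmetric(R):
    """xRy implies yRx."""
    return all((y, x) in R for (x, y) in R)

def is_antisymmetric(R):
    """xRy and yRx together imply x = y."""
    return all(x == y for (x, y) in R if (y, x) in R)

def is_transitive(R):
    """xRy and yRz together imply xRz."""
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

# Divisibility on {1, ..., 6}: reflexive, antisymmetric and transitive,
# but not symmetric.
A = set(range(1, 7))
divides = {(x, y) for x in A for y in A if y % x == 0}
```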

2.15 C O M B i N i N G B i N A RY R E L AT i O N S

Viewing a binary relation as a set of ordered pairs enables us to apply ordinary set
operations on them. Suppose 𝑅 and 𝑆 are both binary relations from 𝐴 to 𝐵. Then
their union 𝑅 ∪ 𝑆 is the binary relation from 𝐴 to 𝐵 containing every pair that belongs
to at least one of 𝑅 and 𝑆.

𝑅 ∪ 𝑆 = {(𝑥, 𝑦) ∶ 𝑥𝑅𝑦 or 𝑥𝑆𝑦}.

For example,
Parent = Mother ∪ Father.
The intersection 𝑅 ∩ 𝑆 is the binary relation from 𝐴 to 𝐵 containing every pair that
belongs to both 𝑅 and 𝑆.

𝑅 ∩ 𝑆 = {(𝑥, 𝑦) ∶ 𝑥𝑅𝑦 and 𝑥𝑆𝑦}.

For example,
Employer ∩ Parent
is the set of pairs (𝑥, 𝑦) such that 𝑦 is both an employer and a parent of 𝑥. This includes,
for example, the pairs

( Prince William , King Charles III ),


( Leonard Lauder , Estée Lauder ).

If 𝑅 is a binary relation from 𝐴 to 𝐵, and 𝑆 is a binary relation from 𝐵 to 𝐶, then
the composition 𝑆 ∘ 𝑅 is the binary relation from 𝐴 to 𝐶 whose set of ordered pairs is

{(𝑥, 𝑧) ∶ there exists 𝑦 ∈ 𝐵 such that 𝑥𝑅𝑦 and 𝑦𝑆𝑧}.

For example, the composition


Employer ∘ Parent

contains all pairs (𝑥, 𝑧) such that 𝑧 employs a parent of 𝑥. If you are 𝑥, then an employer
of one of your parents is 𝑧, and the pair of you are related by Employer ∘ Parent. But the
order matters: this relation is not the same as the one that relates people to their own
employers’ parents.
In the special case when 𝑅 and 𝑆 are both functions, their composition is just their
composition as functions, using the definition of function composition given in § 2.9.
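Composition can be computed directly from the set-of-pairs view: collect every (𝑥, 𝑧) for which some middle element 𝑦 links them. A sketch, with invented stand-ins for the Parent and Employer examples:

```python
def compose(S, R):
    """The composition S ∘ R: every (x, z) such that xRy and ySz for some y."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

# Invented stand-ins for the examples in the text: Parent relates a person to
# a parent of theirs, Employer relates a person to their employer.
Parent = {("william", "charles"), ("leonard", "estee")}
Employer = {("charles", "the_crown"), ("estee", "lauder_co")}

# Employer ∘ Parent relates a person to an employer of one of their parents.
employer_of_parent = compose(Employer, Parent)
```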
Let’s investigate the composition of any relation 𝑅 from 𝐴 to 𝐵 with its inverse
relation 𝑅 −1 . The composition 𝑅 −1 ∘ 𝑅, which goes from 𝐴 to itself, satisfies

(𝑥, 𝑧) ∈ 𝑅 −1 ∘ 𝑅 ⟺ there exists 𝑦 such that 𝑥𝑅𝑦 and 𝑦𝑅 −1 𝑧


⟺ there exists 𝑦 such that 𝑥𝑅𝑦 and 𝑧𝑅𝑦
(using the definition of 𝑅 −1 ).

So, in the composition 𝑅 −1 ∘ 𝑅, two elements of 𝐴 are related by 𝑅 −1 ∘ 𝑅 precisely when
there is an element of 𝐵 that they are both related to by 𝑅.
Similarly, the composition 𝑅 ∘𝑅 −1 , which goes from 𝐵 to itself, relates two elements
of 𝐵 precisely when there is an element of 𝐴 that is related to both those elements of 𝐵.

Binary relations, like functions, can be composed with themselves. Consider again
the binary relation knows on the set of all people. In the composition knows ∘ knows,
two people are related if they have a mutual acquaintance, i.e., they each know someone
who knows the other. We can extend this. In the composition knows∘knows∘knows, one
person is related to another if they know someone who knows someone who knows the
other. The five-fold composition

knows ∘ knows ∘ knows ∘ knows ∘ knows ∘ knows

is said to relate every pair of people on Earth. This is the principle known as “six degrees
of separation”; it uses the knows relation six times, with five compositions.
As for function composition, we can use exponents in parentheses to denote com-
position of relations with themselves: 𝑅 (𝑛) is the composition of 𝑛 copies of 𝑅. So the
composition we wrote above, for six degrees of separation, could be written knows(6) .
Sometimes, for a binary relation 𝑅 on a set 𝐴, we may want to go further and identify
every pair of elements of 𝐴 that are linked by any chain, no matter how long, of pairs
in 𝑅.
The transitive closure of a binary relation 𝑅 on a set 𝐴 is the unique binary
relation 𝑅 + on 𝐴 such that, for all 𝑥, 𝑦, 𝑧 ∈ 𝐴,

(i) if 𝑥𝑅𝑦 then 𝑥𝑅 + 𝑦;

(ii) if 𝑥𝑅 + 𝑦 and 𝑦𝑅𝑧 then 𝑥𝑅 + 𝑧;

(iii) if 𝑥𝑅𝑦 and 𝑦𝑅 + 𝑧 then 𝑥𝑅 + 𝑧.



Actually, you can drop either the second or third (but not both) of these conditions
from the definition. You cannot drop the first condition, which gets the transitive
closure process started.
The transitive closure 𝑅 + certainly contains all pairs in 𝑅, by condition (i) in its
definition. It also contains all pairs in 𝑅 (2) , and all pairs in 𝑅 (3) , and so on. So the set
of pairs in the transitive closure is given by

𝑅 + = 𝑅 ∪ 𝑅 (2) ∪ 𝑅 (3) ∪ 𝑅 (4) ∪ ⋯ ,

which may be written more succinctly as

𝑅 + = ⋃𝑖≥1 𝑅 (𝑖) .

The transitive closure of knows relates two people whenever they can be linked by
an arbitrarily long chain of social connections. Under the hypothesis of six degrees of
separation,
knows+ = knows(6)
and every pair of people on Earth are linked in this way.
The transitive closure of a binary relation is always transitive, hence the name.
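On a finite set, this union stabilises after finitely many steps, so the transitive closure can be computed by repeatedly composing with 𝑅 until no new pairs appear. A sketch:

```python
def compose(S, R):
    """The composition S ∘ R: all (x, z) with xRy and ySz for some y."""
    return {(x, z) for (x, y1) in R for (y2, z) in S if y1 == y2}

def transitive_closure(R):
    """R+ = R ∪ R(2) ∪ R(3) ∪ ... ; on a finite relation this stabilises."""
    closure = set(R)
    while True:
        new_pairs = compose(closure, R) - closure  # extend every chain by one step
        if not new_pairs:
            return closure
        closure |= new_pairs

# A chain 1 → 2 → 3 → 4: the closure relates each element to all later ones.
R = {(1, 2), (2, 3), (3, 4)}
R_plus = transitive_closure(R)
```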

2.16 E Q U i VA L E N C E R E L AT i O N S

It seems fair to regard equality as the most fundamental binary relation. No matter
what kind of objects we are working with, we want to be able to say whether or not two
of them are really the same. If we cannot do even that, it is hard to imagine having any
useful discussion about such objects at all. So, even if there is no other relationship to
speak of among the objects we are discussing, there must be an equality relation. We
will always feel free to use equality on any set at all, without announcing its existence
beforehand.
In many situations, different objects may be treated as being equivalent for some
purposes even though they must be treated differently for other purposes. For example,
if two different students are in the same weekly tutorial class, then they can be treated
as equivalent for FIT1058 timetabling purposes, even though their other subjects may
be different so that they cannot be treated as equivalent for other timetabling purposes.
Define the binary relation ===¹⁰⁵⁸ on the set 𝕊 of all students by

𝑝 ===¹⁰⁵⁸ 𝑞 ⟺ 𝑝 and 𝑞 are in the same FIT1058 tutorial,

for any 𝑝, 𝑞 ∈ 𝕊.
If a binary relation is to capture some notion of “equivalence”, what properties should
it have? We can use equality as a guide, since “equivalence” should be like equality but
a bit “looser” in that two things can be equivalent without being identical. We have
just seen that the equality relation is reflexive, symmetric, and transitive. This is also
what we would expect of a binary relation that tells us when two things are equivalent.
Every object, of any kind, is certainly equivalent to itself; if one object is equivalent to
another, then the latter object must also be equivalent to the former; and if an object is
equivalent to another, which in turn is equivalent to a third object, then the first object
must also be equivalent to the third object.
Equality is also antisymmetric. But we will not add this to our requirements of
equivalence in general, since antisymmetry requires that if an object is equivalent to
another, which in turn is equivalent to our first object, then the two objects must actually
be equal. This would be tantamount to saying that equivalence implies equality, which
would mean that we have no form of equivalence other than equality itself, which is too
narrow.
With these thoughts in mind, we make the following definition.

An equivalence relation is a binary relation that is reflexive, symmetric, and transitive.
It is routine to check that the binary relation ===¹⁰⁵⁸ is an equivalence relation.
Other examples of equivalence relations:

• Let 𝑚 ∈ ℕ. Two integers 𝑥, 𝑦 are congruent modulo 𝑚 if 𝑥 − 𝑦 is a multiple of


𝑚 (often written 𝑚 ∣ (𝑥 − 𝑦)). We write this equivalence as 𝑥 ≡ 𝑦 (mod 𝑚).

• Two real numbers 𝑥, 𝑦 are equivalent in integer part if, when each is rounded
to the nearest integer, they become equal. We write this as 𝑥 ==ⁱⁿᵗ 𝑦.

• Two triangles in the plane are congruent if one can be moved onto the other by
some sequence of translations, rotations and reflections.

• Every function 𝑓 ∶ 𝐴 → 𝐵 defines an equivalence relation as follows. If 𝑥, 𝑦 ∈ 𝐴, we
write 𝑥 ∼𝑓 𝑦 if 𝑓(𝑥) = 𝑓(𝑦). This relation ∼𝑓 is an equivalence relation on 𝐴.

• The transitive closure of any reflexive symmetric binary relation is an equivalence
relation.

• For any function 𝑓 ∶ 𝐴 → 𝐵, the relation 𝑓 −1 ∘ 𝑓 is an equivalence relation on 𝐴 (it is exactly ∼𝑓 above). For a general relation 𝑅, the relations 𝑅 ∘ 𝑅 −1 and 𝑅 −1 ∘ 𝑅 are always symmetric, but need not be reflexive or transitive.
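The first of these examples is easy to check empirically: congruence modulo 𝑚, restricted to a finite window of integers, satisfies all three defining properties. A sketch (the window −6…6 and the modulus 𝑚 = 3 are arbitrary choices for the check):

```python
# Congruence modulo m, restricted to a finite window of integers.
# The window and the modulus are arbitrary choices for this check.
m = 3
A = range(-6, 7)
# (x - y) % m == 0 holds exactly when m divides x - y, even for negatives.
congruent = {(x, y) for x in A for y in A if (x - y) % m == 0}

# Check the three defining properties directly from the definitions.
reflexive = all((a, a) in congruent for a in A)
symmetric = all((y, x) in congruent for (x, y) in congruent)
transitive = all(
    (x, z) in congruent
    for (x, y) in congruent
    for (w, z) in congruent
    if y == w
)
```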

Let 𝑅 be an equivalence relation on a set 𝐴. An equivalence class of 𝐴 under 𝑅
is a nonempty subset 𝑋 ⊆ 𝐴 in which

(i) whenever 𝑥, 𝑦 ∈ 𝑋 , we have 𝑥𝑅𝑦;

(ii) there are no 𝑥 ∈ 𝑋 , 𝑧 ∉ 𝑋 such that 𝑥𝑅𝑧.



So all members of 𝑋 are equivalent to each other, but no member of 𝑋 is equivalent to
anything outside 𝑋 . This condition may be rephrased as saying that 𝑋 is a maximal
subset of 𝐴 in which all elements are equivalent. (Reminder: the word “maximal” does
not mean that 𝑋 is largest in size among all those with this property. Rather, it means
that 𝑋 cannot be enlarged while maintaining this property; in other words, no proper
superset of 𝑋 has the property. See § 1.7.)

Recalling our earlier examples of equivalence relations:


• For the binary relation ===¹⁰⁵⁸, the equivalence classes are the FIT1058 tutorials (or,
more precisely, the sets of students allocated to each tutorial).

• For congruence modulo 𝑚, the equivalence classes are the sets {𝑘𝑚 + 𝑟 ∶ 𝑘 ∈ ℤ},
for each 𝑟 ∈ {0, 1, … , 𝑚 − 1}. These sets are called the residue classes modulo
𝑚. There is one such set for each 𝑟 ∈ {0, 1, … , 𝑚 − 1}. For example, if 𝑚 = 2, then
𝑟 ∈ {0, 1} and we have two equivalence classes: the set of all even integers (for
𝑟 = 0), and the set of all odd integers (for 𝑟 = 1).
• For the relation ==ⁱⁿᵗ, the equivalence classes are the intervals [𝑛 − ½, 𝑛 + ½) for all
𝑛 ∈ ℤ.

• For congruent triangles, the equivalence classes each contain all triangles of one
particular shape and size (so they all have the same angles and side lengths, but
otherwise can be in any location and orientation in the plane).

• For ∼𝑓 , the equivalence classes are the preimages 𝑓 −1 (𝑦) = {𝑥 ∶ 𝑓(𝑥) = 𝑦} of each
𝑦 ∈ im 𝑓.
In each of these cases, the equivalence classes divide up the set 𝐴 neatly, so that
each element of 𝐴 belongs to exactly one equivalence class. These are manifestations of
a general phenomenon.
Theorem 12.12 The equivalence classes of an equivalence relation 𝑅 on a set 𝐴 form a
partition of 𝐴.
Proof. To show that the equivalence classes form a partition of 𝐴, we need to show
that (a) every member of 𝐴 belongs to an equivalence class, and (b) no two different
equivalence classes overlap. (Note that equivalence classes are nonempty by definition.)

(a)
Let 𝑥 ∈ 𝐴. Define 𝑋 to be the set of all members of 𝐴 that are equivalent to 𝑥 under
our equivalence relation 𝑅:
𝑋 = {𝑦 ∈ 𝐴 ∶ 𝑥𝑅𝑦}.
We claim that this is an equivalence class of 𝑅 that contains 𝑥. We prove parts (i) and
(ii) of the definition of equivalence class, in turn.

Firstly, suppose 𝑦, 𝑧 ∈ 𝑋 . By definition of 𝑋 , this implies that 𝑥𝑅𝑦 and 𝑥𝑅𝑧. By
symmetry and transitivity of 𝑅, it follows that 𝑦𝑅𝑧. So every pair of members of 𝑋 are
equivalent, satisfying the first condition for an equivalence class.
Now suppose that 𝑣 ∈ 𝑋 , 𝑤 ∉ 𝑋 such that 𝑣𝑅𝑤. The fact that 𝑣 ∈ 𝑋 implies that
𝑥𝑅𝑣. This, combined with 𝑣𝑅𝑤 and transitivity, gives 𝑥𝑅𝑤. Therefore 𝑤 ∈ 𝑋 . This
contradicts the assumption that 𝑤 ∉ 𝑋 . So there cannot be any 𝑣 ∈ 𝑋 , 𝑤 ∉ 𝑋 such that
𝑣𝑅𝑤. So the second condition for an equivalence class is satisfied.
We conclude that 𝑋 is an equivalence class.
Furthermore, 𝑋 clearly contains 𝑥, since 𝑥𝑅𝑥 by reflexivity.
So every member of 𝐴 belongs to an equivalence class.

(b)
Suppose 𝑋 and 𝑌 are overlapping equivalence classes of 𝑅. So 𝑋 ∩ 𝑌 ≠ ∅. Let
𝑥 ∈ 𝑋 ∩ 𝑌.
We claim that 𝑋 = 𝑌.
To do this, we consider 𝑌 ∖𝑋 and 𝑋 ∖𝑌 and show that they are both empty, which
implies that 𝑋 = 𝑌. (See Theorem 4 and (1.17).)
Consider first the possibility that 𝑌 ∖ 𝑋 ≠ ∅. Suppose 𝑦 ∈ 𝑌 ∖ 𝑋 . Because 𝑋 is an
equivalence class and 𝑦 ∉ 𝑋 , it follows that (𝑥, 𝑦) ∉ 𝑅. But because 𝑌 is an equivalence
class and 𝑥, 𝑦 ∈ 𝑌, it follows that (𝑥, 𝑦) ∈ 𝑅. So we have a contradiction in this case.
Therefore 𝑌 ∖ 𝑋 ≠ ∅ is impossible, so 𝑌 ∖ 𝑋 = ∅.
It remains to consider the possibility that 𝑋 ∖ 𝑌 ≠ ∅. But the argument here is the
same as that of the previous paragraph, with 𝑋 and 𝑌 interchanged. So 𝑋 ∖ 𝑌 = ∅.
So we have 𝑌 ∖ 𝑋 = 𝑋 ∖ 𝑌 = ∅. This implies that 𝑋 = 𝑌.
So the only way two equivalence classes can overlap is if they are identical.
So we have shown that every member of 𝐴 belongs to exactly one equivalence class.
So the equivalence classes of 𝑅 form a partition of 𝐴.
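The proof of part (a) is constructive: the class containing 𝑥 is exactly {𝑦 ∈ 𝐴 ∶ 𝑥𝑅𝑦}, and collecting these sets over all 𝑥 produces the partition. A sketch for a finite set, using congruence modulo 2 as the equivalence relation:

```python
def equivalence_classes(A, R):
    """Collect the sets {y in A : xRy}; when R is an equivalence relation,
    these are its equivalence classes and they partition A."""
    classes = []
    for x in A:
        cls = frozenset(y for y in A if (x, y) in R)
        if cls not in classes:
            classes.append(cls)
    return classes

# Congruence modulo 2 on {0, ..., 7}: the classes are the evens and the odds.
A = range(8)
R = {(x, y) for x in A for y in A if (x - y) % 2 == 0}
classes = equivalence_classes(A, R)
```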

2.17 R E L AT i O N S

Binary relations are binary in the sense that they consist of ordered pairs. It is often
useful to consider relations whose members have more objects.
A ternary relation on sets 𝐴, 𝐵, 𝐶 is a set of triples (𝑥, 𝑦, 𝑧) such that 𝑥 ∈ 𝐴, 𝑦 ∈ 𝐵
and 𝑧 ∈ 𝐶. (Triples written in parentheses are always considered to be ordered.) In other
words, it is a subset of 𝐴 × 𝐵 × 𝐶. For example, a set of points in real three-dimensional
space is a ternary relation on ℝ. A list of names in which each is listed as

first name middle name surname

could be represented as a ternary relation in which each triple has the form

(first name, middle name, surname).



Many Chinese names could be represented in a ternary relation with triples

(family name, personal name, generation name).

Dates can be represented by (day, month, year) triples, so they too can be represented
naturally in a ternary relation.
More generally, an 𝑛-ary relation on sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 consists of 𝑛-tuples (𝑥1 , 𝑥2 , … ,
𝑥𝑛 ) where 𝑥𝑖 ∈ 𝐴𝑖 for all 𝑖 ∈ {1, … , 𝑛}. In other words, it is a subset of 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 .
The set 𝐴𝑖 is called the 𝑖-th domain.
When you use a table with 𝑛 columns, its rows represent the 𝑛-tuples in an 𝑛-ary
relation. For each 𝑖, the set 𝐴𝑖 is a set that contains all objects appearing in the 𝑖-th
column (and is allowed to contain more, much as a codomain of a function is allowed to
contain more than just the image). So, when you work with spreadsheets, you are usually
working with 𝑛-ary relations whose members represent the rows of the spreadsheet.
Relations are fundamental in databases, so much so that the term relational database
is used for one of the most widely used types of database.
An 𝑛-ary relation is also called an 𝑛-ary predicate or a predicate with 𝑛 arguments.

2.18 C O U N T i N G R E L AT i O N S

How many binary relations from 𝐴 to 𝐵 are there, where 𝐴 and 𝐵 are finite sets?
Put 𝑚 ∶= |𝐴| and 𝑛 ∶= |𝐵|.
A binary relation is just a subset of 𝐴 × 𝐵, so the number of binary relations from
𝐴 to 𝐵 is just the number of subsets of 𝐴 × 𝐵. This is just the size of the power set
𝒫(𝐴 × 𝐵), which is
2^|𝐴×𝐵| ,
by (1.1). But |𝐴 × 𝐵| = 𝑚𝑛, by (1.19). So

# binary relations from 𝐴 to 𝐵 = 2^(𝑚𝑛) .

In the special case when 𝐴 = 𝐵, with |𝐴| = 𝑚, we have

# binary relations on 𝐴 = 2^(𝑚²) .

Now let's look at more general relations. How many 𝑛-ary relations are there on sets
𝐴1 , 𝐴2 , … , 𝐴𝑛 ?
Put 𝑚𝑖 ∶= |𝐴𝑖 | for each 𝑖 = 1, 2, … , 𝑛.
An 𝑛-ary relation is just a set of 𝑛-tuples from 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 . So the number of
𝑛-ary relations is just the size of the power set of 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 :

# 𝑛-ary relations on 𝐴1 , … , 𝐴𝑛 = 2^|𝐴1 ×𝐴2 ×⋯×𝐴𝑛 | = 2^(𝑚1 𝑚2 ⋯𝑚𝑛 ) .

In the special case when all the sets are the same (say, 𝐴𝑖 = 𝐴 for all 𝑖) and have size 𝑚,
we have

# 𝑛-ary relations on 𝐴 = 2^(𝑚ⁿ) .
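For tiny sets, these counts can be confirmed by brute force, enumerating every subset of the Cartesian product. A sketch:

```python
from itertools import combinations, product

def all_relations(A, B):
    """Enumerate every binary relation from A to B, i.e. every subset of A × B."""
    pairs = list(product(A, B))
    return [set(chosen)
            for r in range(len(pairs) + 1)
            for chosen in combinations(pairs, r)]

A = {1, 2}
B = {"a", "b", "c"}
count = len(all_relations(A, B))  # should equal 2 ** (|A| * |B|) = 2 ** 6 = 64
```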

2.19 EXERCiSES

1. Let 𝐴 be a finite set. How many indicator functions with domain 𝐴 are there?

2. Let 𝐴 and 𝐵 be any sets. Express the indicator functions of each of the following
in terms of the indicator functions 𝜒𝐴 and 𝜒𝐵 .
(a) 𝐴
(b) 𝐴 ∩ 𝐵
(c) 𝐴 ∖ 𝐵
(d) 𝐴△𝐵

3. If two functions 𝑓 ∶ 𝐴 → ℝ and 𝑔 ∶ 𝐴 → ℝ have the same domain 𝐴 and give real
numbers as their values, then their sum 𝑓 + 𝑔 ∶ 𝐴 → ℝ is defined for all 𝑥 ∈ 𝐴 by

(𝑓 + 𝑔)(𝑥) = 𝑓(𝑥) + 𝑔(𝑥).

What is the sum of all the indicator functions of all the subsets of a finite set?

4. Draw a Venn diagram showing each of the following sets of functions and the rela-
tionships between them: functions, injections, surjections, bijections, identity functions,
binary relations, ternary relations, relations.

5. Functions can be viewed as sets of ordered pairs, so we can combine functions using
set operations. The result will be a binary relation but, depending on the operation, it
may or may not be another function.
Suppose 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷 are functions. Which of the following is always a
function?
𝑓 ∩ 𝑔; 𝑓 ∪ 𝑔; 𝑓 ∖ 𝑔; 𝑔 ∖ 𝑓; 𝑓△𝑔; 𝑓 × 𝑔 .
For each that is always a function, give its definition in the usual form, including showing
how the rule depends on 𝑓 and 𝑔. For each that is not necessarily a function, explain why
this is the case (e.g., with the help of examples for 𝑓 and 𝑔 under which the operation
does not give a function).

6. Suppose that 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷 are functions with disjoint domains:


𝐴 ∩ 𝐶 = ∅. Is the disjoint union 𝑓 ⊔ 𝑔 always a function? If so, give its definition in
terms of 𝑓 and 𝑔. If not, why not?

7. Give an example of a function 𝑓 ∶ ℕ × ℕ → ℕ which is an injection. Even better,


give one which is a bijection.

8. A bitstring is a string over the binary alphabet {0,1}; in other words, it is a


member of {0, 1}∗ .

(a) Give a bijection from {0, 1}∗ to ℕ.

Now let {0, 1}∗+ denote the set of all finite nonempty tuples (or sequences) of bit-
strings. Examples of members of {0, 1}∗+ include:

(011, 1, 10001), (10, 10, 10, 10), (𝜖, 1011), (1010010001).

(b) (Challenge) Give a bijection from {0, 1}∗+ to {0, 1}∗ .

9. Let 𝐴 be a set of 𝑛 elements and let 𝑥 ∈ 𝐴. We say that a function 𝑓 ∶ 𝐴 → 𝐴 fixes


𝑥 if 𝑓(𝑥) = 𝑥. We also say in this case that 𝑥 is a fixed point of 𝑓.
This says nothing about what 𝑓 does to other elements of 𝐴; it is possible that some
other elements are fixed by 𝑓 too, or that none are. All this definition requires is that 𝑓
sends 𝑥 to itself.
Now let 𝑋 ⊆ 𝐴. The function 𝑓 fixes 𝑋 if it fixes all elements of 𝑋 . So, for all 𝑥 ∈ 𝑋 ,
we have 𝑓(𝑥) = 𝑥. In other words, the restriction of 𝑓 to 𝑋 is just the identity function
on 𝑋 :
𝑓|𝑋 = 𝑖𝑋 .
Members of 𝐴 ∖ 𝑋 may or may not be fixed by 𝑓.
Let Fix(𝑋 ) be the set of bijections on 𝐴 that fix 𝑋 , and let 𝐹 be the set of all
bijections on 𝐴 that fix at least one element of 𝐴.
Using Exercise 1.13 (i.e., Exercise 13 in Chapter 1), express |𝐹 | in terms of the set
sizes |Fix(𝑋 )| over all 𝑋 ⊆ 𝐴.

10. For each 𝑘 ∈ {1, … , 𝑛} and any subset 𝑋 ⊆ 𝐴 with |𝑋 | = 𝑘, write 𝑓𝑘 for the number
of bijections on 𝐴 that fix 𝑋 .

(a) Why don’t we include 𝑋 in the notation 𝑓𝑘 , given that its definition refers to 𝑋 ?

(b) Give an expression for 𝑓𝑘 .

(c) Express |𝐹 | in terms of 𝑓1 , 𝑓2 , … , 𝑓𝑛 , and then use your expression for 𝑓𝑘 to give an
expression for |𝐹 |.

(d) Hence express the number of fixed-point-free bijections on 𝐴 in terms of 𝑓1 , 𝑓2 , … , 𝑓𝑛 .


A function is fixed-point-free if it has no fixed points.

(e) For each of the following sets, what proportion of all bijections on the set are fixed-
point-free? {0, 1}; {♠, ♣, ♢, ♡}; the set of ten decimal digits; the 26-letter English
alphabet.
You may need to use a program like Wolfram Alpha, or a spreadsheet, to help
calculate the latter two.

(f) What is the largest set for which you can determine this proportion, and what is the
value of the proportion for that set? (Use a spreadsheet or a program if you wish.)

11.
(a) Write down all surjections from {1, 2, 3, 4} to {𝑎, 𝑏, 𝑐}. Compare your answer with
Exercise 1.20.

(b) Define a surjection from

{ surjections from {1, 2, 3, 4} to {𝑎, 𝑏, 𝑐} }

to
{ partitions of {1, 2, 3, 4} into three parts }.

(c) What sizes can the preimages of members of its image be?

12. Suppose 𝑓 ∶ 𝐴 → 𝐵 and 𝑔 ∶ 𝐶 → 𝐷 are bijections, 𝐴 ∩ 𝐶 = ∅ and 𝐵 ∩ 𝐷 = ∅. Since


𝐴 ∩ 𝐶 = ∅, the disjoint union 𝑓 ⊔ 𝑔 exists. Show that 𝑓 ⊔ 𝑔 is a bijection and express the
inverse of 𝑓 ⊔ 𝑔 in terms of the inverses of 𝑓 and 𝑔.

13. If the function 𝑓 has an inverse function, which of its iterated compositions 𝑓 (𝑘)
have inverse functions (where 𝑘 ∈ ℕ)?

14. Determine all functions from {1, 2, 3} to itself whose indefinitely iterated compo-
sition is a constant function (i.e., 𝑓 (𝑘) is a constant function if 𝑘 is large enough).
For a challenge: investigate when this happens in general. Try to characterise those
functions 𝑓 ∶ {1, 2, … , 𝑛} → {1, 2, … , 𝑛} such that, for large enough 𝑘, the iterated compo-
sition 𝑓 (𝑘) is constant.

15. This exercise is about the Caesar slide cryptosystem, one of the oldest and
simplest cryptosystems. It is not secure, but its core operation is used in many stronger
and more complex cryptosystems.
Caesar slide encrypts a message by sliding each letter along the alphabet, rightwards,
by some fixed number of steps. This fixed number is the key, 𝑘, and the same amount of
sliding is done to each letter of the message. The sliding is done with wrap-around, so
if sliding ever takes you beyond the end of the alphabet, then you resume sliding at the
start of the alphabet. In effect, we treat the alphabet as a circle rather than a straight
line.
For example, if the message is⁵

thefamilyofdashwoodhadlongbeen

and the key is 𝑘 = 3 (Julius Caesar’s favourite), then the cyphertext is

wkhidplobrigdvkzrrgkdgorqjehhq

Here, sliding t along the alphabet by 3 steps gives w. We see the wrap-around when
encrypting the letter y, since sliding it step-by-step gives z, then (wrapping around) a,
then b.
Decryption is the reverse of encryption, meaning that we slide 𝑘 steps leftwards
instead of rightwards, again with wrap-around.
We now give a formal definition of this cryptosystem, using the definition of cryp-
tosystems given in §2.10.

• The message space 𝑀 is the set of all strings of English lower case letters (without
blanks).

• The key space 𝐾 is the English lower-case alphabet, {a, b, c, … , y, z}.

• The cypher space 𝐶 is the same as the message space: 𝐶 = 𝑀 .

• The encryption and decryption functions, to be defined below, need to use the
letter addition operation defined by the following table.

5 from the first sentence of Sense and Sensibility by Jane Austen, first published by Thomas Egerton in
London in 1811.

+ a b c d e f g h i j k l m n o p q r s t u v w x y z
a a b c d e f g h i j k l m n o p q r s t u v w x y z
b b c d e f g h i j k l m n o p q r s t u v w x y z a
c c d e f g h i j k l m n o p q r s t u v w x y z a b
d d e f g h i j k l m n o p q r s t u v w x y z a b c
e e f g h i j k l m n o p q r s t u v w x y z a b c d
f f g h i j k l m n o p q r s t u v w x y z a b c d e
g g h i j k l m n o p q r s t u v w x y z a b c d e f
h h i j k l m n o p q r s t u v w x y z a b c d e f g
i i j k l m n o p q r s t u v w x y z a b c d e f g h
j j k l m n o p q r s t u v w x y z a b c d e f g h i
k k l m n o p q r s t u v w x y z a b c d e f g h i j
l l m n o p q r s t u v w x y z a b c d e f g h i j k
m m n o p q r s t u v w x y z a b c d e f g h i j k l
n n o p q r s t u v w x y z a b c d e f g h i j k l m
o o p q r s t u v w x y z a b c d e f g h i j k l m n
p p q r s t u v w x y z a b c d e f g h i j k l m n o
q q r s t u v w x y z a b c d e f g h i j k l m n o p
r r s t u v w x y z a b c d e f g h i j k l m n o p q
s s t u v w x y z a b c d e f g h i j k l m n o p q r
t t u v w x y z a b c d e f g h i j k l m n o p q r s
u u v w x y z a b c d e f g h i j k l m n o p q r s t
v v w x y z a b c d e f g h i j k l m n o p q r s t u
w w x y z a b c d e f g h i j k l m n o p q r s t u v
x x y z a b c d e f g h i j k l m n o p q r s t u v w
y y z a b c d e f g h i j k l m n o p q r s t u v w x
z z a b c d e f g h i j k l m n o p q r s t u v w x y

In effect, letters correspond to numbers in {0, 1, 2, … , 24, 25}, and when we do the
letter addition 𝛼 + 𝛽, we start at 𝛼 and slide to the right (with wrap-around
as needed) by a number of steps given by 𝛽. Letter addition is commutative and
associative. We can also define letter subtraction, where we slide to the left instead
of the right.

• The Caesar slide encryption function 𝑒 ∶ 𝑀 × 𝐾 → 𝑀 is defined for any message


𝑚 ∈ 𝑀 and key 𝑘 ∈ 𝐾 as follows. Let 𝑛 be the length of 𝑚 and let its letters
be 𝑚1 , 𝑚2 , … , 𝑚𝑛 , so we can write 𝑚 = 𝑚1 𝑚2 ⋯ 𝑚𝑛 . Then 𝑒(𝑚, 𝑘) is the string
𝑐 = 𝑐1 𝑐2 ⋯ 𝑐𝑛 of 𝑛 letters where, for each 𝑖, the 𝑖-th letter 𝑐𝑖 is obtained from the
𝑖-th letter of the message by adding the key letter to it, using the wrap-around
method described above. Formally,

𝑒(𝑚, 𝑘) = 𝑐1 𝑐2 ⋯ 𝑐𝑛 ,
𝑐𝑖 = 𝑚𝑖 + 𝑘.

The addition here is letter addition. Note that, for each 𝑖, the same key letter is
used. Although message and cypher letters may (and usually do) change as you
go along the message, the key letter used for encryption does not change.

• The Caesar slide decryption function 𝑑 ∶ 𝑀 × 𝐾 → 𝑀 is defined for any cyphertext


𝑐 = 𝑐1 𝑐2 ⋯ 𝑐𝑛 ∈ 𝑀 and key 𝑘 ∈ 𝐾 as follows. Its value 𝑑(𝑐, 𝑘) is the string 𝑚 =
𝑚1 𝑚2 ⋯ 𝑚𝑛 where, for each 𝑖, the 𝑖-th letter 𝑚𝑖 is obtained from the 𝑖-th letter
of the cyphertext by subtracting the key letter from it, again using wrap-around.
Formally,

𝑑(𝑐, 𝑘) = 𝑚1 𝑚2 ⋯ 𝑚𝑛 ,
𝑚𝑖 = 𝑐𝑖 − 𝑘.

Again, we use the same key letter at all positions.
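The definitions of 𝑒 and 𝑑 above translate directly into code, with letters treated as numbers 0–25 and wrap-around handled by reduction modulo 26. A sketch (the function names are my own, not part of the formal definition):

```python
from string import ascii_lowercase as alphabet  # "abcdefghijklmnopqrstuvwxyz"

def add_letters(a, b):
    """Letter addition: start at a and slide right b steps, with wrap-around."""
    return alphabet[(alphabet.index(a) + alphabet.index(b)) % 26]

def sub_letters(a, b):
    """Letter subtraction: slide left instead of right."""
    return alphabet[(alphabet.index(a) - alphabet.index(b)) % 26]

def encrypt(m, k):
    """e(m, k): add the same key letter k to every letter of the message m."""
    return "".join(add_letters(c, k) for c in m)

def decrypt(c, k):
    """d(c, k): subtract the key letter, undoing the encryption."""
    return "".join(sub_letters(ch, k) for ch in c)
```

With key 𝑘 = 3, i.e. the key letter d, this reproduces the example above: `encrypt("thefamilyofdashwoodhadlongbeen", "d")` gives `"wkhidplobrigdvkzrrgkdgorqjehhq"`, and `decrypt` inverts it.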

Now refer to the definitions of embeddable, equivalent and idempotent given in


Assignment 1.

Prove that the Caesar slide cryptosystem is idempotent.

16. If 𝑓 is an injection and 𝑓 −1 is its inverse relation, what can you say about 𝑓 ∘ 𝑓 −1
and 𝑓 −1 ∘ 𝑓? What kinds of functions or relations are they, and what are their domains
and codomains?

17. If a binary relation is symmetric, what can you say about its inverse relation?

18. Recall the Parent relation from § 2.13.


Suggest a good name for its transitive closure.
Then use that relation (i.e., the transitive closure) to define formally the relation
that links two people that are related to each other (however distantly).

19. Rephrase the Collatz Conjecture as a statement about the transitive closure of
the Collatz function.

20. Below we give some binary relations on the set of all Python programs. The
Python programs are denoted by 𝑃 and 𝑄, and the binary relations are denoted by
≃1 , ≃2 , … , ≃10 . For each relation, determine whether or not it is an equivalence relation,
and give reasons.

notation definition of when the relation holds


𝑃 ≃1 𝑄 𝑃 and 𝑄 have the same number of characters
𝑃 ≃2 𝑄 𝑃 and 𝑄 compute the same function
𝑃 ≃3 𝑄 𝑃 and 𝑄 have the same set of variable names
𝑃 ≃4 𝑄 𝑃 and 𝑄 have at least one variable name in common
𝑃 ≃5 𝑄 𝑃 and 𝑄 have no variable names in common
𝑃 ≃6 𝑄 𝑃 and 𝑄 were written by the same set of programmers
𝑃 ≃7 𝑄 𝑃 and 𝑄 have at least one coauthor in common
𝑃 ≃8 𝑄 𝑃 was completed before 𝑄
𝑃 ≃9 𝑄 𝑃 was completed on the same day as 𝑄
𝑃 ≃10 𝑄 𝑃 was completed after 𝑄

21. What can you say about the transitive closure of an equivalence relation?

22. The anagram relation on the set of all English words is defined as follows. Let
𝑥1 𝑥2 ⋯ 𝑥𝑚 and 𝑦1 𝑦2 ⋯ 𝑦𝑛 be two English words of lengths 𝑚 and 𝑛 respectively, where
the 𝑥𝑖 and 𝑦𝑗 are their letters (1 ≤ 𝑖 ≤ 𝑚, 1 ≤ 𝑗 ≤ 𝑛). The ordered pair of words

(𝑥1 𝑥2 ⋯ 𝑥𝑚 , 𝑦1 𝑦2 ⋯ 𝑦𝑛 )

belongs to the anagram relation if and only if there exists a bijection 𝑓 ∶ {1, 2, … , 𝑚} →
{1, 2, … , 𝑛} such that, for all 𝑖 ∈ {1, 2, … , 𝑚}, we have 𝑥𝑖 = 𝑦𝑓(𝑖) .
We assume all letters belong to the usual lower-case English alphabet {a, b, … , z}.
The exact choice of dictionary does not matter for this exercise.

(a) Find the largest set of three-letter words you can that are all related to each other
by the anagram relation.

(b) For each 𝑘 = 1, 2, 3, 4, find an English word of length 𝑘 that is not related to any
other word by anagram.

(c) Prove that anagram is an equivalence relation.


(d) (Programming challenge) Using a standard open-access list of English words,
determine the number of equivalence classes of anagram on the set of words in that
list.

23. How many non-reflexive binary relations are there on a set of size 𝑛?

24. Recall that when you have a table with 𝑛 columns, its rows represent the 𝑛-tuples
in an 𝑛-ary relation. For 1 ≤ 𝑖 ≤ 𝑛, let 𝐴𝑖 be a set containing all the entries in column 𝑖
(and possibly more elements), so that the columns of the table define an 𝑛-ary relation
on 𝐴1 × 𝐴2 × ⋯ × 𝐴𝑛 . Assuming the columns of this table are all distinct, how many
ternary relations can you construct by choosing columns from this table of 𝑛 columns?
3
PROOFS

Why should a programmer learn to write mathematical proofs? The most obvious
answer is that proofs give rigorous justification for properties of your programs or of
the structures that they work with. A computer scientist is not just a computer user
or hobbyist or fan. A computer scientist provides rational support for their claims.
Sometimes, this can be done by computational experiments, where programs are run
on a large number of different inputs that hopefully form a representative sample of
the situations of interest, and the outputs are studied carefully and perhaps analysed
statistically. But a proof gives a more fundamental kind of support for a claim. It is
independent of the specific technology which is used to develop and run the program.
It applies to a far wider range of scenarios than can ever actually be run in a set of
computational experiments.
In this chapter, we learn about the nature of mathematical proofs and the art of
writing them. We treat the main types of proof, with emphasis on proof by induction.
We conclude in § 3.14 by reflecting further on the role of proofs in computer science.
Proofs are not only a tool for proving statements about programs; proofs are, themselves,
like programs in many ways, and writing them is like programming, and developing skill
in proof-writing will make you a better programmer.

3.1𝛼 THEOREMS AND PROOFS

A theorem is a mathematical statement that has been proved to be true.


You may also come across the terms “proposition”, “lemma” and “corollary”. They are
also theorems, but the different terms tell us something about how the theorem is used.
A lemma is a theorem whose sole purpose is to be used in the proof of a more significant
theorem later.1,2 A proposition is a theorem that is (unlike lemmas) of interest in its
own right, but the term is typically used for theorems that are easy to prove and are

1 Lemmas can tend to be highly technical and are often forgotten even in cases where the theorem they
help prove becomes famous. But some lemmas have found fame in their own right, e.g., the Handshaking
Lemma, which you’ll meet later in this unit, and Burnside’s Lemma.
2 These days, the plural of “lemma” is “lemmas”, as you would expect. But, traditionally, the plural was
“lemmata”, which you may encounter in old books or papers. Are there other English words where the
plural ending is -ta?


of less significance than other theorems being proved in the same article/chapter/book.
We also use the term “proposition” in a more specific sense from next week onwards. A
corollary is a theorem that follows almost immediately from another theorem that has
just been stated.
A proof of a claim is a step-by-step argument that establishes, logically and with
certainty, that the claim is true.
A proof consists of a finite sequence of statements, culminating in the claim that is
being proved. These statements are often called the steps of the proof. Each statement
in the proof must be one of the following:

• something you already knew before the start of the proof, i.e.,
– a definition,
– an axiom (i.e., some fundamental property that is always taken to be true for
the objects under discussion, such as the distributive law 𝑥(𝑦 + 𝑧) = 𝑥𝑦 + 𝑥𝑧
for numbers),
– a previously-proved theorem;

or

• an assumption, as a start towards proving something that is true under that as-
sumption;

or

• a logical consequence of some of the previous statements. In other words, there
must be some previous statements which, together, imply the current statement.

• The last statement in the sequence should establish that the claim follows from
some previous statements in the proof. If the last statement is, by that stage, an
obvious consequence of the statements before it, it is often omitted.

A proof must be verifiable, so it must be written clearly. It must be able to be read
sequentially, with each step depending only on what comes before it. It should not be
necessary to read ahead to determine if a step is correct (although it’s ok to read ahead
to help your understanding).
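This "no reading ahead" requirement can even be checked mechanically. The sketch below (the encoding and names are ours, not from the notes) models a proof as a list of steps, each recording which earlier steps it relies on, and verifies that no step cites a later one:

```python
def is_sequentially_verifiable(proof):
    """proof: list of (statement, cited_step_indices) pairs, 0-indexed.
    True iff every step cites only strictly earlier steps."""
    return all(all(j < i for j in cites)
               for i, (_, cites) in enumerate(proof))

good = [
    ("P",      []),      # an assumption
    ("P => Q", []),      # a known implication
    ("Q",      [0, 1]),  # deduced from steps 0 and 1
]
bad = [
    ("Q",      [1]),     # cites a later step: not verifiable in order
    ("P => Q", []),
]

assert is_sequentially_verifiable(good)
assert not is_sequentially_verifiable(bad)
```

This checks only the *shape* of a proof, of course; whether each step really is a logical consequence of the steps it cites is the harder part.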
We announce the beginning of a proof with the heading Proof. The end-of-proof
symbol □ indicates the end of the proof.3

3 Another traditional way to indicate the end of a proof is using the acronym “QED”, which stands for the
Latin phrase “quod erat demonstrandum”, meaning “which was to be proved”. Occasionally, the end of a
proof is indicated by //.

We illustrate these concepts with the following theorem and proof, which you have
seen before in Exercise 7. We number the steps of the proof, and give each its own line,
to help with later discussion. But we won’t normally do this in proofs.

Theorem 13. For any sets 𝐴 and 𝐵, if 𝐴 ⊆ 𝐵 then 𝒫(𝐴) ⊆ 𝒫(𝐵).

Proof.
(1) Assume 𝐴 ⊆ 𝐵.
(2) Let 𝑋 ∈ 𝒫(𝐴).
(3) Therefore 𝑋 ⊆ 𝐴, by definition of 𝒫(𝐴).
(4) Therefore 𝑋 ⊆ 𝐵.
(5) Therefore 𝑋 ∈ 𝒫(𝐵).
(6) So we have shown that 𝑋 ∈ 𝒫(𝐴) implies 𝑋 ∈ 𝒫(𝐵).
(7) Therefore 𝒫(𝐴) ⊆ 𝒫(𝐵).

Think about the role of each step in this proof.

• What type of proof step is it? (See our listing of types of proof steps above.)

• Does it make use of any previous steps? If not, why not? If so, how?

• Does it use any subsequent steps? If not, good! If so, we have a problem!

The following table repeats all the proof steps, slightly expanding some of them, and
annotating each step to show how it relates to our discussion of proof steps above.

proof step — comment

(1) Assume 𝐴 ⊆ 𝐵.
    The theorem statement is about what happens under the assumption that 𝐴 ⊆ 𝐵, so we start by making this assumption.

(2) Let 𝑋 ∈ 𝒫(𝐴).
    This is a definition. Its purpose is to give a name to a general member of 𝒫(𝐴), so we can talk about it.

(3) Therefore 𝑋 ⊆ 𝐴, by definition of 𝒫(𝐴).
    𝑋 ⊆ 𝐴 is a logical consequence of (2) and the definition of power set.

(4) Therefore 𝑋 ⊆ 𝐵, by (1) and (3).
    𝑋 ⊆ 𝐵 is a logical consequence of (3) 𝑋 ⊆ 𝐴 and (1) 𝐴 ⊆ 𝐵.

(5) Therefore 𝑋 ∈ 𝒫(𝐵), by (4) and the definition of 𝒫(𝐵).
    This is a logical consequence of (4) 𝑋 ⊆ 𝐵 and the definition of power set.

(6) So 𝑋 ∈ 𝒫(𝐴) implies 𝑋 ∈ 𝒫(𝐵).
    This really just summarises what we’ve done over steps (2)–(5).

(7) Therefore 𝒫(𝐴) ⊆ 𝒫(𝐵).
    This is a logical consequence of (6) and the definition of the subset relation.
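The theorem itself can also be sanity-checked by brute force on small sets. This is no substitute for the proof, since it covers only finitely many cases, but it is a useful habit; the helper below is our own sketch:

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, as a set of frozensets."""
    items = list(s)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}

# Check Theorem 13 exhaustively for all A, B that are subsets of {0, 1, 2}.
for A in power_set({0, 1, 2}):
    for B in power_set({0, 1, 2}):
        if A <= B:                               # A ⊆ B ...
            assert power_set(A) <= power_set(B)  # ... implies P(A) ⊆ P(B)
```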

3.2 LOGiCAL DEDUCTiON

The backbone of any proof consists of its logical deductions. These are the steps where
we deduce a logical consequence of previous steps. If a proof does not have logical
deductions, then it’s just a collection of known facts and assumptions, and we get nothing
new. It’s a kind of “logical jelly” without structure or substance.
When making a logical deduction from a previous statement in a proof, the funda-
mental principle is:

If you’ve previously established 𝑃


and also that 𝑃 implies 𝑄
then you can deduce 𝑄.

This principle is known as modus ponens. We usually don’t bother to refer to it by
that name when we’re doing proofs, though, as we use it very often and it is so natural
(indeed, it’s the very essence of logical deduction itself).
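Modus ponens can be confirmed exhaustively by truth table: in every row where both 𝑃 and 𝑃 ⇒ 𝑄 are true, 𝑄 is true as well. A small sketch:

```python
from itertools import product

def implies(p, q):
    """Truth of p => q: false only when p is true and q is false."""
    return (not p) or q

# In every truth-table row where P and (P => Q) both hold, Q holds too.
for p, q in product([False, True], repeat=2):
    if p and implies(p, q):
        assert q
```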
3.2 L O G i C A L D E D U C T i O N 89

Figure 3.1: Two dominos which could fall to the right

For example, consider the deduction we made at step (3) of the proof of Theorem 13.
At that stage, we know from earlier steps that

• 𝑋 ∈ 𝒫(𝐴), which we can treat as true simply because it’s the definition of 𝑋 (step
(2));

• if 𝑋 ∈ 𝒫(𝐴) then 𝑋 ⊆ 𝐴, which comes from the definition of 𝒫(𝐴).4

If we let 𝑃 stand for 𝑋 ∈ 𝒫(𝐴), and 𝑄 stand for 𝑋 ⊆ 𝐴, then 𝑃 ⇒ 𝑄 represents the
assertion that
if 𝑋 ∈ 𝒫(𝐴) then 𝑋 ⊆ 𝐴.
So, step (2) is 𝑃, the definition of power set gives 𝑃 ⇒ 𝑄, and logical deduction (or
modus ponens, if we want to practise our Latin) then gives 𝑄.
The role of implication in logical deduction is crucial. An implication 𝑃 ⇒ 𝑄 is the
link that translates the truth of 𝑃 into the truth of 𝑄. So let us consider it further.
Suppose we have two dominos standing on their ends, a short distance apart and
with their faces parallel, as shown side-on in Figure 3.1.
Let 𝑃 be the statement that the left domino falls to the right, and let 𝑄 be the
statement that the right domino falls to the right. Each of these statements could be
true or false. There is no requirement here for either domino to fall; it’s fine for them
both to remain standing. It’s also fine for the left domino to remain standing but
for the right one to fall (by whatever means). But if the left domino falls, then the
right domino must fall too. It is impossible to have the left domino fall with the right
domino remaining standing. So we have three possible situations, which we’ll represent
as ordered pairs:

( left stands, right stands )
( left stands, right falls )
( left falls, right falls )

4 That definition actually gives “if and only if” here: 𝑋 ∈ 𝒫(𝐴) ⇔ 𝑋 ⊆ 𝐴. But we do not need the reverse
implication right now.

( left stands, right stands ) ✓        ( left stands, right falls ) ✓
( left falls, right stands ) ✗         ( left falls, right falls ) ✓


Figure 3.2: Two dominos, standing or falling: three possible situations, one impossible one.

We can depict these various possibilities using sets. Let 𝑃 be the set of those
situations where the left domino falls, and let 𝑄 be the set of those situations where the
right domino falls. Then

𝑃 = {(left falls, right falls)},


𝑄 = {(left falls, right falls), (left stands, right falls)}.

The various situations and the sets 𝑃 and 𝑄 are shown in a Venn diagram in Figure 3.3.
Observe that the impossible situation

( left falls, right stands )

is not shown on the Venn diagram. If it were possible, then it would belong to 𝑃 ∖ 𝑄
and then 𝑃 would not be a subset of 𝑄. But its impossibility means that 𝑃 ∖ 𝑄 = ∅
and 𝑃 ⊆ 𝑄.
This example illustrates the general principle that the logical implication 𝑃 ⇒ 𝑄
between the statements 𝑃 and 𝑄 corresponds to the subset relation 𝑃 ⊆ 𝑄 between the
sets of situations they represent.

[Venn diagram: 𝑃 is drawn inside 𝑄; (left falls, right falls) lies in 𝑃, (left stands, right falls) lies in 𝑄 outside 𝑃, and (left stands, right stands) lies outside 𝑄.]

Figure 3.3: The sets 𝑃 and 𝑄.

We have seen this principle in action before, when the statements are framed as
statements about set membership, on p. 7 in § 1.6. For any two sets 𝐴 and 𝐵,

𝐴 ⊆ 𝐵 if and only if for all 𝑥 we have 𝑥 ∈ 𝐴 ⇒ 𝑥 ∈ 𝐵.

Our domino example illustrates that the link between the subset relation and logical
implication is more general, and applies even where the logical implication is not stated
in terms of set membership.
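The correspondence can be seen concretely in the domino example. Encoding the three possible situations as pairs (our own encoding, for illustration), the truth of 𝑃 ⇒ 𝑄 in every situation is exactly the subset relation 𝑃 ⊆ 𝑄:

```python
situations = [
    ("left stands", "right stands"),
    ("left stands", "right falls"),
    ("left falls",  "right falls"),   # the impossible pair is absent
]

P = {s for s in situations if s[0] == "left falls"}   # left domino falls
Q = {s for s in situations if s[1] == "right falls"}  # right domino falls

# P => Q holds in every possible situation ...
assert all((s not in P) or (s in Q) for s in situations)
# ... and that is precisely the subset relation P ⊆ Q.
assert P <= Q
```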
An important special case of implication is when the starting condition (on the left)
is false. If we have an implication 𝑋 ⇒ 𝑌 when 𝑋 is false, then the implication 𝑋 ⇒ 𝑌
is true regardless of what 𝑌 is. This corresponds to the fact that, if 𝑋 is the empty set
and 𝑌 is any set, then 𝑋 ⊆ 𝑌, since ∅ ⊆ 𝑌; the empty set is a subset of every set.
Keep in mind that the truth of an implication does not mean that either of its two
parts is true. In our domino example, we know that 𝑃 ⇒ 𝑄, but this does not mean
that the left domino falls or that the right domino falls. It just gives a logical relationship
between these two events, namely that if the left domino falls then the right domino falls.
Note also that this is a purely logical relationship. In the domino scenario, there is
also the ingredient of time. This plays a role in the physical mechanism by which the
falling of the left domino (if it happens) causes the falling of the right one, and it follows
from that mechanism that the right domino falls later, in time, than the left one. But this
is a detail of the actual physical setting. Logical implication itself is not an assertion about time, but
merely an assertion about the truth or falsehood of the two parts, in this case 𝑃 and 𝑄.

Indeed, it is entirely possible that logical implication can “go backwards” in time.
Suppose 𝑋 is the statement that you can see stars in the sky and 𝑌 is the statement
that the sun has set. The sun setting does not itself mean that you can see stars, since it
might be too cloudy. But if you can see stars, then you know the sun has set. So 𝑋 ⇒ 𝑌
holds, even though 𝑋 happened after 𝑌, and there is no suggestion that 𝑋 causes 𝑌.
Always keep in mind that implication is not symmetric. In the domino example, we
have 𝑃 ⇒ 𝑄, but we do not have 𝑄 ⇒ 𝑃, because the right domino falling does not have
to be because the left one falls (as we have supposed throughout that the right domino
could be made to fall, by some external force, even if the left one remains standing).
In the sunset example just given, we have 𝑋 ⇒ 𝑌,
but we do not have 𝑌 ⇒ 𝑋 , because the sun setting does not imply that you can see
stars (as it might be cloudy).
The converse of an implication is the reverse implication, i.e., the implication you
get by swapping the order of the two parts or by reversing the direction of the implication
arrow symbol.5 So the converse of 𝑃 ⇒ 𝑄 is 𝑄 ⇒ 𝑃, which can also be written 𝑃 ⇐ 𝑄.
When an implication holds, we cannot assume that the converse also holds. In the
examples of the previous paragraph, we showed that, for the two example implications
we have been discussing, the converse does not hold.
Sometimes, an implication and its converse both hold. For example, suppose you
are holding a ball above the ground. Suppose 𝑅 means that you release the ball and
𝑆 means that the ball falls to the ground. Then 𝑅 ⇒ 𝑆, and its converse 𝑆 ⇒ 𝑅 holds
too. In these situations, we can put the two implications together to make a two-way
implication, written 𝑅 ⇔ 𝑆, which means that 𝑅 and 𝑆 are logically equivalent, and
we often say that 𝑅 holds if and only if 𝑆 holds. We have seen statements of this type
before, in some of our Theorems.

3.3 PROOFS OF EXiSTENTiAL AND UNiVERSAL STATEMENTS

We now give more examples of theorems and proofs, highlighting the relationship be-
tween the kind of statement you are trying to prove and the way you need to prove
it.
Theorem 14. English has a palindrome.
Proof. ‘rotator’ is an English word and also a palindrome.

An existential statement is a statement that asserts that something with a specified


property exists. It may or may not be true.
The above theorem is an existential statement. This one happens to be true, and
we have given a proof of it, so it is a theorem.
Proving an existential statement, such as …

There exists a palindrome in English

…just requires one suitable example.

5 But don’t do both of those, or you’ll get the implication you started with, just written differently! 𝑃 ⇒ 𝑄 and 𝑄 ⇐ 𝑃 are just different ways of writing the same thing.
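In code, an existential claim is settled the same way: exhibit one witness and check it.

```python
def is_palindrome(word):
    """A word is a palindrome iff it reads the same reversed."""
    return word == word[::-1]

# One suitable example ('rotator') settles the existential claim.
assert is_palindrome("rotator")
assert not is_palindrome("monash")
```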

Most proofs are not this short …


Theorem 15. Every English word has a vowel or a ‘y’.
Proof. This can be shown by listing all English words:
‘aardvark’ has a vowel.
‘aardwolf’ has a vowel.
‘aasvogel’ has a vowel.


‘syzygy’ has a ‘y’.

‘zygote’ has a vowel.

We have only shown a few lines of this proof. The number of lines of the full proof
equals the number of English words, which is several tens of thousands.
A universal statement is a statement that asserts that everything within some set
has a specified property.
To prove a universal statement, such as …
For every English word, it has a vowel or a ‘y’
… you need to cover every possible case.

One way is to go through all possibilities, in turn, and check each one. But the
number of things to check may be huge, or infinite. So usually we want to reason in a
way that can apply to many different possibilities at once.
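Checking a universal claim by exhaustion translates directly into a loop over every case. Here is a sketch over a small hypothetical sample; a genuine check would have to iterate over an entire word list:

```python
# A hypothetical sample; a real check needs the whole dictionary.
words = ["aardvark", "aardwolf", "aasvogel", "syzygy", "zygote", "rhythm"]

def has_vowel_or_y(word):
    return any(c in "aeiouy" for c in word)

# A universal statement requires EVERY case to be checked.
assert all(has_vowel_or_y(w) for w in words)
```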

3.4 FiNDiNG PROOFS

There is no systematic method for finding proofs for theorems. There are deep theoretical
reasons for this, based on work in the 1930s (Gödel, 1931; Church, 1936; Turing, 1936).
Discovering proofs is an art as well as a science. It requires
• skill at logical thinking and reasoning

• understanding the objects you’re working with

• practice and experience

• play and exploration

• creativity and imagination

• perseverance.

Although we can’t give a recipe for discovering proofs, we will give some general
advice on dealing with some common situations.
To prove subset relations, 𝐴 ⊆ 𝐵 (where 𝐴 and 𝐵 are sets):

1. Take a general member of 𝐴, and give it a name. e.g., “Let 𝑥 ∈ 𝐴”

2. Use the definition of 𝐴 to say something about 𝑥.

3. Follow through the logical consequences of that,

4. … aiming to prove that 𝑥 also satisfies the definition of 𝐵.

To prove set equality, 𝐴 = 𝐵 (where 𝐴 and 𝐵 are sets):

1. Prove 𝐴 ⊆ 𝐵

2. Prove 𝐴 ⊇ 𝐵

To prove numerical equality, 𝐴 = 𝐵 (where 𝐴 and 𝐵 represent numbers):


If symbolic manipulation using algebraic rules can transform expression 𝐴 to expression
𝐵, then that’s good;
but if not:

1. Prove 𝐴 ≤ 𝐵

2. Prove 𝐴 ≥ 𝐵

3.5 TYPES OF PROOFS

We now consider five types of proof, namely

• Proof by symbolic manipulation

• Proof by construction

• Proof by cases

• Proof by contradiction

• Proof by induction.

This list is not exhaustive.


Proofs can be quite individual in character and hard to classify, although many will
follow one of the above patterns.
Many proofs are a mix of these types.

3.6 PROOF BY SYMBOLiC MANiPULATiON

Some proofs proceed by a sequence of equations, where each equation uses some basic
law (an axiom or some fundamental theorem) about the objects in question.
Many proofs you did in secondary school mathematics would have been of this type.
For example, consider the difference of two squares:

𝑥² − 𝑦² = (𝑥 + 𝑦)(𝑥 − 𝑦). (3.1)

This is typically proved along the following lines.

(𝑥 − 𝑦)(𝑥 + 𝑦) = 𝑥² + 𝑥𝑦 − 𝑦𝑥 − 𝑦²   (expanding, using the distributive law for numbers)
              = 𝑥² + 𝑥𝑦 − 𝑥𝑦 − 𝑦²   (since multiplication is commutative)
              = 𝑥² − 𝑦²             (by additive cancellation).

Incidentally, this proof also illustrates that, if you want to prove an equation, you can
start with either side of the equation and work towards the other side. You don’t have
to start with the left side, just because you read it first! In the above proof, we started
with the right side. Any proof of this type can be done in either direction, but sometimes
one direction seems more intuitive than the other.
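A symbolic proof like the one above covers all values at once. Random spot-checks, by contrast, only build confidence; the sketch below checks (3.1) at many random points, which is reassuring but is not a proof:

```python
import random

# Spot-check the difference of two squares at random integer points.
# Passing 1000 cases builds confidence but proves nothing about ALL x, y;
# only the symbolic manipulation covers every case at once.
for _ in range(1000):
    x = random.randint(-100, 100)
    y = random.randint(-100, 100)
    assert x**2 - y**2 == (x + y) * (x - y)
```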
Similarly, the basic laws of sets can be used to prove equality between some expressions
involving sets. We have already seen some proofs of this type, in Theorem 1,
Corollary 2, and Theorem 5.
We can also use symbolic manipulation in parts of other proofs, as we did for example
in the proof of Theorem 10, where we used basic properties of function composition
and inverse functions to do chains of equalities that establish the desired claims about
inverses of compositions of functions.

3.7 PROOF BY CONSTRUCTiON

…also known as Proof by example.
A proof by construction describes a specific object precisely and shows that it
exists and that it satisfies the required conditions.
This can be used for some theorems that are existential statements, just asserting
the existence of a certain object that satisfies some specified properties.
We saw a proof of this type in Theorem 14.
Mistakes to avoid:

• attempting a proof by construction for a universal statement.

– If a theorem asserts that every object has some property, then it’s not enough
to just construct one object with the property.

• constructing an example that has the claimed property, thinking that a convincing
example is enough to prove that the property holds for other objects too.
– An example may be useful in illustrating a proof or explaining the key ideas
of a proof. But it is not, of itself, a proof.

3.8 PROOF BY CASES

…also known as Proof by exhaustion or (if lots of cases) “brute force”.


To do a proof by cases,

• identify a finite number of different cases which cover all possibilities,

• prove the theorem for each of these cases.


– These separate proofs of the cases may be thought of as “subproofs” of the
proof of the theorem.

We saw an example in Theorem 15. That was not typical of proofs by cases, since
the number of cases was so large (one case for each English dictionary word) and the
number of possibilities to be covered was finite. More typically, a theorem asserts that
every object from some infinite set has some property, and we divide the infinite set up
into a small finite number of cases, and do a separate proof for each of the cases. In
such situations, some of these cases must cover an infinite number of objects, and our
reasoning in each case must be general enough to apply to all the objects covered by
that case.
It’s ok if cases overlap (although it might indicate that the proof includes some
unnecessary duplication of effort and is inefficient). But they must be exhaustive in the
sense that every object considered by the theorem must belong to (at least) one of the
cases.

3.9 PROOF BY CONTRADiCTiON

…also known as “reductio ad absurdum”.


A proof by contradiction works as follows.

• Start by assuming the negation of the statement you want to prove.

• Reason from this until you deduce a contradiction.

• This contradiction shows that the initial assumption was wrong.

• Therefore, the original statement must be true.

Our first proof by contradiction is somewhat whimsical, but has the structure of
many proofs of this type and illustrates some important points about such proofs.

Theorem 16. Every natural number is interesting.6

Proof. Assume that not every natural number is interesting. So, there exists at least
one uninteresting number. Therefore there exists a smallest uninteresting number. But
that number must be interesting, by virtue of having this special property of being the
smallest of its type. This is a contradiction, as this number is uninteresting. Therefore
our original assumption was wrong. Therefore every natural number is interesting.

Comments:
That “theorem” and its “proof” are really just an informal argument, as the meaning of
“interesting” is imprecise and subjective.
But it illustrates the structure of proof by contradiction.
It also illustrates the point that, if you know an ordered set of objects is nonempty,
then you can choose an element of smallest size in the set.
Often, the smallest object in a set may have special properties that can help you go
further in the proof.

Can you always choose an object of largest size in a nonempty set?


Is every integer interesting?
Would the above proof still work, if applied to the set of all integers?

We now give a proof by contradiction of a fundamental theorem about prime numbers.

Theorem 17 (Euclid). There are infinitely many prime numbers.

Proof. Suppose, by way of contradiction, that there are only finitely many primes.
Let 𝑛 be the number of primes.
Let 𝑝1 , 𝑝2 , … , 𝑝𝑛 be all the primes.
Define: 𝑞 ∶= 𝑝1 ⋅ 𝑝2 ⋅ ⋯ ⋅ 𝑝𝑛 + 1.
This is bigger than every prime 𝑝𝑖 , so 𝑞 is not prime. Therefore 𝑞 must be composite.
Therefore 𝑞 is a multiple of some prime.
But, for each prime 𝑝𝑖 , if you divide 𝑞 by 𝑝𝑖 , you get a remainder of 1.
So 𝑞 cannot be a multiple of 𝑝𝑖 .
So 𝑞 cannot be a multiple of any prime. This is a contradiction.
So our initial assumption was wrong.
So there are infinitely many primes.
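The key construction in this proof is easy to experiment with. For any finite list of primes, 𝑞 leaves remainder 1 on division by each of them, so none of them can divide 𝑞. A sketch (note that 𝑞 itself need not be prime):

```python
from math import prod

primes = [2, 3, 5, 7, 11, 13]   # pretend these were ALL the primes
q = prod(primes) + 1            # q = 30031

# q leaves remainder 1 on division by each listed prime ...
assert all(q % p == 1 for p in primes)
# ... yet q is not prime either: 30031 = 59 * 509, and neither factor
# is in our list. Either way, the assumed list cannot be complete.
assert q == 59 * 509
```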

3.10 PROOF BY MATHEMATiCAL iNDUCTiON

Suppose you want to prove that a statement 𝑆(𝑛) holds for every natural number 𝑛.

6 See, e.g., Ch. 14 (Fallacies), in: Martin Gardner, The Scientific American Book of Mathematical Puzzles
and Diversions, Simon & Schuster, New York, 1959.

One powerful technique for proving theorems of this type is Mathematical
Induction. It is widely used across computer science and is particularly useful in
proving theorems about the behaviour of algorithms and the discrete structures they
work with, including those considered in this unit. You will also use it extensively in
later Computer Science units: FIT2004, FIT2014 and MTH3170/3175.
The Principle of Mathematical Induction is:

IF 𝑆(1) is true (inductive basis)

AND for all 𝑘 : if 𝑆(𝑘) is true (the inductive hypothesis)
           then 𝑆(𝑘 + 1) is true (inductive step)

THEN for all 𝑛 : 𝑆(𝑛) is true.

The intuition behind Induction is that:


• by the inductive basis, we know 𝑆(1) is true;

• since 𝑆(1) is true and the inductive step tells us that 𝑆(1) ⇒ 𝑆(2),
we deduce that 𝑆(2) is true;

• since 𝑆(2) is true and the inductive step tells us that 𝑆(2) ⇒ 𝑆(3),
we deduce that 𝑆(3) is true;

• ⋮

• and so on, forever.


Mathematical Induction is a single logical principle that allows us to apply modus ponens
repeatedly, forever, all the way along the number line, without “waving our hands” and
saying “and so on”.
Our first use of Mathematical Induction is to prove an extension of De Morgan’s
Law for Sets, Theorem 1, to arbitrary numbers of sets. It is illustrated for three sets in
Figure 3.4.
Theorem 18. For all 𝑛 ∈ ℕ,

$\overline{A_1 \cup A_2 \cup \cdots \cup A_n} = \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_n}$. (3.2)

Proof. Let 𝑆(𝑛) be the statement that (3.2) holds for 𝑛. We must prove that, for all
𝑛 ∈ ℕ, the statement 𝑆(𝑛) is true.
We prove it by induction on the number 𝑛 of sets.

Figure 3.4: $\overline{A_1 \cup A_2 \cup A_3} = \overline{A_1} \cap \overline{A_2} \cap \overline{A_3}$, shaded.

Inductive basis:
𝑆(1) is trivially true. In that case, each side of (3.2) is just $\overline{A_1}$, so the equation holds.

Inductive step:
Let 𝑘 ≥ 1.
Suppose 𝑆(𝑘) is true:

$\overline{A_1 \cup A_2 \cup \cdots \cup A_k} = \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_k}$.

This is our Inductive Hypothesis. We will use it later.


We have:

$\overline{A_1 \cup A_2 \cup \cdots \cup A_{k+1}}$
$= \overline{(A_1 \cup A_2 \cup \cdots \cup A_k) \cup A_{k+1}}$   (just grouping …) (3.3)
$= \overline{A_1 \cup A_2 \cup \cdots \cup A_k} \cap \overline{A_{k+1}}$   (by De Morgan’s Law for two sets, Theorem 1) (3.4)
$= \overline{A_1} \cap \overline{A_2} \cap \cdots \cap \overline{A_k} \cap \overline{A_{k+1}}$   (by Inductive Hypothesis)

Therefore 𝑆(𝑘 + 1) is true.

Conclusion:
So, by the Principle of Mathematical Induction, (3.2) holds for any number of sets.

Because induction helps prove theorems about statements that hold for all positive
integers, it is a natural tool for proving statements about infinite sequences. In the next
section (§ 3.11), we will use it to prove some statements about infinite sequences of
numbers. Because of this, it is a useful tool for proving statements about the behaviour
of loops in programs; we will see this in the next section too.
A key thought process in the inductive step is to construct, from the “(𝑘 + 1)-object”,
a smaller object to which you can apply the inductive hypothesis. In the previous proof,
our “(𝑘 + 1)-object” is

$\overline{A_1 \cup A_2 \cup \cdots \cup A_{k+1}}$.

Our aim is to show that this satisfies (3.2) (with 𝑛 = 𝑘 + 1). In order to do this, we try
to construct, from it, a “𝑘-object”, in this case

$\overline{A_1 \cup A_2 \cup \cdots \cup A_k}$.

We first group the first 𝑘 sets, in (3.3), as a step in this direction. Then applying De
Morgan’s Law for two sets, in (3.4), gives us what we are aiming for: $\overline{A_1 \cup A_2 \cup \cdots \cup A_k}$,
our “𝑘-object”, as it appears in (3.4). Once we have the “𝑘-object”, we can apply the
Inductive Hypothesis to it.
This technique, of reducing an object to a simpler object (or objects) of the same
type, is known as recursion. It is one of the most fundamental problem-solving strategies
in computer science; you will encounter it and use it again and again and again. It
is also provided for in most programming languages, when functions or methods can call
themselves. By using this skill in proofs by induction, you are practising a core skill of
your discipline. Moreover, proof by induction is the go-to proof technique for proving
claims about the behaviour of recursive functions in programs.
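The parallel between recursion and induction can be made literal. The function below (our own sketch) computes the complement of a union exactly the way the inductive proof of Theorem 18 proceeds: the base case is the inductive basis, and the recursive call plays the role of the inductive hypothesis.

```python
U = set(range(5))   # a small universe of discourse

def complement_of_union(sets):
    """Complement of sets[0] ∪ … ∪ sets[-1], mirroring Theorem 18's proof."""
    if len(sets) == 1:                    # inductive basis: n = 1
        return U - sets[0]
    # De Morgan for two sets, with the recursive call standing in
    # for the inductive hypothesis:
    return complement_of_union(sets[:-1]) & (U - sets[-1])

sets = [{0, 1}, {1, 2}, {4}]
assert complement_of_union(sets) == U - set().union(*sets)
```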

3.11 iNDUCTiON: MORE EXAMPLES

For another use of induction, consider the following equation, which gives a formula for
obtaining the sum of the first 𝑛 positive integers, so that you can compute this sum
without having to add up all those numbers.

1 + 2 + 3 + ⋯ + (𝑛 − 1) + 𝑛 = 𝑛(𝑛 + 1)/2. (3.5)
This is very useful in computer science and in fact throughout science, engineering and
in many other fields. We illustrate this with an application.
Suppose you are trying to construct the Monash timetabling network described in
§ 2.13. You have access to a function which tells you, for any pair of units, how many
students are doing both those units. You need to search all pairs of units, check how
many students are doing both units in a pair, and if that number is nonzero, add a link
between those units, to record the fact that their classes must be at different times.
A natural approach is to use two nested loops. The outer loop iterates over all units
in some order (e.g., lexicographic order). For each unit considered in the outer loop, the
inner loop iterates over all units that come later in the order. This saves duplication: if
the inner loop also iterated over all units, then each pair of units would be considered
twice, once for each of the two possible ordered pairs based on those units.
If there are 𝑁 units altogether, then these nested loops have the structure

for each 𝑖 = 1, 2, … , 𝑁 :
for each 𝑗 = 𝑖 + 1, 𝑖 + 2, … , 𝑁 :
if at least one student is doing both the 𝑖-th unit and the 𝑗-th unit,
then make a link between these two units.

There are 𝑁 iterations of the outer loop, but these have varying numbers of inner loop
iterations. If we want to work out how long this computation takes, we need to know
how many inner loop iterations are done altogether.

• For the first iteration of the outer loop, 𝑖 = 1, and the inner loop starts at 𝑖 + 1 =
1 + 1 = 2 and considers 𝑗 = 2, 3, … , 𝑁 , so it does 𝑁 − 1 iterations.

• For the second iteration of the outer loop, 𝑖 = 2, and the inner loop considers
𝑗 = 3, 4, … , 𝑁 , so it does 𝑁 − 2 iterations.

• For the third iteration of the outer loop, the inner loop does 𝑁 − 3 iterations.

• …and so on …

• For the (𝑁 − 1)-th iteration of the outer loop, we have 𝑖 = 𝑁 − 1, so there is only
one iteration of the inner loop, with 𝑗 = 𝑖 + 1 = (𝑁 − 1) + 1 = 𝑁 .

• The 𝑁 -th iteration of the outer loop actually has no inner loop iterations at all,
because the range 𝑖 + 1 ≤ 𝑗 ≤ 𝑁 is empty, because 𝑖 + 1 > 𝑁 .

So the total number of inner loop iterations is

(𝑁 − 1) + (𝑁 − 2) + ⋯ + 2 + 1.

This expression is of the same type as the left-hand side of (3.5). We’ve written it in a
different order, but that doesn’t matter at all. The only other difference is that we’re
adding up the first 𝑁 − 1 positive integers instead of the first 𝑛, but we can still use the
right-hand side of (3.5) to give us the answer, with an appropriate substitution.
Nested loop structures along these lines are very common in programming, and
equations like (3.5) enable us to give good estimates of how long they will take even
before we run the program.
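The counting argument above can be checked empirically: count the inner-loop iterations for various 𝑁 and compare with (3.5) applied with 𝑛 = 𝑁 − 1. A sketch:

```python
def count_inner_iterations(N):
    """Count the (i, j) pairs visited by the nested loops."""
    count = 0
    for i in range(1, N + 1):
        for j in range(i + 1, N + 1):
            count += 1
    return count

# Total = (N-1) + (N-2) + ... + 1 = (N-1)N/2, by (3.5) with n = N - 1.
for N in range(1, 30):
    assert count_inner_iterations(N) == (N - 1) * N // 2
```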
So, how do we prove (3.5)? There are many beautiful proofs of this fact, and we
will see at least two proofs in this unit. The first we give now, using induction.

Theorem 19. For all 𝑛:

1 + ⋯ + 𝑛 = 𝑛(𝑛 + 1)/2.

Proof. We prove it by induction on 𝑛.

Inductive basis:
When 𝑛 = 1, LHS = 1 and RHS = 1(1+1)/2 = 1.

Inductive step:
Let 𝑘 ≥ 1.
Suppose it’s true for 𝑛 = 𝑘:

1 + ⋯ + 𝑘 = 𝑘(𝑘 + 1)/2.

We will deduce that it’s true for 𝑛 = 𝑘 + 1.

1 + ⋯ + (𝑘 + 1) = (1 + ⋯ + 𝑘) + (𝑘 + 1) (preparing to use the Inductive Hypothesis)


= 𝑘(𝑘 + 1)/2 + (𝑘 + 1) (by the Inductive Hypothesis)
= (𝑘 + 1)𝑘/2 + (𝑘 + 1) (algebra …)
= (𝑘 + 1)(𝑘/2 + 1)
= (𝑘 + 1)(𝑘 + 2)/2
= (𝑘 + 1)((𝑘 + 1) + 1)/2

This is just the equation in the Theorem, for 𝑛 = 𝑘 + 1 instead of 𝑘.


So the inductive step is now complete.

Conclusion:
Therefore, by the Principle of Mathematical Induction, the equation holds for all 𝑛.

Notice, in the inductive step, how we construct, from the “(𝑘 +1)-object” 1+2+⋯+
(𝑘 + 1), a “𝑘-object” 1 + 2 + ⋯ + 𝑘. In this case, it’s just a matter of grouping, since the
𝑘-object just sits within the (𝑘 + 1)-object. As soon as we construct the 𝑘-object, we
can apply the Inductive Hypothesis to it.
Often, as in our proof of Theorem 18, we need to do some more work to construct
the 𝑘-object from the larger (𝑘 + 1)-object.

There is a slightly different way of writing inductive proofs that you are likely to
come across. We could make the inductive step go from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘 , instead of
from 𝑛 = 𝑘 to 𝑛 = 𝑘 + 1. Let us re-do the previous proof in this way.

Slightly different proof: We prove Theorem 19 by induction on 𝑛.

Inductive basis:
When 𝑛 = 1, LHS = 1 and RHS = 1(1+1)/2 = 1.

Inductive step:
Let 𝑘 ≥ 2. [Note the change here!]
Suppose it’s true for 𝑛 = 𝑘 − 1, where 𝑘 ≥ 2:

1 + ⋯ + (𝑘 − 1) = (𝑘 − 1)𝑘/2.

We will deduce that it’s true for 𝑛 = 𝑘.

1 + ⋯ + 𝑘 = (1 + ⋯ + (𝑘 − 1)) + 𝑘 (preparing to use the inductive hypothesis)


= (𝑘 − 1)𝑘/2 + 𝑘 (by the Inductive Hypothesis)
= 𝑘(𝑘 − 1)/2 + 𝑘 (algebra …)
= 𝑘((𝑘 − 1)/2 + 1)
= 𝑘(𝑘 + 1)/2

This is just the equation in the Theorem, for 𝑛 = 𝑘 instead of 𝑘 − 1.


So the inductive step is now complete.

Conclusion:
Therefore, by the Principle of Mathematical Induction, the equation holds for all 𝑛.

We will mostly frame our inductive steps as going from 𝑛 = 𝑘 (where we assume
the statement is true) to 𝑛 = 𝑘 + 1 (where we deduce it’s true, using the Inductive Hy-
pothesis), as we did in our first proof of Theorem 19. But there is nothing wrong with
doing it from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘, as in the second proof above. If you read other books
and resources, you will find that some authors do it one way, while others do it the
other way. If you do it the second way (from 𝑛 = 𝑘 − 1 to 𝑛 = 𝑘), then you need to take
care that the inductive step is grounded in the inductive basis. In the first proof, the
inductive step starts with “Let 𝑘 ≥ 1”; in the second proof, the inductive step starts with
“Let 𝑘 ≥ 2” which ensures that 𝑘 − 1 ≥ 1 so that the inductive step is, indeed, grounded
in the inductive basis.

Even though we have now proved Theorem 19, you are entitled to wonder where the
expression, 𝑛(𝑛 +1)/2, came from in the first place. There are many ways to derive this
expression from scratch. We will discuss this in more detail later, in § 6.12.
When you first try to work out what the expression should be, you might explore
the first few cases and try to discern a pattern. This may lead you to conjecture that
the expression is 𝑛(𝑛 + 1)/2.

A pattern that works for small cases is not, in itself, a proof. But it might still help
you come up with a proof, since now you have something to aim for: a conjecture that
you can try to prove. And, in this case, mathematical induction can be used to prove
the conjecture.
We will discuss this explore-conjecture-prove methodology, for discovering and prov-
ing formulae for mathematical expressions, in § 6.6.
Mathematical induction is a proof technique, and as such can only be used once you
have a statement — in this case, an equation — to prove. It won’t help you discover
what the right equation should be; that’s where exploration and conjecture come in.
The upside is that you get a rigorous proof for a conjecture which you might otherwise
have remained unsure about or been unable to justify fully.
Having used induction to prove that the sum of the first 𝑛 positive integers is indeed
𝑛(𝑛 + 1)/2, it is natural to ask, what about sums of higher powers of positive integers?
To start with, what about the sum of the squares of the first 𝑛 positive integers, i.e.,
1² + 2² + ⋯ + 𝑛²? This gives the number of iterations in the following triply-nested loops:

for each 𝑖 = 1, 2, … , 𝑛:
for each 𝑗 = 𝑖, 𝑖 + 1, … , 𝑛:
for each 𝑘 = 𝑖, 𝑖 + 1, … , 𝑛:
[some action]

If you’d like a challenge, explore 1² + 2² + ⋯ + 𝑛² by computing it for several small values
of 𝑛 and trying to understand its behaviour. How does it grow as 𝑛 increases? Then
try to conjecture a possible formula for it, and then try to prove it by induction.
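One way to begin this exploration is to tabulate the sums by brute force; here is a small Python sketch (the pattern, and its proof, are left to you):

```python
def sum_of_squares(n):
    # Compute 1^2 + 2^2 + ... + n^2 directly.
    return sum(k * k for k in range(1, n + 1))

# Tabulate the first few values and look for a pattern.
for n in range(1, 9):
    print(n, sum_of_squares(n))
```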

3.12𝜔 iNDUCTiON: EXTENDED EXAMPLE

We now give another detailed example of proof by induction. This is quite involved
but it does give extended practice at reasoning about functions and relations as well as
induction.
Consider the following problem. Suppose there are 𝑚 job vacancies to be filled from
a pool of 𝑛 applicants. For each vacant job, the employer has constructed a shortlist of
applicants. Is it possible for each position to be filled by a different applicant, so that
no position remains unfilled?
If 𝑚 > 𝑛, then this is impossible: there are simply too many positions for the
available applicants, and some number of positions must remain unfilled. If 𝑚 ≤ 𝑛, then
it may or may not be possible, depending on the shortlists.7
We model this problem as follows. Let 𝐴 be the set of positions, with |𝐴| = 𝑚, and
let 𝐵 be the set of applicants, with |𝐵| = 𝑛. Let 𝑆 ⊆ 𝐴 × 𝐵 be the shortlist relation, a
binary relation consisting of all pairs (𝑎, 𝑏) such that the shortlist for position 𝑎 includes
7 If 𝑚 < 𝑛, then some applicants will not get a position, but that is permitted in this problem. One could
also imagine a scenario where it is the applicants who have shortlists and the employers who have no choice.

applicant 𝑏. The full shortlist for position 𝑎 is the set 𝑆(𝑎), which is a subset of 𝐵; here,
we use the notation introduced on p. 67 in § 2.13. If we have a set of positions 𝑋 ⊆ 𝐴,
then we can, if we wish, form a combined shortlist for the set of positions by taking the
union of all the individual shortlists:

𝑆(𝑋) = {𝑏 ∈ 𝐵 ∶ 𝑎𝑆𝑏 for some 𝑎 ∈ 𝑋} = ⋃_{𝑎∈𝑋} 𝑆(𝑎).

We want each position to be filled by exactly one applicant, so the set of pairs (𝑎, 𝑏)
that specify allocation of applicants to positions must be a function. Furthermore, we
want each position to be filled by an applicant who does not also fill any other position;
no applicant fills two positions. So we want, for each 𝑎 ∈ 𝐴, a unique 𝑏 ∈ 𝐵 such that
(𝑎, 𝑏) ∈ 𝑆. These pairs (𝑎, 𝑏) that specify a valid allocation of applicants to positions,
filling each position with a unique applicant, must therefore be an injection. So we are
asking: does the shortlist relation 𝑆 contain an injection?
Some situations can be dealt with easily.
If 𝑚 > 𝑛, then 𝐵 is too small to enable an injection to it from 𝐴, so 𝑆 contains no
injection. More generally, suppose there is a set 𝑋 ⊆ 𝐴 of jobs for which the union of
their shortlists 𝑆(𝑋 ) is smaller than 𝑋 , i.e., |𝑋 | > |𝑆(𝑋 )|. Then 𝑆 cannot contain an
injection, since an injection would ensure that the shortlists for 𝑋 together include at
least |𝑋 | applicants. We have now proved:
Theorem 20. If a shortlist relation 𝑆 from 𝐴 to 𝐵 contains an injection with domain 𝐴
then, for all 𝑋 ⊆ 𝐴, we have |𝑋 | ≤ |𝑆(𝑋 )|.
(The condition here means that, for every set 𝑋 of jobs, the union of their shortlists
contains at least as many applicants as there are jobs in 𝑋 .) □
More surprisingly, the converse is true as well. We will prove this by induction.
Theorem 21. Let 𝑆 ⊆ 𝐴 × 𝐵 be a binary relation. If, for all 𝑋 ⊆ 𝐴 we have |𝑋| ≤ |𝑆(𝑋)|,
then 𝑆 contains an injection with domain 𝐴.
Proof. We prove this by induction on |𝐴|.

Inductive basis:
When |𝐴| = 1, the set 𝐴 contains a single element 𝑎. Using 𝑋 = {𝑎}, if the shortlist
for this one job has at least one applicant, then there exists 𝑏 ∈ 𝐵 such that (𝑎, 𝑏) ∈ 𝑆.
So 𝑆 contains the simple function whose sole ordered pair is (𝑎, 𝑏). This function is an
injection with domain 𝐴. So the claim holds for |𝐴| = 1.

Inductive step:
Let 𝑚 ≥ 1.
Assume that the following holds for every binary relation 𝑇 ⊆ 𝐶×𝐷 in which |𝐶| ≤ 𝑚:
If, for each 𝑋 ⊆ 𝐶 we have |𝑋 | ≤ |𝑇(𝑋 )|, then 𝑇 contains an injection with
domain 𝐶.

This is our Inductive Hypothesis. Note that this is an implication. Like any implication,
it has a condition (namely, “for each 𝑋 ⊆ 𝐶 we have |𝑋| ≤ |𝑇(𝑋)|”) and a consequence
(namely, “𝑇 contains an injection with domain 𝐶”). In assuming this Inductive Hypoth-
esis, we are not assuming that the condition is true, and we are not assuming that the
consequence is true. We are merely assuming that, if the condition is true, then the
consequence is true.
Let 𝑆 ⊆ 𝐴 × 𝐵 where |𝐴| = 𝑚 + 1. Suppose that the following condition holds:

for each 𝑋 ⊆ 𝐴 we have |𝑋 | ≤ |𝑆(𝑋 )| (3.6)

We need to prove that 𝑆 contains an injection with domain 𝐴.


Case 1: for each nonempty 𝑋 ⊂ 𝐴 we have |𝑋 | ≤ |𝑆(𝑋 )| − 1. (Note here that 𝑋 is a
proper subset of 𝐴.)
Let (𝑎, 𝑏) be any pair in 𝑆. Then consider the restriction of 𝑆 to (𝐴 ∖{𝑎})×(𝐵 ∖{𝑏}),
i.e., the subrelation consisting of all pairs of 𝑆 that meet neither 𝑎 nor 𝑏. Call
this smaller relation 𝑇, and put 𝐶 ∶= 𝐴 ∖ {𝑎} and 𝐷 ∶= 𝐵 ∖ {𝑏}. Let 𝑋 ⊆ 𝐶 be nonempty
(the empty set satisfies |𝑇(𝑋)| ≥ |𝑋| trivially); then 𝑋 is a nonempty proper subset of 𝐴, so
|𝑆(𝑋)| ≥ |𝑋| + 1, by the assumption on which this case is based. But, in 𝑇, the applicant
𝑏 is excluded. Nevertheless, we can still say that, in 𝑇, the union 𝑇(𝑋 ) of 𝑋 ’s shortlists
has at least |𝑋| applicants (obtained from the at least |𝑋| + 1 applicants in 𝑆(𝑋) by excluding 𝑏 if necessary):
|𝑇(𝑋 )| ≥ |𝑋 |. Now, 𝑇 has one fewer position than 𝑆, since |𝐶| = |𝐴|−1. So the Inductive
Hypothesis applies to it. Therefore 𝑇 contains an injection with domain 𝐶. Call that
injection 𝑔. This injection, together with the pair (𝑎, 𝑏), gives a new binary relation
𝑓 ∶= 𝑔 ∪ {(𝑎, 𝑏)} which is a subset of 𝑆:

𝑔 ∪ {(𝑎, 𝑏)} ⊆ 𝑆,

since 𝑔 ⊆ 𝑇 and 𝑇 ⊆ 𝑆 and (𝑎, 𝑏) ∈ 𝑆. Furthermore, this new relation 𝑓 is actually an


injection with domain 𝐴, since

• 𝑔 is an injection, and

• 𝑎 appears in no other pair of 𝑔 (so there is no “one-to-many” violation of the


definition of a function), and

• 𝑏 appears in no other pair of 𝑔 (so there is no “many-to-one” violation of the


injection property), and

• the domain of 𝑓 is 𝐴 (since it is just the domain of 𝑔, namely 𝐴 ∖ {𝑎}, augmented


by 𝑎 which is mapped to 𝑏 by 𝑓).

So 𝑆 does indeed contain an injection with domain 𝐴 in this case.


Case 2: for some nonempty 𝑋 ⊂ 𝐴 we have |𝑋 | = |𝑆(𝑋 )|.
Let 𝑋 ⊂ 𝐴 be such a set of positions.
Consider the restriction of 𝑆 to pairs (𝑎, 𝑏) such that 𝑎 ∈ 𝑋 . Call this binary relation
𝑇. It is a subset of 𝑋 × 𝑆(𝑋 ). Since 𝑋 is a proper subset of 𝐴, we have |𝑋 | < |𝐴|, so

|𝑋 | ≤ 𝑚. So we can apply the Inductive Hypothesis, with 𝐶 = 𝑋 and |𝐶| ≤ 𝑚. Now, the
Inductive Hypothesis is an implication, with condition and consequence as discussed
above. We first establish that the condition holds in our current situation. Recall
that every 𝑌 ⊆ 𝐴 satisfies |𝑌| ≤ |𝑆(𝑌)|, by (3.6). So, certainly every 𝑌 ⊆ 𝐶 satisfies
|𝑌| ≤ |𝑆(𝑌)| (since 𝐶 ⊆ 𝐴). Also, 𝑇 is just the restriction of 𝑆 to 𝐶 × 𝐵, so 𝑇(𝑌) = 𝑆(𝑌)
(because 𝑌 ⊆ 𝐶). Therefore every 𝑌 ⊆ 𝐶 satisfies |𝑌| ≤ |𝑇(𝑌)|. Therefore 𝑇 satisfies
the condition of the Inductive Hypothesis. Therefore, by the Inductive Hypothesis, its
consequence also holds, namely that 𝑇 contains an injection with domain 𝑋 , which we
call 𝑔. It has domain 𝑋 = 𝐶, and one suitable codomain is 𝑇(𝐶) which is the same as
𝑆(𝐶). So we can write 𝑔 ∶ 𝐶 → 𝑆(𝐶).
This injection 𝑔 is a step towards constructing an injection in 𝑆. Its domain is 𝑋 , so
it fills the jobs in that set. But 𝑋 is a proper subset of 𝐴, so there are other jobs that
𝑔 does not fill.
So we need to find a way to fill jobs in 𝐴 ∖ 𝑋 . Again, we work towards applying the
Inductive Hypothesis. Observe that 𝐴 ∖ 𝑋 is a nonempty proper subset of 𝐴, since 𝑋
is too.
Let 𝑍 ⊆ 𝐴 ∖ 𝑋 . We have

|𝑋 | + |𝑍| = |𝑍 ∪ 𝑋 | (since the union is disjoint)


≤ |𝑆(𝑍 ∪ 𝑋 )| (by (3.6))
= |𝑆(𝑋 ) ∪ (𝑆(𝑍) ∖ 𝑆(𝑋 ))| (since 𝑆(𝑍 ∪ 𝑋 ) = 𝑆(𝑋 ) ∪ (𝑆(𝑍) ∖ 𝑆(𝑋 )))
= |𝑆(𝑋 )| + |𝑆(𝑍) ∖ 𝑆(𝑋 )| (since the union is disjoint)
= |𝑋 | + |𝑆(𝑍) ∖ 𝑆(𝑋 )| (since |𝑆(𝑋 )| = |𝑋 |, by the assumption on which this case is based).

Subtracting |𝑋| from the first and last expressions, we have

|𝑍| ≤ |𝑆(𝑍) ∖ 𝑆(𝑋 )|. (3.7)

Let 𝑈 be the binary relation obtained from 𝑆 by restricting to those pairs (𝑎, 𝑏) such
that 𝑎 ∉ 𝑋 and 𝑏 ∉ 𝑆(𝑋). In other words, 𝑈 is the restriction of 𝑆 to (𝐴∖𝑋)×(𝐵∖𝑆(𝑋)).
If 𝑍 ⊆ 𝐴 ∖ 𝑋, then 𝑈(𝑍) consists of those applicants 𝑏 such that 𝑏 ∈ 𝑆(𝑍) and
𝑏 ∉ 𝑆(𝑋). So
𝑈(𝑍) = 𝑆(𝑍) ∖ 𝑆(𝑋 ).
This, together with (3.7), gives
|𝑍| ≤ |𝑈(𝑍)|.
We are now in a position to apply the Inductive Hypothesis. It can be used on 𝑈,
because |𝐴 ∖ 𝑋 | ≤ 𝑚 (which follows from the fact that 𝐴 ∖ 𝑋 is a proper subset of 𝐴).
Its condition also holds, because, as we have just seen, |𝑍| ≤ |𝑈(𝑍)| for all 𝑍 ⊆ 𝐴 ∖ 𝑋 .
Therefore the consequence follows, namely that 𝑈 contains an injection with domain
𝐴 ∖ 𝑋. Call this injection ℎ. Its domain is 𝐴 ∖ 𝑋 and its codomain can be taken to be
𝐵 ∖ 𝑆(𝑋). So the domains of 𝑔 and ℎ are disjoint, and their codomains are disjoint too.

Since 𝑔 and ℎ are functions on disjoint domains and the union of their domains is 𝐴,
the union 𝑔 ∪ ℎ is also a function and its domain is 𝐴. Also, since 𝑔 and ℎ are both
injections and their codomains are disjoint, their union 𝑔 ∪ ℎ is also an injection.
We have constructed an injection 𝑔 ∪ ℎ with domain 𝐴. Since 𝑔 ⊆ 𝑇 and 𝑇 ⊆ 𝑆, and
since also ℎ ⊆ 𝑈 and 𝑈 ⊆ 𝑆, we have 𝑔 ∪ ℎ ⊆ 𝑆. So 𝑆 contains an injection with domain
𝐴.
This completes Case 2. Since Cases 1 and 2 cover all possibilities, the Inductive Step
is now complete.

Conclusion:
Therefore, by the Principle of Mathematical Induction, the theorem holds.

Putting Theorem 20 and Theorem 21 together we have


Corollary 22. Let 𝑆 ⊆ 𝐴 × 𝐵 be a binary relation. The relation 𝑆 contains an injection
with domain 𝐴 if and only if for all 𝑋 ⊆ 𝐴 we have |𝑋| ≤ |𝑆(𝑋)|. □
This gives us an important and useful characterisation of situations when there is an
injection. It does not give us an algorithm, though it does contain ideas that are useful
in the development of algorithms for this and related problems. What it does give us is
succinct, easily-verified evidence for both positive and negative cases. We explain this
now, for each case in turn.
• Positive case: if 𝑆 does contain an injection, then the evidence of this is the
injection itself. Once we have an injection 𝑓, we can easily check that it is an
injection. First, check that every job 𝑎 ∈ 𝐴 belongs to some pair (𝑎, 𝑏) ∈ 𝑓, so that
the domain of 𝑓 is indeed 𝐴. Then check, for each 𝑎 ∈ 𝐴, that it belongs to no
more than one pair in 𝑓, so that 𝑓 is indeed a function. Then check, for each 𝑏 ∈ 𝐵,
that it belongs to no more than one pair in 𝑓, so that 𝑓 is indeed an injection.
• Negative case: if 𝑆 does not contain an injection, then the evidence of this is some
set 𝑋 ⊆ 𝐴 such that |𝑋 | > |𝑆(𝑋 )|. Once we have such a subset 𝑋 , we can easily
check that |𝑋 | and |𝑆(𝑋 )| satisfy this inequality, as follows. For each 𝑎 ∈ 𝑋 in
turn, we find all pairs (𝑎, 𝑏) ∈ 𝑆, and for each such pair, we add 𝑏 to our set 𝑆(𝑋 )
if it is not already there. Once we have finished compiling 𝑆(𝑋 ), we check that its
size is < |𝑋 |.
The details of these checking methods would depend on the specific data structure
used to store the binary relation 𝑆. It would be a good programming exercise to write
programs to do each of these two checking methods and to study how long they take to
run, as a function of the sizes of 𝐴 and 𝐵 and the number of ordered pairs in 𝑆.
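Both checks are straightforward to sketch in Python, storing a relation simply as a set of ordered pairs (the function names here are our own):

```python
def is_injection_with_domain(f, A):
    # Positive case: check that the set of pairs f is an injection with domain A.
    firsts = [a for (a, b) in f]
    seconds = [b for (a, b) in f]
    return (set(firsts) == set(A)               # domain of f is exactly A
            and len(firsts) == len(set(firsts))     # a function: no a appears twice
            and len(seconds) == len(set(seconds)))  # injective: no b appears twice

def witnesses_no_injection(S, X):
    # Negative case: check that |X| > |S(X)| for a candidate witness set X.
    SX = {b for (a, b) in S if a in X}
    return len(X) > len(SX)
```

For example, for 𝑆 = {(1, 'x'), (2, 'x')}, the set 𝑋 = {1, 2} is a valid witness that 𝑆 contains no injection with domain {1, 2}.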

3.13 MATHEMATiCAL iNDUCTiON AND STATiSTiCAL iNDUCTiON

The use of the term “induction” here is different to the use of “induction” in statistics,
which is the process of drawing general conclusions from data. Statistical induction is

typically used in situations where there is some randomness in the data, and conclusions
drawn can include some amount of error provided the conclusions drawn are significant
enough that the error is unimportant. It is not a process of logical deduction; this,
together with the presence of errors, means that it cannot be used as a step in a math-
ematical proof. By contrast, Mathematical Induction is a rigorous and very powerful
tool for logical reasoning in mathematical proofs.

3.14 PROGRAMS AND PROOFS

At the start of this chapter, we discussed the role of proofs in computer science, in
particular their importance in providing rigorous, rational support for general claims
about programs and the structures they work with.
The different types of proofs we have introduced can each be used to prove statements
about particular programming language constructs. If we want to prove something
about an if-statement, then we would normally use proof by cases. If we want to prove
something about a function that calls itself recursively, then we would normally use
proof by induction. Proving statements about loops in programs is also often done by
induction.
But there is a deeper reason for programmers to develop skills in writing proofs.
Programming is often thought of as a completely different activity to writing math-
ematical proofs. In fact, the two activities are surprisingly similar, and developing the
skill of writing mathematical proofs will make you a better programmer.
Let’s compare programs and proofs.
Each consists of a sequence of precise statements. Each of these statements obeys
the rules of some language: for programs, this is the programming language; for proofs,
this is the language of mathematics augmented by precise use of a natural language such
as English.
Each uses variables, which are names that can refer to any object of some type.
Variables can be combined into expressions using the operations that are available for
objects of that type. One difference between programs and proofs is that, in programs,
variables have memory set aside for them, but this does not happen in proofs.
Each statement must depend on previous statements in a precise way. In a program,
when an operation is applied to a variable, it uses the most recently computed value
of that variable; it does not “look ahead” and use some other value of the variable that
will be computed later. In a proof, each statement is a logical consequence of previous
statements (not of later statements), or it might be just an already-known fact or axiom.
In a program, we can use an if-statement to decide, according to some logical con-
dition, which set of statements to execute next. In a proof, we can use a finite number
of cases, which together cover all possible situations (§ 3.8). Each case pertains to
situations that satisfy some logical condition.
Programs can call other programs, which may have been written by you or by other
people. Proofs can use other theorems, which may have been written by you or by other

people. So building a proof, using theorems that have been proved previously, is like
writing a function in your program that calls other functions.
In a program, a block of statements can be executed repeatedly, for as long as some
logical condition is satisfied. Another way to do repetition is to use recursion, in which
a function or method calls itself. The analogue in proofs is mathematical induction.
A program crashes if it encounters a situation that its statements cannot deal with.
We say it has a bug, and this should be fixed. Ideally, the program never crashes. If a
proof cannot deal with some situation it is supposed to cover, then we might also call
this a bug, or a “hole” or a “gap”. This is more serious for proofs than for programs.
A program with a bug might still be of some use, provided it crashes rarely and can
correctly deal with the inputs that don’t cause it to crash (even though it should be
fixed). But a “proof” with a hole is actually not a proof at all. It might be able to be
repaired, to turn it into a proof, or it might not; in the latter case, the “theorem” being
“proved” might actually be false, in which case it has no proof, and is therefore not a
theorem after all.
Programming and proof-writing have differences as well as similarities, but the anal-
ogy is close. Writing mathematical proofs is like programming in a different “paradigm”.
A programming paradigm is, loosely speaking, an approach to programming and a
way of thinking about it. Important programming paradigms include procedural pro-
gramming (historically the first), object-oriented programming, logic programming and
functional programming. Learning to program in more than one paradigm (as well as
in more than one language) improves your thinking about programming and makes you
a better programmer in any language. Learning to write proofs — in the “programming
paradigm” of mathematical reasoning — yields similar benefits to a programmer.

3.15 EXERCiSES

1. Recall the Collatz function from p. 55 in § 2.9.

Prove by cases that, for all positive integers 𝑛,

Collatz⁽²⁾(𝑛) ≤ (3/2)𝑛 + 1.
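In the spirit of the explore-conjecture-prove methodology of this chapter, you can first check the bound numerically (a sketch, assuming the usual definition of the Collatz function: halve if even, otherwise 3𝑛 + 1; this exploration is not a substitute for the requested proof by cases):

```python
def collatz(n):
    # Assumed standard Collatz step: halve if even, otherwise 3n + 1.
    return n // 2 if n % 2 == 0 else 3 * n + 1

# Check Collatz applied twice against the bound (3/2)n + 1 for small n.
for n in range(1, 10000):
    assert collatz(collatz(n)) <= 1.5 * n + 1
```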

2.
A set of strings 𝐿 is called hereditary if it has the following property:
For every nonempty string 𝑥 in 𝐿, there is a character in 𝑥 which can be
deleted from 𝑥 to give another string in 𝐿.
Prove by contradiction that every nonempty hereditary set of strings contains the
empty string.

3. Review the statements of all the theorems (including corollaries) in the previous
chapters.

(a) Which ones are implications?

(b) Which ones are equivalences?

(c) Which ones are existential statements?

(d) Which ones are universal statements?

4. Review the proofs of all the theorems in the previous chapters.

(a) Which ones use proof by construction?

(b) Which ones prove an implication by proving its contrapositive?

(c) Which ones use proof by cases?

5. Let 𝐸 be a finite set and let ℐ be a collection of subsets of 𝐸 that satisfies the
following three properties:

(I1) ∅ ∈ ℐ.

(I2) Whenever 𝑋 ⊆ 𝑌 and 𝑌 ∈ ℐ, we also have 𝑋 ∈ ℐ. This is called the hereditary
property.

(I3) Whenever 𝑋, 𝑌 ∈ ℐ and |𝑋| < |𝑌|, there exists 𝑦 ∈ 𝑌 ∖ 𝑋 such that 𝑋 ∪ {𝑦} ∈ ℐ. In
other words, given two different-sized sets in ℐ, you can always enlarge the smaller
set, using some new member from the larger set, to get another set in ℐ. This is
called the independence augmentation property.

The sets in ℐ are called independent.
For example, if 𝐸 = {𝑎, 𝑏, 𝑐}, then each of the following collections satisfies all three
properties (I1)–(I3):

• ℐ = {∅, {𝑎}, {𝑏}, {𝑎, 𝑏}}

• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}}

• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}, {𝑎, 𝑐}, {𝑏, 𝑐}}

• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}, {𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}}

• ℐ = {∅}

By way of counterexample, using the same 𝐸, each of the following collections fails to
satisfy all three properties (I1)–(I3):

• ℐ = {{𝑎}, {𝑏}, {𝑎, 𝑏}}. This fails (I1) and (I2).

• ℐ = {∅, {𝑎}, {𝑎, 𝑏}}. This fails (I2).

• ℐ = {∅, {𝑎}, {𝑏}, {𝑐}, {𝑎, 𝑏}}. This satisfies (I1) and (I2) but fails (I3).

• ℐ = ∅. This fails (I1).


(a) For each counterexample above that fails (I2) and/or (I3), give specific sets 𝑋 and
𝑌 for which the property fails.

(b) Prove that, for a social network (see § 1.7, § 2.14, § 2.13), the set of cliques satisfies
(I1) and (I2) but not, in general, (I3).

(c) Prove by contradiction that all maximal independent sets have the same size.

Suppose that each member 𝑒 ∈ 𝐸 has a nonnegative real weight 𝑤(𝑒). So 𝑤 ∶ 𝐸 → ℝ⁺₀.


We use this weight function to also give weights to subsets of 𝐸, by defining the weight
of a set to be the sum of the weights of its elements: for all 𝑋 ⊆ 𝐸, define

𝑤(𝑋) ∶= ∑_{𝑒∈𝑋} 𝑤(𝑒).

Suppose we seek an independent set of maximum weight.


Consider the following greedy algorithm, so called because it always makes the
choice that gives the biggest immediate gain:

Input: a finite set 𝐸, a function 𝑤 ∶ 𝐸 → ℝ⁺₀, a collection ℐ of subsets of 𝐸.
Initialisation:
    𝑋 ∶= ∅
    𝑖 ∶= 1.
Repeat:
    If there exists 𝑦 ∈ 𝐸 ∖ 𝑋 such that 𝑋 ∪ {𝑦} ∈ ℐ:
        From all these 𝑦, choose the one with largest weight 𝑤(𝑦). (This is where greed comes in.) Call it 𝑒𝑖.
        Add 𝑒𝑖 to 𝑋 and increment 𝑖.
    else [we cannot enlarge 𝑋 further]
        Stop and output 𝑋.

Greedy algorithms are important because they are simple, easy to program, and efficient.
In general, they do not give best-possible solutions. But it is important to recognise sit-
uations when they do give best-possible solutions, since then we attain a rare state of
algorithmic bliss: simplicity, speed, optimality.
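A direct Python rendering of the algorithm might look as follows (a sketch only: ℐ is represented as a collection of frozensets and 𝑤 as a function, which are our own representation choices):

```python
def greedy(E, w, I):
    # Repeatedly add the heaviest element y that keeps X inside I;
    # stop when no element can be added.
    X = frozenset()
    while True:
        candidates = [y for y in E - X if X | {y} in I]
        if not candidates:
            return X
        X = X | {max(candidates, key=w)}

# Example: with I = {∅, {a}, {b}, {a,b}} and weights a:3, b:1, c:5,
# the algorithm picks a, then b, and stops.
I = {frozenset(s) for s in [(), ('a',), ('b',), ('a', 'b')]}
weights = {'a': 3, 'b': 1, 'c': 5}
print(greedy({'a', 'b', 'c'}, weights.get, I))
```

You may find such a sketch handy for experimenting with part (d) below.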

(d) Construct a simple social network, and give a nonnegative real weight to each
person, such that the greedy algorithm, with ℐ being the set of all cliques, does not find

the maximum weight clique.

Now suppose ℐ satisfies (I1)–(I3). The greedy algorithm chooses elements of 𝐸 in a


specific order, which we represent as

𝑒1 , 𝑒2 , … , 𝑒𝑟 .

(e) Prove by contradiction that the weights of the chosen elements are decreasing, i.e.,

𝑤(𝑒1 ) ≥ 𝑤(𝑒2 ) ≥ ⋯ ≥ 𝑤(𝑒𝑟 ).

By “decreasing” we allow the possibility that some weights stay the same as we go along
the list; we don’t require that they be strictly decreasing (which is a stronger property
and would entail replacing each ≥ by >).

(f) Prove that the greedy algorithm finds a maximum weight independent set.

So, whenever (I1)–(I3) hold, we can confidently use the greedy algorithm, knowing
that it will give us the best possible solution.
Less obviously, it turns out that properties (I1)–(I3) give a precise characterisation
of situations where greedy algorithms give optimum solutions! To be specific, if ℐ is
a collection of sets satisfying (I1) and (I2), and the greedy algorithm always gives the
maximum weight member of ℐ (regardless of the weights of the elements of 𝐸), then ℐ
must satisfy (I3) too. It is an interesting but challenging exercise to try to prove this.
So, it is useful to be able to recognise structures where properties (I1)–(I3) hold,
because they are precisely the situations where you can’t do better than a greedy al-
gorithm. These structures are known as matroids and their role in optimisation goes
beyond their connection with greedy algorithms. They also have many applications in
the study of networks, vectors, matrices, and geometry.

6.
Prove the following statement, by mathematical induction:

For all 𝑘,

(∗) the sum of the first 𝑘 odd numbers equals 𝑘².

(a) First, give a simple expression for the 𝑘-th odd number.

(b) Inductive basis: now prove the statement (∗) for 𝑘 = 1.

Inductive Step:
Let 𝑘 ≥ 1.

Assume the statement (∗) true for 𝑘. This is our Inductive Hypothesis.

(c) Express the sum of the first 𝑘 + 1 odd numbers …

1 + 3 + ⋯ + ((𝑘 + 1)-th odd number)

…in terms of the sum of the first 𝑘 odd numbers, plus something else.

(d) Use the inductive hypothesis to replace the sum of the first 𝑘 odd numbers by some-
thing else.

(e) Now simplify your expression. What do you notice?

(f) When drawing your final conclusion, don’t forget to briefly state that you are using
the Principle of Mathematical Induction!

7. Prove by induction on 𝑛 that, for all 𝑛 ∈ ℕ0 ,

1 + 2 + 2² + 2³ + ⋯ + 2ⁿ = 2ⁿ⁺¹ − 1.

8. Prove by induction that, for all 𝑛:

1² + 2² + ⋯ + 𝑛² = 𝑛(𝑛 + 1)(2𝑛 + 1)/6.

9. Prove by induction that, for all 𝑛:

1³ + 2³ + ⋯ + 𝑛³ = (1 + 2 + ⋯ + 𝑛)².

10.
Consider the following algorithm:

for each 𝑖 = 1, 2, … , 𝑛:
for each 𝑗 = 1, 2, … , 𝑖:
for each 𝑘 = 1, 2, … , 𝑗:
[some action]

(a) For each pair 𝑖, 𝑗, how many times is the action inside the innermost loop performed?
Express this in terms of 𝑖 or 𝑗 or both of them.

(b) For each 𝑖, how many times is the action inside the innermost loop performed?
Write this as both a sum of some number of terms, and (using a theorem in this
chapter) a simple algebraic expression. Both these expressions should be in terms
of 𝑖.

(c) Now write a sum of some number of terms, expressed in terms of 𝑛, for the total
number of times the action is performed.

(d) Now work out this expression for 𝑛 = 1, 2, 3, 4, 5, and for as many further values of
𝑛 as you like.

(e) Explore how these numbers behave.

(f) Conjecture a simple algebraic expression, in terms of 𝑛, for the total number of
times the action is performed.

(g) Prove, by induction on 𝑛, that your expression is correct.
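To generate the data for parts (d) and (e), the innermost action can simply be counted by running the loops (a sketch; the closed form asked for in part (f) is left for you to conjecture):

```python
def triple_loop_count(n):
    # Count how many times the innermost action runs in the algorithm above.
    count = 0
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            for k in range(1, j + 1):
                count += 1
    return count

# Explore the first several values.
for n in range(1, 6):
    print(n, triple_loop_count(n))
```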

11.
The 𝑛-th harmonic number 𝐻𝑛 is defined by
𝐻𝑛 = 1 + 1/2 + 1/3 + ⋯ + 1/𝑛.
These numbers have many applications in computer science. We will meet at least one
later in this unit.
In this exercise, we prove by induction that 𝐻𝑛 ≥ log𝑒 (𝑛 + 1). (It follows from this
that the harmonic numbers increase without bound, even though the differences between
them get vanishingly small so that 𝐻𝑛 grows more and more slowly as 𝑛 increases.)
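Before starting the proof, it can be reassuring to see the inequality numerically (a quick check only, not part of the proof):

```python
import math

def harmonic(n):
    # H_n = 1 + 1/2 + 1/3 + ... + 1/n
    return sum(1 / k for k in range(1, n + 1))

# math.log is the natural logarithm, so this checks H_n >= log_e(n + 1).
for n in range(1, 200):
    assert harmonic(n) >= math.log(n + 1)
```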

(i) Inductive basis: prove that 𝐻1 ≥ log𝑒 (1 + 1).

(ii) Let 𝑛 ≥ 1. Assume that 𝐻𝑛 ≥ log𝑒 (𝑛 + 1); this is our inductive hypothesis. Now,
consider 𝐻𝑛+1 . Write it recursively, using 𝐻𝑛 . Then use the inductive hypothesis
to obtain 𝐻𝑛+1 ≥ … … (where you fill in the gap). Then use the fact that log𝑒 (1 +
𝑥) ≤ 𝑥, and an elementary property of logarithms, to show that 𝐻𝑛+1 ≥ log𝑒 (𝑛 +2).

(iii) In (i) you showed that 𝐻1 ≥ log𝑒 (1 + 1), and in (ii) you showed that if 𝐻𝑛 ≥
log𝑒 (𝑛 + 1) then 𝐻𝑛+1 ≥ log𝑒 ((𝑛 + 1) + 1). What can you now conclude, and why?

Advanced afterthoughts:

• The above inequality implies that 𝐻𝑛 ≥ log𝑒 𝑛, since log𝑒 (𝑛 + 1) ≥ log𝑒 𝑛. It is instructive
to try to prove directly, by induction, that 𝐻𝑛 ≥ log𝑒 𝑛. You will probably run into a snag.
This illustrates that for induction to succeed, you sometimes need to prove something that
is stronger than what you set out to prove.

• Would your proof work for logarithms to other bases, apart from 𝑒? Where in the proof
do you use the base 𝑒?

• It is known that 𝐻𝑛 ≤ (log𝑒 𝑛) + 1. Can you prove this?

12. (Challenge)
Let 𝐸 be a finite set, and let 𝒮 be a set of subsets of 𝐸. We say 𝒮 is linear under △
if, for every two sets 𝑋 , 𝑌 ∈ 𝒮, we have 𝑋 △𝑌 ∈ 𝒮. So, applying symmetric difference to
sets in 𝒮 always gives sets in 𝒮; the operation never takes you out of 𝒮.
The span of 𝒮, written span(𝒮), is the set of all symmetric differences of any number
of members of 𝒮. In other words,

span(𝒮) ∶= {𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 ∶ 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮, 𝑘 ∈ ℕ}.

By convention, the symmetric difference of zero sets is the empty set. Therefore, the
span of any set 𝒮 always contains the empty set.
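For experimentation, span(𝒮) can be computed by a simple closure computation, representing sets as Python frozensets (a sketch, using the fact that repeated members cancel in pairs, so only subsets of 𝒮 matter):

```python
def span(S):
    # All symmetric differences of members of S, including the empty
    # symmetric difference (which gives the empty set by convention).
    result = {frozenset()}
    for X in S:
        # Each existing combination may either include X or not.
        result |= {X ^ Y for Y in result}
    return result

# Example: the span of {{1}, {2}} has four members.
print(span({frozenset({1}), frozenset({2})}))
```

You could use this to test your answers to parts such as (k), where the span of an independent collection should have exactly 2^|𝒮| members.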
𝒮 is dependent under △ if, for some 𝑘 ≥ 1 and some 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮 which are all
distinct (i.e., 𝑋𝑖 ≠ 𝑋𝑗 whenever 1 ≤ 𝑖 < 𝑗 ≤ 𝑘), we have

𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 = ∅.

If 𝒮 is not dependent under △ then we say it is independent under △.


𝒮 is a circuit under △ if it is dependent under △ and also minimal with respect
to that property.
𝒮 is a base under △ if it is independent under △ and also maximal with respect
to that property.
(Recall the precise usage of “minimal” and “maximal” from § 1.7.)

(a) Let 𝒮 be linear under △. Prove, by induction on 𝑘, that for all 𝑘 and all
𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮,
𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 ∈ 𝒮.
(b) Prove that, for every set 𝒮 of subsets of 𝐸, the set 𝒮 is linear under △ if and only
if it equals its own span, i.e.,
𝒮 = span(𝒮).
(c) Prove, by induction on 𝑘, that for all 𝑘 and all 𝑋1 , 𝑋2 , … , 𝑋𝑘 ∈ 𝒮, either

𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 = ∅,

or there exist 𝑌1 , 𝑌2 , … , 𝑌𝑛 ∈ 𝒮 with 𝑛 ≤ 𝑘, such that 𝑌𝑖 ≠ 𝑌𝑗 for all 𝑖 ≠ 𝑗 and

𝑋1 △𝑋2 △ ⋯ △𝑋𝑘 = 𝑌1 △𝑌2 △ ⋯ △𝑌𝑛 .

(d) Prove that 𝒮 is dependent under △ if and only if there exists 𝑋 ∈ 𝒮 such that

𝑋 ∈ span(𝒮 ∖ {𝑋 }).

(e) Prove that if 𝒮 is a minimal dependent set under △, then for every 𝑋 ∈ 𝒮 we have

𝑋 ∈ span(𝒮 ∖ {𝑋 }).

(f) Prove that every minimal spanning set under △ is independent under △.

(g) Now prove that every minimal spanning set under △ is a maximal independent
set under △.

(h) Prove that every maximal independent set under △ is also a spanning set under
△.

(i) Now prove that every maximal independent set under △ is also a minimal spanning
set under △.

(j) Prove that 𝒮 is a minimal spanning set under △ if and only if it is a maximal
independent set under △.

(k) Prove that, if 𝒮 is independent under △ then

|span(𝒮)| = 2^|𝒮|.

(l) Prove the independence augmentation property: if 𝒮 and 𝒯 are independent


under △ and |𝒯| = |𝒮| + 1, then there exists 𝑋 ∈ 𝒯 ∖ 𝒮 such that 𝒮 ∪ {𝑋} is independent
under △.

(m) Prove that all bases under △ have the same size.

(n) Do all minimal dependent sets under △ have the same size? Give a proof of your
claim.

(o) Prove that, for every base ℬ of 𝒮 under △, and every 𝑋 ∈ 𝒮, there is a unique way to
write 𝑋 as a symmetric difference of members of ℬ, using each element of ℬ at most once.

(p) For each previous part of this question, (a)–(o), state what type of proof you have
used.

(q) Design a simple algorithm for the following problem.

INPUT: a collection 𝒮 of subsets of 𝐸, together with a nonnegative real weight
for each member of 𝒮.
OUTPUT: a subset 𝒳 of 𝒮 that is independent under △ such that the sum
of the weights of all members of 𝒳 is maximum.

Does this algorithm always find the best possible solution? Justify your answer, using
a previous exercise.

13. Identify the errors in the following “theorem” and “proof”, which are both
incorrect.
The steps in the “proof” are numbered for convenience.

“Theorem.” For all 𝑛 ∈ ℕ, a circular disc can be cut into 2ⁿ pieces by 𝑛 straight lines.

“Proof.”

1. Consider 𝑛 = 0. The disc is not cut at all, so it is still in one piece. So the number
of pieces is 2⁰ = 1. So the claim is true for 𝑛 = 0.

2. Consider 𝑛 = 1. A single line across a disc cuts it into two pieces, so the number
of pieces is 2¹. So the claim is true for 𝑛 = 1.

3. Now consider 𝑛 = 2. If the circle is cut by one line into two pieces, then we can cut
it again by a line that crosses the first line. Such a line must cut right across both
of the pieces created by the first cut. Therefore each of those pieces is divided in
two, so the total number of pieces is 2 × 2 = 2², so the claim is true for 𝑛 = 2.

4. It is clear from the cases considered so far that, once the circle is cut by some
number of lines, then another line can be used to divide every piece into two
pieces, thereby doubling the number of pieces.

5. So, if we use 𝑛 lines, then the repeated doubling gives 2ⁿ pieces. Therefore the
claim is true for all 𝑛.

“□”

14. Identify the errors in the following “theorem” and “proof”, which are both incorrect.

“Theorem.” Every rational number is an integer.

“Proof.”
1. We need to show that the sets ℤ and ℚ are equal.
2. To prove that two sets are equal, we can prove the appropriate subset and superset
relations between the two sets.
3. We first prove the subset relation between these two sets, ℤ ⊆ ℚ.
4. Let 𝑥 ∈ ℤ.
5. Since 𝑥 is an integer, we have 𝑥 = 𝑥/1.
6. So 𝑥 is a quotient of integers.
7. Therefore 𝑥 is rational, i.e., 𝑥 ∈ ℚ.
8. So we have proved that ℤ ⊆ ℚ.
9. Now we need to prove the superset relation, which is the reverse of subset.
10. So we will prove that ℚ ⊇ ℤ.
11. Let 𝑥 ∉ ℚ.
12. Then 𝑥 cannot be written in the form 𝑝/𝑞 where 𝑝 and 𝑞 are integers.
13. So it certainly cannot be written in this form with 𝑞 = 1.
14. Therefore 𝑥 is not an integer, i.e., 𝑥 ∉ ℤ.
15. So we have proved that ℚ ⊇ ℤ.
16. It follows from our subset and superset relations between the two sets that ℤ = ℚ.
“□”

“God made the integers; all else is the work of man.”


Leopold Kronecker (1823–1891)

15. Let 𝐺 be the cryptosystem obtained from Caesar slide by restricting the keyspace
to the first half of the alphabet. (See Exercise 2.15 for definitions relating to the Caesar
slide cryptosystem.) So the keyspace 𝐾𝐺 is

{a, b, c, d, e, f, g, h, i, j, k, l, m}.

Prove that 𝐺 is not idempotent.


4
PROPOSITIONAL LOGIC

Computers are logical machines. This is not just a description of their behaviour, but
also of their nature. Their hardware consists of huge numbers of tiny components that
each do simple operations based on logic, and that do them very quickly. In their
memory, information is stored as sequences of basic pieces which can each be True or
False. The instructions used to get computers to do things — their programs — are
expressed in a language that specifies what to do using formal, precise rules that use
logic. Even before a program is written, a formal specification of the task to be done
will make essential use of logic.
But logic is not just for machines. It is fundamental to clear thinking and pre-
cise communication. By learning about logic, you will improve your own thinking and
communication, and therefore your prospects in life.
Logic plays a fundamental role in rigorous reasoning, especially in proofs. It will
pervade everything we do in this unit and is fundamental to your future studies, not
only in computer science but in mathematics, other sciences, engineering, economics,
philosophy, and indeed any field of human activity where precise reasoning is important.
We will therefore now study the most fundamental types of logic. This week we
study Propositional Logic, and next week we study Predicate Logic.

4.1𝛼 T R U T H VA L U E S

Logic is built on just two values, called truth values: False and True.¹ These are
abbreviated as F and T respectively.
Physically, truth values may correspond to the absence or presence of electrical
current in a wire, or the absence or presence of magnetisation, or a switch being off or
on, and so on. We use the abstraction of truth values in order to be able to discuss the
logical workings of computers in a way that is independent of the particular technology
used.

1 There are forms of logic that use other values too, including three-valued logics which use the extra truth
value Unknown. But we will focus entirely on classical two-valued logic, firstly because it is what computers
are based on, and secondly because it is embedded within most other logics anyway, so that understanding
two-valued logic is necessary in order to understand those other logics.


Another common abstraction of the notion of two possible states for something is
the bit, which can be 0 or 1. This view is intimately related to truth values: in many
situations, we can represent False by 0 and True by 1. It is worth exploring how the
mathematical behaviour of 0 and 1 under basic arithmetic operations compares with
that of False and True under logical operations. We will develop this link further in
§ 7.4.
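This link is visible directly in Python, where the Boolean type is literally a subclass of the integers. The following minimal sketch (an illustration of the bit/truth-value correspondence, not part of the notes) shows it:

```python
# In Python, bool is a subclass of int: False and True are the bits 0 and 1.
print(int(False), int(True))  # 0 1
print(True + True)            # 2 -- arithmetic treats truth values as bits
print(False | True)           # True -- bitwise OR on bools acts as logical OR
```

One practical consequence is that `sum(list_of_booleans)` counts how many entries are True.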

4.2𝛼 B O O L E A N VA R i A B L E S

You are familiar with variables for numerical quantities, whose value can be any number
from some set. We have already been using variables for other objects too. So it is
natural to use variables whose values are truth values.
A Boolean variable, also called a propositional variable, is a variable whose
value can be True or False.
Often, Boolean variables are used as names for statements which are either True or
False.

4.3𝛼 PROPOSiTiONS

A proposition is a statement which is either true or false.

Examples
1+1 = 2 — a proposition which is true.
Xià Péisù designed the first computer in China — a proposition which is true.
The earth is flat. — a proposition which is false.
It will rain tomorrow. — a proposition.

The left domino fell. — a proposition. (See § 3.2.)

’Twas brillig, and the slithy toves
did gyre and gimble in the wabe. — not a proposition.
From: Lewis Carroll, Through the Looking Glass, and What
Alice Found There, Macmillan, London, 1871.

Come and work for us! — not a proposition.


This statement is false. — not a proposition.

For brevity, a proposition may be given a name, which is a Boolean variable. For
example, let 𝑋 be the proposition 1 + 1 = 2. Then the truth value of 𝑋 is True.

4.4𝛼 L O G i C A L O P E R AT i O N S

Propositions may be combined to make other propositions using logical operations. Here
are the basic logical operations we will use. We will define each of them shortly.

Not ¬ (∼, ‾, −)
And ∧ (&)
Or ∨
Implies ⇒ (→)
Equivalence ⇔ (↔)
Logical operations are also called connectives. Inside computers, they are implemented
in electronic circuits called logic gates.

4.5 N E G AT i O N

Logical negation is a unary operation which changes True to False and False to True. It is
denoted by ¬, which is placed before its argument. So, ¬True = False and ¬False = True.
If a proposition 𝑃 is True then ¬𝑃 is False, and if 𝑃 is False then ¬𝑃 is True.

Example:
𝑃: You have prepared for next week’s tutorial.
¬𝑃: You have not prepared for next week’s tutorial.

Other notation for ¬𝑃 that you may come across: ∼𝑃, 𝑃̄, −𝑃, !𝑃

We can specify how ¬ works using a truth table, which lists all possible values of
the arguments and, for each, states what the resulting value is. In this case, we only
have one argument (let’s call it 𝑃 again), which has two possible values, so the table
just has two rows (excluding the headings). The left column gives the possible values of
the argument 𝑃 and the right column gives the corresponding values of ¬𝑃.

𝑃 ¬𝑃
F T
T F
Logical negation is reminiscent of set complementation (§ 1.11). In each case, you
get something completely opposite, and doing it twice gets you back where you started.
For logical negation, we have
¬¬𝑃 = 𝑃.
If a proposition asserts membership of a set, then its logical negation asserts membership
of the complement. For example, consider the proposition √2 ∈ ℚ (which is False) and

suppose our universal set is ℝ. Its logical negation ¬(√2 ∈ ℚ) may be written √2 ∉ ℚ
or √2 ∈ ℝ ∖ ℚ (which is True).

4.6 CONjUNCTiON

Conjunction is a binary logical operation, i.e., it has two Boolean arguments. The result
is True if and only if both its arguments are True. So, if at least one of its arguments is
False, then the result is False.
We denote conjunction by ∧ and read it as “and”. So the conjunction of 𝑃 and 𝑄
is written 𝑃 ∧ 𝑄 and read as “𝑃 and 𝑄”. Note that we are using the English word “and”
in a strict, precise, logical sense here, which is narrower than the full range of meanings
this word can have in English.

Example:

𝑃 Radhanath was a computer.
𝑄 Radhanath was a person.
𝑃 ∧ 𝑄 Radhanath was a computer and a person.

Radhanath Sikdar (1813–1870)
http://news.bbc.co.uk/2/hi/south_asia/3193576.stm

We can define conjunction symbolically using its truth table. It has two arguments
(let’s call them 𝑃 and 𝑄 again), each of which can have two possible values (True and
False), so there are 2² = 4 combinations of arguments, hence four rows of the truth table.
In each row, the corresponding value of 𝑃 ∧ 𝑄 is given in the last column.

𝑃 𝑄 𝑃 ∧𝑄
F F F
F T F
T F F
T T T
Conjunction is closely related to set intersection. If 𝑥 is an object and 𝐴 and 𝐵
are sets, then the conjunction of the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 is the proposition
𝑥 ∈ 𝐴 ∩ 𝐵:
(𝑥 ∈ 𝐴) ∧ (𝑥 ∈ 𝐵) = (𝑥 ∈ 𝐴 ∩ 𝐵).
Restating (1.12) using conjunction, we have

𝐴 ∩ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 ∧ 𝑥 ∈ 𝐵}.
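This correspondence can be checked with a small Python sketch; the sets A and B below are arbitrary examples chosen for illustration:

```python
A = {1, 2, 3, 4}   # hypothetical example sets
B = {3, 4, 5}

# For every candidate element, the conjunction of the two membership
# propositions agrees with membership in the intersection (& on sets).
for x in range(7):
    assert ((x in A) and (x in B)) == (x in (A & B))

print(A & B)  # {3, 4}
```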

4.7 DiSjUNCTiON

Disjunction is another binary logical operation. Its result is True if and only if at least
one of its arguments is True. So, if both of its arguments are False, then the result is
False.
We denote disjunction by ∨ and read it as “or”. The disjunction of 𝑃 and 𝑄 is written
𝑃 ∨ 𝑄 and read as “𝑃 or 𝑄”. Again, our use of English words is unusually specific: “or”
is being used here in a strict, precise, logical sense, much narrower than its full range
of English meanings. An analogous situation arose with “and” previously. Also, we are
using the word “or” inclusively, so that a disjunction is True whenever any one or more
of its arguments are True. For this reason, disjunction is sometimes called inclusive-OR.
(This contrasts with the exclusive-or of two propositions, which is True precisely when
exactly one of its two arguments is True; we discuss it in § 4.11.)

Example:
𝑃 I will study FIT3155 Advanced Data Structures & Algorithms.
𝑄 I will study MTH3170 Network Mathematics.

𝑃 ∨𝑄 I’ll study FIT3155 or I’ll study MTH3170.


I’ll study at least one of FIT3155 and MTH3170.

Here is the truth table definition of disjunction.

𝑃 𝑄 𝑃 ∨𝑄
F F F
F T T
T F T
T T T
Disjunction is closely related to set union. If 𝑥 is an object and 𝐴 and 𝐵 are sets,
then the disjunction of the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 is the proposition 𝑥 ∈ 𝐴 ∪ 𝐵:

(𝑥 ∈ 𝐴) ∨ (𝑥 ∈ 𝐵) = (𝑥 ∈ 𝐴 ∪ 𝐵).

Restating (1.11) using disjunction, we have

𝐴 ∪ 𝐵 = {𝑥 ∶ 𝑥 ∈ 𝐴 ∨ 𝑥 ∈ 𝐵}.

At this point, it is worth reflecting on other set operations we discussed in Chapter 1


and thinking about what logical operations they might correspond to. Feel free to invent
your own logical operations if that helps.

4.8 D E M O R G A N ’ S L AW S

Conjunction and disjunction seem somewhat opposite in character, but there is a close
relationship between them, a kind of logical duality. This is captured by De Morgan’s
Laws.

¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄
¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄

Augustus De Morgan (1806–1871)
https://mathshistory.st-andrews.ac.uk/Biographies/De_Morgan/

These laws can be proved using truth tables. Consider the table below, which proves
the first of De Morgan’s Laws, ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄. We start out with the usual two
columns giving all combinations of truth values of our variables 𝑃 and 𝑄. The overall
approach is to gradually work along to the right, adding new columns that give some part
of one of the expressions we are interested in, always using the columns we’ve already
constructed in order to construct new columns. We’ll first work towards constructing a
column giving truth values for the left-hand side of the equation, ¬(𝑃 ∨ 𝑄). As a step
towards this, we make a column for 𝑃 ∨ 𝑄: this becomes our third column. In fact, our
first three columns are just the truth table for the disjunction 𝑃 ∨𝑄, which we have seen
before. Then we negate each entry in the third column to give the entries of the fourth
column, which gives the truth table for ¬(𝑃 ∨ 𝑄). So we’ve done the left-hand side of
the equation. Then we start on the right-hand side of the equation, which is ¬𝑃 ∧ ¬𝑄.
For this, we’ll need ¬𝑃 and ¬𝑄, which are obtained by negating the columns for 𝑃 and
𝑄 respectively. This gives the fifth and sixth columns. Finally we form ¬𝑃 ∧ ¬𝑄 in the
seventh column by just taking the conjunction of the corresponding entries in the fifth
and sixth columns.
We now have columns giving the truth tables of both sides of the first of De Morgan’s
Laws: these are the fourth and seventh columns below, shown in green. These columns
are identical! This shows that the two expressions ¬(𝑃 ∨ 𝑄) and ¬𝑃 ∧ ¬𝑄 are logically
equivalent, i.e., their truth values are the same for all possible assignments of truth
values to their arguments 𝑃 and 𝑄. In other words, as Boolean expressions, they are
equal. So ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄. This proves the first of De Morgan’s Laws.

𝑃 𝑄 𝑃 ∨𝑄 ¬(𝑃 ∨ 𝑄) ¬𝑃 ¬𝑄 ¬𝑃 ∧ ¬𝑄
F F F T T T T
F T T F T F F
T F T F F T F
T T T F F F F
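The same truth-table argument can be carried out programmatically. The Python sketch below (our own illustration, with `not`, `and`, `or` in the roles of ¬, ∧, ∨) exhaustively checks both of De Morgan's Laws over all four assignments:

```python
from itertools import product

# Check both De Morgan's Laws for every combination of truth values --
# the programmatic analogue of comparing truth-table columns.
for P, Q in product([False, True], repeat=2):
    assert (not (P or Q)) == ((not P) and (not Q))   # first law
    assert (not (P and Q)) == ((not P) or (not Q))   # second law

print("Both De Morgan's Laws hold for all assignments.")
```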

We could prove the second of De Morgan’s Laws by the same method. But, now
that we know the first of De Morgan’s Laws, it is natural to ask: can we use it to prove
the second law more easily, so that we avoid doing the same amount of work all over
again? In other words, assuming that ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄 holds for all 𝑃 and 𝑄, can
you prove that ¬(𝑃 ∧𝑄) = ¬𝑃 ∨¬𝑄 holds for all 𝑃 and 𝑄 without starting from scratch
again and going through the same kind of detailed truth table argument? (Exercise 4.3)
De Morgan’s Laws can also be proved by reasoning in a way that covers all possible
combinations of truth values, rather than working laboriously through each combination
of truth values separately (which is what the truth table proof does). We give such a
proof now. For good measure, we do it for a more general version of the Law which
caters for arbitrarily long conjunctions and disjunctions. This would not be possible
just using the truth table approach, since we’d need a separate truth table for each 𝑛,
which means we’d need infinitely many truth tables.

Theorem 23. For all 𝑛:

¬(𝑃1 ∨ ⋯ ∨ 𝑃𝑛 ) = ¬𝑃1 ∧ ⋯ ∧ ¬𝑃𝑛

Proof.

Left-Hand Side is True if and only if 𝑃1 ∨ ⋯ ∨ 𝑃𝑛 is False
                       if and only if 𝑃1 , … , 𝑃𝑛 are all False
                       if and only if ¬𝑃1 , … , ¬𝑃𝑛 are all True
                       if and only if Right-Hand Side is True.

Again, we ask: having proved the first of De Morgan’s Laws (in this more general
form), can we use it to prove the second law more easily? How would you prove the
second law?
There is a clear correspondence between De Morgan’s Laws for Sets, in Theorem 1
and Corollary 2, and De Morgan’s Laws for Logic.

4.9 i M P L i C AT i O N

Our next logical operation is implication, also called conditional, from § 3.2. Recall
that, if 𝑃 and 𝑄 are its arguments, it is written 𝑃 ⇒ 𝑄 and read “𝑃 implies 𝑄” or “if 𝑃
then 𝑄”. It means that if 𝑃 is True then 𝑄 must also be True. The only way its result
can be False is if 𝑃 is True and 𝑄 is False.

Example:

𝑃 Stars are visible.


𝑄 The sun has set.

𝑃 ⇒𝑄 If stars are visible then the sun has set.


Stars being visible implies the sun has set.
Stars are visible only if the sun has set.
Stars are visible is sufficient for the sun to have set.

Here is its truth table.

𝑃 𝑄 𝑃 ⇒𝑄
F F T
F T T
T F F
T T T
As we discussed in § 3.2, implication is closely related to the subset relation.
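Python has no built-in implication operator, but one can be sketched via the logical equivalence of 𝑃 ⇒ 𝑄 with ¬𝑃 ∨ 𝑄 (listed among the equivalences in § 4.12); the function name below is our own choice:

```python
def implies(p, q):
    # P => Q is False only when P is True and Q is False.
    return (not p) or q

# Reproduce the truth table of implication, row by row.
for p in (False, True):
    for q in (False, True):
        print(p, q, implies(p, q))
```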

4.10 E Q U i VA L E N C E

Our next binary logical operation is equivalence, also called biimplication or biconditional.
We discussed this previously towards the end of § 1.6 on p. 9, and in the last paragraph
of § 3.2 on p. 92.
For arguments 𝑃 and 𝑄, equivalence is written 𝑃 ⇔ 𝑄 and read “𝑃 if and only if
𝑄” or “𝑃 is equivalent to 𝑄”. It is true precisely when 𝑃 and 𝑄 are either both True or
both False. In other words, as Boolean variables, their values are always equal.
You can view 𝑃 ⇔ 𝑄 as the conjunction of 𝑃 ⇒ 𝑄 and 𝑄 ⇒ 𝑃:

𝑃 ⇔ 𝑄 = (𝑃 ⇒ 𝑄) ∧ (𝑄 ⇒ 𝑃).

𝑃 ⇔ 𝑄 can be written the other way round, as 𝑄 ⇔ 𝑃. They have the same meaning.

Example:

𝑃 The triangle is right-angled.
𝑄 The side lengths satisfy 𝑎² + 𝑏² = 𝑐².

[Figure: three triangles with sides 𝑎, 𝑏, 𝑐, illustrating the cases 𝑎² + 𝑏² < 𝑐², 𝑎² + 𝑏² = 𝑐², and 𝑎² + 𝑏² > 𝑐².]

𝑃 ⇔ 𝑄 The triangle is right-angled if and only if 𝑎² + 𝑏² = 𝑐².

The triangle being right-angled is a necessary and sufficient condition
for 𝑎² + 𝑏² = 𝑐².

𝑎² + 𝑏² = 𝑐² is a necessary and sufficient condition
for the triangle being right-angled.

Here is its truth table.

𝑃 𝑄 𝑃 ⇔𝑄
F F T
F T F
T F F
T T T
Equivalence is closely related to set equality. If 𝑥 is an object and 𝐴 and 𝐵 are sets,
then the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 are equivalent if 𝑥 belongs to both sets or neither
of them. If this holds for all 𝑥 then the two sets are identical. Conversely, if two sets
are identical then the propositions 𝑥 ∈ 𝐴 and 𝑥 ∈ 𝐵 are always equivalent.

4.11 EXCLUSiVE-OR

Our final logical operation is exclusive-or. For arguments 𝑃 and 𝑄, exclusive-or is
written 𝑃 ⊕ 𝑄 (or sometimes 𝑃 XOR 𝑄), and read “𝑃 exclusive-or 𝑄” or “𝑃 ex-or 𝑄”. It
is true precisely when either 𝑃 is True or 𝑄 is True, but not both.
Exclusive-or differs from inclusive-or (disjunction, § 4.7) in its value when both 𝑃
and 𝑄 are True. In that case, exclusive-or is False but inclusive-or is True.
Here is its truth table.

𝑃 𝑄 𝑃 ⊕𝑄
F F F
F T T
T F T
T T F
It is evident from their truth tables that exclusive-or is actually the logical negation
of equivalence:
𝑃 ⊕ 𝑄 = ¬(𝑃 ⇔ 𝑄).
The exclusive-or of two propositions is True if and only if exactly one of them is True.
What happens if we combine three propositions with exclusive-or, as in 𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 ?
The exclusive-or of the first two, 𝑃1 ⊕ 𝑃2 , is True precisely when exactly one of 𝑃1 and
𝑃2 is True, and in that case 𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 can only be True if 𝑃3 is False, so we still

have exactly one of the propositions 𝑃1 , 𝑃2 , 𝑃3 being True. But there is another way for
𝑃1 ⊕ 𝑃2 ⊕ 𝑃3 to be True, namely if 𝑃1 ⊕ 𝑃2 is False and 𝑃3 is True. For 𝑃1 ⊕ 𝑃2 to be
False, either neither of 𝑃1 and 𝑃2 is True, or they both are. So we might again have exactly
one of 𝑃1 , 𝑃2 , 𝑃3 being True, or we might in fact have all three of 𝑃1 , 𝑃2 , 𝑃3 being True.
We have covered all the possible things that can happen, if 𝑃1 ⊕𝑃2 ⊕𝑃3 is to be True,
and we have found that the number of 𝑃1 , 𝑃2 , 𝑃3 that are True must be 1 or 3. So it
needn’t be exactly one; fortunately, we have avoided a common mistake there. In fact,
what we can say is that the number of 𝑃1 , 𝑃2 , 𝑃3 that are True must be odd.
This generalises to arbitrary numbers of propositions. You should play with some
examples (say, with four propositions 𝑃1 , 𝑃2 , 𝑃3 , 𝑃4 ), satisfy yourself that this does indeed
happen in general, and try to understand why. Then you will get more out of reading
the formal proof which we now present.

Theorem 24. For all 𝑛 ∈ ℕ, and for any propositions 𝑃1 , 𝑃2 , … , 𝑃𝑛 ,

𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑛 = True,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑛 is True;
                    False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑛 is True.

Proof. We prove this by induction on 𝑛.

Inductive Basis:
When 𝑛 = 1, the expression on the left of (24) is just 𝑃1 , and this is True if just
one of 𝑃1 is True, and False otherwise! So (24) holds in this case.

Inductive Step:
Let 𝑘 ≥ 1. Assume (24) holds for 𝑛 = 𝑘; this is the Inductive Hypothesis.

Now consider 𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑘+1 .

𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑘+1
= (𝑃1 ⊕ 𝑃2 ⊕ ⋯ ⊕ 𝑃𝑘 ) ⊕ 𝑃𝑘+1
      (identifying a smaller expression of the same type, within this expression)
= ( True,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True;
    False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True ) ⊕ 𝑃𝑘+1
      (by the Inductive Hypothesis)
= True ⊕ 𝑃𝑘+1 ,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True;
  False ⊕ 𝑃𝑘+1 , if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True
= True ⊕ False,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is False;
  True ⊕ True,   if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is True;
  False ⊕ False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is False;
  False ⊕ True,  if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is True
      (breaking each option down according to the value of 𝑃𝑘+1 )
= True,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is False;
  False, if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is True;
  False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is False;
  True,  if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 is True and 𝑃𝑘+1 is True
      (evaluating the exclusive-ors)
= True,  if an odd number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 , 𝑃𝑘+1 is True;
  False, if an even number of 𝑃1 , 𝑃2 , … , 𝑃𝑘 , 𝑃𝑘+1 is True
      (combining the True cases and combining the False cases, and
      noting that their conditions also combine to capture precisely
      the evenness or oddness of the number of the 𝑃𝑖 that are True).

So (24) holds for 𝑛 = 𝑘 + 1 too. This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, (24) holds for all 𝑛 ∈ ℕ.

Exclusive-or is closely related to the symmetric difference of sets. If 𝑥 is an object


and 𝐴 and 𝐵 are sets, then

𝑥 ∈ 𝐴△𝐵 if and only if (𝑥 ∈ 𝐴) ⊕ (𝑥 ∈ 𝐵).

More generally, if 𝐴1 , 𝐴2 , … , 𝐴𝑛 are sets, then

𝑥 ∈ 𝐴1 △𝐴2 △ ⋯ △𝐴𝑛 if and only if (𝑥 ∈ 𝐴1 ) ⊕ (𝑥 ∈ 𝐴2 ) ⊕ ⋯ ⊕ (𝑥 ∈ 𝐴𝑛 ).
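Both Theorem 24 and its set-theoretic analogue can be checked in a few lines of Python. The helper name `xor_chain` and the example sets are our own illustrative choices; `^` is Python's exclusive-or for both Booleans and sets:

```python
from functools import reduce
from operator import xor

def xor_chain(props):
    # Fold exclusive-or over a list; True exactly when an odd number are True.
    return reduce(xor, props, False)

assert xor_chain([True, True, True]) == True     # three Trues: odd
assert xor_chain([True, True, False]) == False   # two Trues: even

# Set analogue: x lies in the symmetric difference of A1, A2, A3
# if and only if x lies in an odd number of the sets.
A1, A2, A3 = {1, 2}, {2, 3}, {3, 4}   # hypothetical example sets
print(A1 ^ A2 ^ A3)  # {1, 4}: 1 and 4 each lie in exactly one of the sets
```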



4.12 TA U T O L O G i E S A N D L O G i C A L E Q U i VA L E N C E

A tautology is a statement that is always true. In other words, the right-hand column
of its truth table has every entry True.
Two statements 𝑃 and 𝑄 are logically equivalent if their truth tables are identical.
In other words, 𝑃 and 𝑄 are equivalent if and only if 𝑃 ⇔ 𝑄 is a tautology.

Examples:
¬¬𝑃 is logically equivalent to 𝑃
¬(𝑃 ∨ 𝑄) is logically equivalent to ¬𝑃 ∧ ¬𝑄
¬(𝑃 ∧ 𝑄) is logically equivalent to ¬𝑃 ∨ ¬𝑄
𝑃 ⇒ 𝑄 is logically equivalent to ¬𝑃 ∨ 𝑄
𝑃 ⇔ 𝑄 is logically equivalent to (𝑃 ⇒ 𝑄) ∧ (𝑃 ⇐ 𝑄)
        and to (¬𝑃 ∨ 𝑄) ∧ (𝑃 ∨ ¬𝑄)
𝑃 ⊕ 𝑄 is logically equivalent to ¬(𝑃 ⇔ 𝑄)
        and to 𝑃 ⇔ ¬𝑄
        and to (𝑃 ⇒ ¬𝑄) ∧ (𝑃 ⇐ ¬𝑄)

These can all be proved using truth tables.


We usually denote logical equivalence by “=”. So we write ¬¬𝑃 = 𝑃, etc.
We have already met logical equivalence. Each of De Morgan’s Laws states that its
left-hand side is logically equivalent to its right-hand side.
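For a small number of variables, tautology checking (and hence logical-equivalence checking) can be done by brute force over all 2ⁿ truth assignments. The Python sketch below uses a function name of our own choosing:

```python
from itertools import product

def is_tautology(expr, nvars):
    # Evaluate the Boolean function under every assignment of truth values.
    return all(expr(*vals) for vals in product([False, True], repeat=nvars))

assert is_tautology(lambda p: p or not p, 1)            # P or not-P: always True
assert not is_tautology(lambda p, q: (not p) or q, 2)   # P => Q is not a tautology

# Two expressions are logically equivalent iff their biconditional is a tautology:
assert is_tautology(lambda p, q: (not (p or q)) == ((not p) and (not q)), 2)
```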

4.13𝜔 H i S T O RY

George Boole (1815–1864)


https://mathshistory.st-andrews.ac.uk/Biographies/Boole/

4.14 D i S T R i B U T i V E L AW S

𝑃 ∧ (𝑄 ∨ 𝑅) = (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) (4.1)

𝑃 ∨ (𝑄 ∧ 𝑅) = (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅) (4.2)

It is notable that we have two distributive laws in propositional logic. Conjunction
is distributive over disjunction (first law above, (4.1)), and disjunction is distributive
over conjunction (second law above, (4.2)).
This contrasts with the situation for ordinary algebra of numbers, when multiplica-
tion is distributive over addition, but addition is not distributive over multiplication.

𝑝 × (𝑞 + 𝑟) = (𝑝 × 𝑞) + (𝑝 × 𝑟)
but
𝑝 + (𝑞 × 𝑟) ≠ (𝑝 + 𝑞) × (𝑝 + 𝑟)

Just as for De Morgan’s Laws, we see a correspondence between the algebra of sets
and the algebra of logic. The Distributive Laws for sets, Theorem 3, correspond to the
Distributive Laws for Logic.

4.15 L AW S O F B O O L E A N A L G E B R A

Here is a full listing of the laws of Boolean algebra, which we may use to convert propo-
sitional expressions from one form to another. Reasons why we might do this include
finding simpler forms for Boolean expressions and determining algebraically whether or
not two given Boolean expressions are logically equivalent.
Because of the logical duality between conjunction and disjunction, these laws may
be arranged in dual pairs. In the table below, each law involving conjunction or disjunc-
tion is written on the same line as its dual.

¬¬𝑃 = 𝑃
¬True = False ¬False = True
𝑃 ∧𝑄 = 𝑄∧𝑃 𝑃 ∨𝑄 = 𝑄∨𝑃
(𝑃 ∧ 𝑄) ∧ 𝑅 = 𝑃 ∧ (𝑄 ∧ 𝑅) (𝑃 ∨ 𝑄) ∨ 𝑅 = 𝑃 ∨ (𝑄 ∨ 𝑅)
𝑃 ∧𝑃 = 𝑃 𝑃 ∨𝑃 = 𝑃
𝑃 ∧ ¬𝑃 = False 𝑃 ∨ ¬𝑃 = True
𝑃 ∧ True = 𝑃 𝑃 ∨ False = 𝑃
𝑃 ∧ False = False 𝑃 ∨ True = True

Distributive Laws
𝑃 ∧ (𝑄 ∨ 𝑅) = (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅) 𝑃 ∨ (𝑄 ∧ 𝑅) = (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅)
De Morgan’s Laws
¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄 ¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄

4.16 DiSjUNCTiVE NORMAL FORM

We will introduce two standard ways of writing logical expressions. The first of these is
Disjunctive Normal Form (DNF), treated in this section. The second is Conjunctive
Normal Form (CNF), treated in the next section.
We treat DNF first, but CNF will be much more important for us. In brief, that’s
because CNF is more natural for encoding logical problems and real-world conditions.
You can always convert one to the other, but at considerable cost in both time and space
as we will see.
A literal is an appearance of a logical variable in which it is either unnegated or
negated just once. So, if 𝑋 is a logical variable, then its corresponding literals are 𝑋
and ¬𝑋 . Separate appearances of a logical variable within a larger logical expression are
counted as separate literals. For example, the expression (¬𝑋 ∧¬𝑌)∨(¬𝑋 ∧𝑌)∨(𝑋 ∧𝑌)
has six literals (even though some are equivalent to each other). We do not consider
¬¬𝑋 to be a literal, as it is, but it is equivalent to 𝑋 , which is a literal. Similarly, ¬¬¬𝑋
is not a literal but is equivalent to the literal ¬𝑋 .
A Boolean expression is in Disjunctive Normal Form (DNF) if

• it is written as a disjunction of some number of parts, where

• each part is a conjunction of some number of literals.

Examples:

(¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌) a disjunction of three parts, each of which is a


conjunction of two literals.
(𝐴 ∧ ¬𝐵) ∨ (¬𝐴 ∧ ¬𝐵 ∧ 𝐶 ∧ 𝐷) a disjunction of two parts, where the first part is
a conjunction of two literals and the second part
is a conjunction of four literals.
𝑃 ∨ 𝑄 ∨ ¬𝑅 a disjunction of three parts, each containing a single literal.
𝑃 ∧𝑄 ∧𝑅 a disjunction of one part, which contains three literals.
𝑍 a disjunction of one part, which contains one literal.

You can convert any proposition into an equivalent one in DNF using its truth table.
Consider the proposition 𝑃 given by the following truth table.

𝑋 𝑌 𝑃
F F T
F T T
T F F
T T T

As a step towards designing a logical expression for the entire proposition 𝑃, we will
design one logical expression for each row where 𝑃 is True.
Consider the first row. This is for when 𝑋 = False and 𝑌 = False. We want the
result to be True. We can do this using ¬𝑋 ∧ ¬𝑌. Satisfy yourself that this is True
when 𝑋 = False and 𝑌 = False, and also that it is False for every other combination of
truth values for 𝑋 and 𝑌. So it is True only in this first row, as its own truth table shows:

𝑋 𝑌 ¬𝑋 ∧ ¬𝑌
F F T
F T F
T F F
T T F

Now consider the second row of the truth table for 𝑃, which is for when 𝑋 = False
and 𝑌 = True. This time we will use ¬𝑋 ∧ 𝑌, which is True for this row but False for all
the other rows. We can add an extra column to include its truth table too.

𝑋 𝑌 ¬𝑋 ∧ ¬𝑌 ¬𝑋 ∧ 𝑌
F F T F
F T F T
T F F F
T T F F

The third row of the truth table for 𝑃 has 𝑃 = False, so we will ignore that.
The fourth row of the truth table for 𝑃 has 𝑃 = True again. We will now use 𝑋 ∧ 𝑌,
which is True for this row but for no other. We add a further column for its truth table.

𝑋 𝑌 ¬𝑋 ∧ ¬𝑌 ¬𝑋 ∧ 𝑌 𝑋 ∧𝑌
F F T F F
F T F T F
T F F F F
T T F F T

Now look at what happens when we take the disjunction of the last three columns.

𝑋 𝑌 ¬𝑋 ∧ ¬𝑌 ¬𝑋 ∧ 𝑌 𝑋 ∧𝑌 (¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌)


F F T F F T
F T F T F T
T F F F F F
T T F F T T

This shows that the DNF expression (¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌) is equivalent
to 𝑃.
𝑃 = (¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌)

Here each bracketed part is a conjunction, and the whole expression is a disjunction of those parts.

DNF expressions like this can be read easily from truth tables. You don’t have to
add extra columns as we did above. In each row, look at the pattern of truth values
for the variables. Then write down a conjunction of literals where variables that are
True are written normally and variables that are False are written in negated form. For
example, in the second row of our truth table, the variable 𝑋 is False so we negate it,
whereas 𝑌 is True so we just use it unchanged. Taking the conjunction of these two
literals gives ¬𝑋 ∧ 𝑌, which is the part we want for that row. Do this for every row in
which the proposition 𝑃 is True. This is shown for our current example in the following
table.

𝑋 𝑌 𝑃
F F T ¬𝑋 ∧ ¬𝑌
F T T ¬𝑋 ∧ 𝑌
T F F
T T T 𝑋∧ 𝑌

When this is done, just take the disjunction of all the parts in the final column and
you have your DNF expression for 𝑃.

Exercise: simplify the above expression 𝑃 as much as possible, using Boolean algebra.

Here is another example of this method. The columns 𝑋 , 𝑌, 𝑍 correspond to three


variables 𝑋 , 𝑌, 𝑍, and the fourth column gives the truth table for a new proposition 𝑃.
In the fifth column, we look at each row where 𝑃 = True and convert the first three truth
values in that row into a conjunction of literals that follows the required pattern. For
this particular 𝑃, this gives four conjunctions.

𝑋 𝑌 𝑍 𝑃
F F F T ¬𝑋 ∧ ¬𝑌 ∧ ¬𝑍
F F T F
F T F T ¬𝑋 ∧ 𝑌 ∧ ¬𝑍
F T T F
T F F F
T F T T 𝑋 ∧ ¬𝑌 ∧ 𝑍
T T F T 𝑋 ∧ 𝑌 ∧ ¬𝑍
T T T F

Taking the disjunction of all the conjunctions in the last column gives a DNF ex-
pression for 𝑃.

𝑃 = (¬𝑋 ∧ ¬𝑌 ∧ ¬𝑍) ∨ (¬𝑋 ∧ 𝑌 ∧ ¬𝑍) ∨ ( 𝑋 ∧ ¬𝑌 ∧ 𝑍) ∨ ( 𝑋 ∧ 𝑌 ∧ ¬𝑍)
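The row-by-row construction just described can be sketched as a short Python function; the names and the textual encoding of literals (~ for ¬, & for ∧, | for ∨) are our own choices:

```python
from itertools import product

def truth_table_to_dnf(f, names):
    # One conjunction of literals per truth-table row where f is True:
    # True variables appear plainly, False variables appear negated.
    parts = []
    for vals in product([False, True], repeat=len(names)):
        if f(*vals):
            lits = [n if v else "~" + n for n, v in zip(names, vals)]
            parts.append("(" + " & ".join(lits) + ")")
    return " | ".join(parts)

# The three-variable proposition P from the table above.
P = lambda x, y, z: (x, y, z) in {(False, False, False), (False, True, False),
                                  (True, False, True), (True, True, False)}
print(truth_table_to_dnf(P, ["X", "Y", "Z"]))
# (~X & ~Y & ~Z) | (~X & Y & ~Z) | (X & ~Y & Z) | (X & Y & ~Z)
```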

The importance of this method is that it shows that every logical expression is
equivalent to one in DNF. So DNF can be viewed as a “standard form” into which any
logical expression can be transformed.
BUT this transformation comes at a price. The DNF expression has as many parts
as there are truth table rows where the expression is True. An expression with 𝑘
variables has 2ᵏ rows in its truth table, which is exponential in the number of variables. If
the expression is True for a large proportion of its truth table rows, then the number of
parts in the DNF expression may also be exponentially large in the number of variables.
So it may be too large and unwieldy to be useful, unless the number of variables is very
small. (If a Boolean expression is actually provided in the form of its entire truth table,
then constructing its equivalent DNF expression by the above method is ok, since the
expression you get won’t be any larger than the truth table. But if the Boolean expres-
sion is provided as a compact formula, then the size of its equivalent DNF expression
may be exponential in the size of the formula you started with.)
One apparent attraction of DNF is that it is easy to tell if a DNF expression is
satisfiable, that is, if there is some assignment of truth values to its variables that makes
the whole expression True. In fact, if you take any of the parts of a DNF expression, the
pattern of the literals (i.e., whether each appears plainly or negated) tells you a truth
assignment that makes that part True, and then the whole disjunction must also be True
because a disjunction is True precisely when at least one of its parts is True. In effect,
the parts of a DNF expression yield a kind of encoded listing of all the truth assignments
that make the whole expression True.
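This "read off an assignment from any one part" idea can be sketched in a few lines of Python (the representation is our own: a DNF expression as a list of terms, each term a list of literal strings such as "X" or "~X"). The only subtlety is a term that contains both a variable and its negation, which can never be made True:

```python
def satisfying_assignment(dnf):
    """Given a DNF expression as a list of terms (each term a list of
    literals like "X" or "~X"), return a truth assignment, as a dict,
    that makes some term (hence the whole disjunction) True.  Return
    None only if every term contains a variable and its negation."""
    for term in dnf:
        assignment = {}
        consistent = True
        for lit in term:
            var, value = (lit[1:], False) if lit.startswith("~") else (lit, True)
            if assignment.get(var, value) != value:
                consistent = False      # term has both v and ~v: skip it
                break
            assignment[var] = value
        if consistent:
            return assignment
    return None

print(satisfying_assignment([["X", "~X"], ["~X", "Y", "~Z"]]))
# -> {'X': False, 'Y': True, 'Z': False}
```

The first term here is contradictory, so the assignment comes from the second term, exactly as the pattern of its literals dictates.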
On the other hand, it does not seem so easy to tell if a DNF expression is a tautology.
For a DNF expression to not be a tautology, there would have to be some truth assign-
ment to its variables that makes it False. There is no known way to test for this
efficiently.
The big problem with DNF, though, is that, in real life, logical rules are not usually
specified in a form that is amenable to DNF. They are typically described by listing

conditions that must be satisfied together. In other words, they are described in a way
that lends itself to expression as a conjunction rather than a disjunction.

4.17 CONJUNCTIVE NORMAL FORM

A Boolean expression is in Conjunctive Normal Form (CNF) if

• it is written as a conjunction of some number of parts (sometimes called clauses), where

• each part is a disjunction of some number of literals.

For example, in the expression

(¬𝑃 ∨ 𝑄) ∧ (𝑃 ∨ ¬𝑄)

each of the two parts, ¬𝑃 ∨ 𝑄 and 𝑃 ∨ ¬𝑄, is a disjunction, and the whole expression is their conjunction.

There is a close relationship — a kind of logical duality — between CNF and DNF.
Suppose you have an expression 𝑃 in CNF, and suppose you negate it, giving ¬𝑃. So
¬𝑃 is the negation of a conjunction of disjunctions of literals. But, by De Morgan’s Law,
a negation of a conjunction is a disjunction of negations. So ¬𝑃 will then be expressed
as a disjunction of negations of disjunctions of literals. But, again by De Morgan’s Law,
each negation of a disjunction is equivalent to a conjunction of negations. So ¬𝑃 is
now a disjunction of conjunctions of negations of literals. But the negation of a literal is
always equivalent to another literal (since ¬¬𝑋 = 𝑋 ). So we see that ¬𝑃 is a disjunction
of conjunctions of literals. In other words, it’s in DNF.
We now have a way to convert any logical expression 𝑃 in truth table form to a CNF
expression. To do so, we just
1. negate all the truth values in the output column (turning it into a truth table for
¬𝑃),

2. use the method of the previous subsection to construct a DNF expression for ¬𝑃,

3. then negate the expression (so that it will now be an expression for 𝑃 again),

4. and use De Morgan’s Laws to transform the negated DNF expression into CNF.
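The four steps amount to this: each truth-table row where 𝑃 is False contributes one clause, namely the disjunction that rules that row out. A Python sketch of the combined procedure (the names are ours, mirroring the DNF construction above):

```python
from itertools import product

def cnf_from_function(f, names):
    """Build a CNF expression (as a string) for a Boolean function f:
    each truth-table row where f is False yields one clause, the
    disjunction obtained by negating that row's pattern of values."""
    clauses = []
    for values in product([False, True], repeat=len(names)):
        if not f(*values):
            # A variable that is True in this row appears negated in
            # the clause, and vice versa (the De Morgan dual of DNF).
            lits = ["~" + n if v else n for n, v in zip(names, values)]
            clauses.append("(" + " | ".join(lits) + ")")
    return " & ".join(clauses) if clauses else "True"

# The same example P as in the DNF section (False on FFT, FTT, TFF, TTT).
P = lambda x, y, z: ((not x and not y and not z) or (not x and y and not z)
                     or (x and not y and z) or (x and y and not z))
print(cnf_from_function(P, ["X", "Y", "Z"]))
# -> (X | Y | ~Z) & (X | ~Y | ~Z) & (~X | Y | Z) & (~X | ~Y | ~Z)
```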
This establishes the important theoretical point that every expression is equivalent
to a CNF expression. But this is usually not a good way to construct CNF expressions
in practice, because:
• Truth tables are too large. As discussed earlier, their size is exponential in the
number of variables. Complex logical conditions in real computational problems
usually contain enough variables for truth tables to be unusable. Furthermore,
even for more modest-sized problems, the truth table approach for CNF uses a lot
of time and space and, for manual work, is quite error-prone.

• Logical conditions are usually expressed in a way that makes CNF a natural way
to represent them.

4.18 REPRESENTING LOGICAL STATEMENTS

Suppose we are given a set of rules or conditions that we want to model as a logical
expression. These are often expressed as conditions that must be satisfied together. For
example, the rules of a game must all be followed, to play the game correctly; you can’t
just ignore the rules you don’t like on the grounds that you’re following some of the
rules! A software specification typically stipulates a set of conditions that must all be
met, and similarly for legal contracts, acts of parliament, traffic regulations, itineraries,
and so on. While some rules may offer choices as to how they can be satisfied, at the
highest level a set of rules is usually best modelled as a conjunction.
So, if you are given some specifications and you want to model them by a logical
expression, one first step you can take (working “top-down”) is to identify how the rule
is structured at the top level as a conjunction. What are the parts of this conjunction?
You can then keep working top-down and try to decompose those parts as conjunctions
too. A conjunction of conjunctions is just one larger conjunction.
Working from the other direction (“bottom-up”), you also have to think about what
your most elementary logical “atoms” are. In other words, think about the simplest,
most basic assertion that could hypothetically be made about this situation, without
worrying about whether it might be True or False. In fact, in this kind of situation,
you won’t initially know what values your logical variables might have; you’re merely
encoding your problem in logical form, without solving it yet, so you avoid thinking
about actual truth values for any of your variables. You are just trying to identify the
kinds of “atomic assertions” that are needed to describe any hypothetical situation in
your scenario.

Example:
You are planning a dinner party. Your guest list must have:
• at least one of: Harry, Ron, Hermione, Ginny
• Hagrid only if it also has Norberta
• none, or both, of Fred and George
• no more than one of: Voldemort, Bellatrix, Dolores.

At the top level, we can see that this is a conjunction:


(at least one of: Harry, Ron, Hermione, Ginny)
∧ (Hagrid only if it also has Norberta)
∧ (none, or both, of Fred and George)
∧ (no more than one of: Voldemort, Bellatrix, Dolores).

Note that we’re not trying to convert everything to logic at once. The four parts of
our conjunction are not yet expressed in logical form; they’re still just written in English
text. That’s ok, in this intermediate stage.
Early in the process, we should think about what Boolean variables to use, and what
they should represent. In this case, that is fairly straightforward. The simplest logical
statement we can make in this situation is that a specific person is on your guest list.
So, for each person, we’ll introduce a Boolean variable with the intended interpretation
that the person is on your guest list. So, the variable Harry is intended to mean that
the person Harry is on your guest list, and so on. This gives us eleven variables, one
for each of our eleven guests. As far as we know at the moment, each of the variables
might be True or False; it is the role of the logical expression we are constructing to
ensure that the combinations of truth values for these eleven variables must correspond
to valid guest lists. We will do that by properly representing the rules in logic. We
will not try, at this stage, to enumerate all possible guest lists, or even to find one valid
guest list. Our current task is to encode the problem’s rules in logic, not to solve the
problem. (That can be tackled later, and is a different skill.)
Now, let’s look at each of the four parts of our conjunction, in turn, and see how
they may be logically expressed using our variables.

• at least one of: Harry, Ron, Hermione, Ginny.


This is a disjunction of the four corresponding variables:

Harry ∨ Ron ∨ Hermione ∨ Ginny

• Hagrid only if it also has Norberta.


This is modelled by implication:

Hagrid ⇒ Norberta

• none, or both, of Fred and George.


This is a job for the biconditional:

Fred ⇔ George

• no more than one of: Voldemort, Bellatrix, Dolores.


This requires some more thought! Unlike the previous parts, it doesn’t correspond
immediately to a logical operation we have already met. So we will try to see if it
can be broken down further into a conjunction of simpler conditions. To do this,
consider that “no more than one of …” is the same as saying that every pair of
them is forbidden. So, the pair Voldemort & Bellatrix is forbidden, and the pair

Voldemort & Dolores is forbidden, and the pair Bellatrix & Dolores is forbidden.
See the logical structure emerging: “…and …and …”. So we have:

(not both Voldemort & Bellatrix) ∧


(not both Voldemort & Dolores) ∧
(not both Bellatrix & Dolores)

So our expression is now:

(Harry ∨ Ron ∨ Hermione ∨ Ginny)


∧ (Hagrid ⇒ Norberta)
∧ (Fred ⇔ George)
∧ (not both Voldemort & Bellatrix)
∧ (not both Voldemort & Dolores)
∧ (not both Bellatrix & Dolores).

We are now getting to the point where we can use logical manipulations (Boolean
algebra) to transform each of the four parts into a disjunction of literals. The first part
is already a disjunction. The second part can be written as the disjunction ¬Hagrid ∨
Norberta, as we have already seen. The third part can be written

(Fred ⇒ George) ∧ (Fred ⇐ George)

and turning each implication into an appropriate disjunction gives

(¬Fred ∨ George) ∧ (Fred ∨ ¬George).

For the fourth part, requiring that Voldemort and Bellatrix are not both True is the
same as requiring at least one of them to be False, which is the same as requiring at
least one of their negations to be True, which is captured by ¬Voldemort ∨ ¬Bellatrix.
Treating the other two pairs the same way gives the conjunction of disjunctions

(¬Voldemort ∨ ¬Bellatrix) ∧ (¬Voldemort ∨ ¬Dolores) ∧ (¬Bellatrix ∨ ¬Dolores)

Putting these all together gives the expression

(Harry ∨ Ron ∨ Hermione ∨ Ginny)


∧ (¬Hagrid ∨ Norberta)
∧ (¬Fred ∨ George) ∧ (Fred ∨ ¬George)
∧ (¬Voldemort ∨ ¬Bellatrix) ∧ (¬Voldemort ∨ ¬Dolores) ∧ (¬Bellatrix ∨ ¬Dolores)

This is now in CNF.
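As a sanity check, a CNF expression like this can be evaluated directly in code against a candidate guest list. The Python sketch below (our own encoding, with each variable represented by membership of a set of names) mirrors the expression clause for clause:

```python
def valid_guest_list(guests):
    """Evaluate the dinner-party CNF, where a variable such as Harry
    is True exactly when the name "Harry" is in the guests set."""
    g = lambda name: name in guests
    return (
        (g("Harry") or g("Ron") or g("Hermione") or g("Ginny"))
        and (not g("Hagrid") or g("Norberta"))
        and (not g("Fred") or g("George")) and (g("Fred") or not g("George"))
        and (not g("Voldemort") or not g("Bellatrix"))
        and (not g("Voldemort") or not g("Dolores"))
        and (not g("Bellatrix") or not g("Dolores"))
    )

print(valid_guest_list({"Harry", "Fred", "George"}))                # -> True
print(valid_guest_list({"Hagrid", "Ron", "Bellatrix", "Dolores"}))  # -> False
```

The second list fails because Hagrid is invited without Norberta, and also because two of the "no more than one" trio appear together.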

Challenge: how long would an equivalent DNF expression be?



4.19 STATEMENTS ABOUT HOW MANY VARIABLES ARE TRUE

Given a collection of Boolean variables, we are often interested in how many of them
are True. We might want to state that at least two of them are True, or that at most
two of them are True, or exactly two of them are True. We might want to make similar
statements with “two” replaced by some other number. Conditions of this kind are
examples of cardinality constraints, since they are only about the number of variables
with a given value.
We saw examples of this in the previous section. We wanted at least one of Harry,
Ron, Hermione and Ginny to be True. We also wanted at most one of Voldemort,
Bellatrix and Dolores to be True. So, to build your intuition about dealing with these
situations, it would be worth pausing now and spending a few minutes reading that
analysis again, and thinking through how the reasoning given there could be extended
to situations where the number of True variables involved is some other number (i.e.,
not one).
If ever we want to specify that exactly 𝑘 variables are True, we can express this as a
conjunction:
(at least 𝑘 are True) ∧ (at most 𝑘 are True).
So we now focus our discussion on just the “at least” and “at most” cases.
Suppose we want to state that at most 𝑘 of the 𝑛 variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 are True.
This means that, for every set of 𝑘 +1 variables, at least one of them is False, or in other
words, at least one of their negations is True. So, for every set of 𝑘 + 1 variables, we
form a disjunction of their negations (to say that at least one of these negations is True),
and then we combine all these disjunctions into a larger conjunction. The number of
disjunctions we use is just the number of subsets of 𝑘 + 1 variables chosen from our 𝑛
variables, which is the binomial coefficient C(𝑛, 𝑘 + 1).
For example, suppose we want to say that at most two of the four variables 𝑤, 𝑥, 𝑦, 𝑧
are True (i.e., 𝑘 = 2 and 𝑛 = 4). This means that, for every three of the variables, at least
one of them is False. So, at least one of 𝑤, 𝑥, 𝑦 is False, and at least one of 𝑤, 𝑥, 𝑧 is
False, and so on. But saying that at least one of a set of variables is False is the same as
saying that at least one of their negations is True. For example, at least one of 𝑤, 𝑥, 𝑦 is
False if and only if at least one of ¬𝑤, ¬𝑥, ¬𝑦 is True. This is now a job for disjunction:
at least one of ¬𝑤, ¬𝑥, ¬𝑦 is True if and only if ¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦 is True. So, we create all
disjunctions of triples (𝑘 + 1 = 3) of negated literals, which gives the disjunctions

¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦, ¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧, ¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧, ¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧 .


The number of these disjunctions is C(𝑛, 𝑘 + 1) = C(4, 3) = 4. These disjunctions are then
combined by conjunction:

(¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦) ∧ (¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧) ∧ (¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧) ∧ (¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧).
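Generating all these clauses is exactly an enumeration of (𝑘 + 1)-subsets, so in Python it is a short job for itertools.combinations (the helper name is ours):

```python
from itertools import combinations

def at_most_k(variables, k):
    """CNF clauses saying at most k of the variables are True:
    one all-negated clause for each subset of k + 1 variables."""
    return [["~" + v for v in subset]
            for subset in combinations(variables, k + 1)]

for clause in at_most_k(["w", "x", "y", "z"], 2):
    print(" | ".join(clause))
# -> ~w | ~x | ~y
#    ~w | ~x | ~z
#    ~w | ~y | ~z
#    ~x | ~y | ~z
```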



Now suppose we want to state that at least 𝑘 of the 𝑛 variables 𝑥1 , 𝑥2 , … , 𝑥𝑛 are True.
This means that at most 𝑛 − 𝑘 of them are False. This means that, for every set of
𝑛 − 𝑘 + 1 variables, at least one of them is True. So, for every set of 𝑛 − 𝑘 + 1 variables,
we form a disjunction of them (to say that at least one of them is True), and then we
combine all these disjunctions into a larger conjunction.
For example, suppose we want to say that at least two of the four variables 𝑤, 𝑥, 𝑦, 𝑧 are
True (i.e., 𝑘 = 2 and 𝑛 = 4). We create all disjunctions of triples (𝑛−𝑘+1 = 4−2+1 = 3)
of literals (unnegated, this time), which gives the disjunctions

𝑤 ∨ 𝑥 ∨ 𝑦, 𝑤 ∨ 𝑥 ∨ 𝑧, 𝑤 ∨ 𝑦 ∨ 𝑧, 𝑥 ∨ 𝑦 ∨ 𝑧.

These are then combined by conjunction:

(𝑤 ∨ 𝑥 ∨ 𝑦) ∧ (𝑤 ∨ 𝑥 ∨ 𝑧) ∧ (𝑤 ∨ 𝑦 ∨ 𝑧) ∧ (𝑥 ∨ 𝑦 ∨ 𝑧).

Finally, if we want to say that exactly two of the four variables are True, then we
take the conjunction of the expressions for “at least” and “at most”, giving

(¬𝑤 ∨ ¬𝑥 ∨ ¬𝑦) ∧ (¬𝑤 ∨ ¬𝑥 ∨ ¬𝑧) ∧ (¬𝑤 ∨ ¬𝑦 ∨ ¬𝑧) ∧ (¬𝑥 ∨ ¬𝑦 ∨ ¬𝑧) ∧


(𝑤 ∨ 𝑥 ∨ 𝑦) ∧ (𝑤 ∨ 𝑥 ∨ 𝑧) ∧ (𝑤 ∨ 𝑦 ∨ 𝑧) ∧ (𝑥 ∨ 𝑦 ∨ 𝑧).
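This exactly-two expression can be checked by brute force: over all 2⁴ = 16 truth assignments, it should agree with a direct count of the True variables. A small Python check (our own sketch):

```python
from itertools import product

def exactly_two(w, x, y, z):
    """The CNF above: at most two AND at least two of w, x, y, z."""
    at_most = ((not w or not x or not y) and (not w or not x or not z) and
               (not w or not y or not z) and (not x or not y or not z))
    at_least = ((w or x or y) and (w or x or z) and
                (w or y or z) and (x or y or z))
    return at_most and at_least

# Agreement with a direct count, over all 16 truth assignments:
assert all(exactly_two(*a) == (sum(a) == 2)
           for a in product([False, True], repeat=4))
print("exactly-two CNF verified on all 16 assignments")
```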

4.20 UNIVERSAL SETS OF OPERATIONS

We saw in § 4.16 that every logical expression is equivalent to a DNF expression. Later, in
§ 4.17, we observed that every logical expression is also equivalent to a CNF expression.
One consequence of this is that every logical expression is equivalent to an expression
that only uses the operations from the operation set {∧, ∨, ¬}.
We say that a set of operations 𝑋 is universal if every logical expression 𝑃 is equiv-
alent to an expression 𝑄 that only uses operations in that set (together with variables
from the original expression). So the set of operations that actually appear in 𝑄 must
be a subset of the operation set 𝑋 .
Our observations above, about DNF and CNF, demonstrate that the operation set

{∧, ∨, ¬}

is universal.
One reason for considering universal sets of operations relates to the construction of
electronic circuits that compute the values of logical expressions. It is easier to make
these circuits if we can put them together from simple components of only a few different
types. Simple components are easier to construct, and the complexity of manufacturing
is reduced if we don’t have to make too many different types of components. We also
gain economies of scale from making large numbers of these simple components instead
of smaller numbers of more complex components.

The operation set {∧, ∨, ¬} is not the smallest possible universal set of operations.
De Morgan’s Laws (§ 4.8) show how to express ∧ in terms of ∨ and ¬, and dually how
to express ∨ in terms of ∧ and ¬. So either ∧ or ∨ could be dropped from our universal
set, as long as we retain the other as well as ¬. So the operation set

{∧, ¬}

is also universal, and so is


{∨, ¬}.
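These De Morgan rewritings are small enough to verify exhaustively. The Python sketch below (helper names are ours) checks, over all four input pairs, that ∨ can be simulated using only ∧ and ¬, and dually that ∧ can be simulated using only ∨ and ¬:

```python
from itertools import product

def or_via_and_not(a, b):
    """a ∨ b using only ∧ and ¬ (De Morgan's Law)."""
    return not (not a and not b)

def and_via_or_not(a, b):
    """a ∧ b using only ∨ and ¬ (the dual law)."""
    return not (not a or not b)

for a, b in product([False, True], repeat=2):
    assert or_via_and_not(a, b) == (a or b)
    assert and_via_or_not(a, b) == (a and b)
print("De Morgan rewritings agree on all four input pairs")
```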
Having just seen that we can drop either ∧ or ∨ from our universal set {∧, ∨, ¬},
while still retaining the universal property, it is natural to ask if we can drop ¬ and keep
the other two. In other words, is the set

{∧, ∨}

universal?
Having found universal sets of just two operations, we can ask if this is the smallest
possible. Does there exist a universal set of just one operation?
Clearly the set {¬} is not universal, since ¬ is a unary operation that cannot be
used to combine logical variables together. So, if we seek a single universal operation,
we should start by looking at binary operations.

4.21 EXERCISES

1. Suppose we have the following propositions 𝐵, 𝑀 , 𝑆 regarding an overseas trip you


are doing.

𝐵: you visit Brazil


𝑀: you visit Malaysia
𝑆: you visit Singapore

Use 𝐵, 𝑀 and 𝑆 to write a Boolean expression which is true if and only if you visit
Brazil or both the other countries (but not all three).

2. Suppose we have the following propositions 𝑁, 𝑃, 𝑆 about a certain unknown integer.

𝑁: the integer is negative


𝑃: the integer is prime
𝑆: the integer is a square

Use 𝑁, 𝑃 and 𝑆 to write a Boolean expression with the following meaning:

If the integer is a square, then it is neither negative nor prime.



3. Assuming the first of De Morgan’s Laws,

¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄,

prove the second of De Morgan’s Laws,

¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄,

as simply as possible.

4. Prove the two Distributive Laws ((4.1) and (4.2) in § 4.14).

5. Do you need parentheses in the expression 𝑎∧𝑏∨𝑐? Investigate the two expressions

𝑎 ∧ (𝑏 ∨ 𝑐) and (𝑎 ∧ 𝑏) ∨ 𝑐.

Are they logically equivalent? If not, what can you say about the relationship between
them? Does either imply the other?

6. Prove that

(𝑃1 ∧ ⋯ ∧ 𝑃𝑛 ) ⇒ 𝐶 is logically equivalent to ¬𝑃1 ∨ ⋯ ∨ ¬𝑃𝑛 ∨ 𝐶

A disjunction of the form ¬𝑃1 ∨ ⋯ ∨ ¬𝑃𝑛 ∨ 𝐶 is called a Horn clause. These play a big
role in the theory of logic programming.

7. Simplify the following expression as much as possible, using Boolean algebra:

(¬𝑋 ∧ ¬𝑌) ∨ (¬𝑋 ∧ 𝑌) ∨ (𝑋 ∧ 𝑌).

8. This question is about using Boolean algebra to describe the algebra of switching.
An electrical switch is represented diagrammatically as follows.

[Diagram: symbol for an electrical switch.]
The state of a switch can be represented by a Boolean variable, with the states Off
and On being represented by False and True respectively. Let 𝑥 be a Boolean variable
representing the state of a switch. Then 𝑥 = True represents the switch being On, so
that electrical current can pass through it; 𝑥 = False represents the switch being Off, so
that there is no electrical current through the switch.

We can put switches together to make more complicated circuits. Those circuits can
then be described by Boolean expressions in the variables that represent the switches.
In the following circuits, 𝑣, 𝑤, 𝑥, 𝑦, 𝑧 are Boolean variables representing the indicated
switches.

(a) For each of the two switching circuits below, write a Boolean expression in 𝑥 and 𝑦
for the proposition that current flows between the two ends of the circuit. (The ends are
shown as 𝐴 and 𝐵 in the diagram. In effect, the diagrams only show part of a complete
circuit. The rest of the circuit would contain a power supply, such as a battery, and an
electrical device that operates when the current flows, such as a light or an appliance.)

[Diagram: two switching circuits with switches 𝑥 and 𝑦 between the ends 𝐴 and 𝐵.]

(b) For each of the next two circuits, again construct a Boolean expression to represent
the proposition that current flows between the two ends of the circuit.
Compare the two expressions you construct. What can you say about the relationship
between them?

[Diagram: two switching circuits built from switches 𝑣, 𝑤, 𝑦 and 𝑧.]

(c) Similarly, construct Boolean expressions for the next two circuits.
Compare the two expressions you construct. What can you say about the relationship
between them?

[Diagram: two switching circuits built from switches 𝑣, 𝑤, 𝑥, 𝑦 and 𝑧.]

(d) For each of our circuit pairs in (a)–(c), discuss how they are related to each other,
and what this means for how the Boolean expressions derived from them are related to
each other.

9. Does logically negating an implication reverse it? In other words, are the
expressions
¬(𝑃 ⇒ 𝑄) and 𝑃 ⇐𝑄
logically equivalent? If so, prove it; if not, determine (with proof) if either implies the
other.

10. Prove that


(𝑃 ∧ (𝑃 ⇒ 𝑄)) ⇒ 𝑄
is a tautology. Comment on what this means for logical deduction.

11. Prove that

(a) 𝑃 ⊕ 𝑄 = (𝑃 ∧ ¬𝑄) ∨ (¬𝑃 ∧ 𝑄).

(b) 𝑃 ⊕ 𝑄 = (𝑃 ∨ 𝑄) ∧ (¬𝑃 ∨ ¬𝑄).

12.

(a) Express 𝑃 ∨ 𝑄 using just the operations ⊕ and ∧, and no negation.

(b) Express 𝑃 ∧ 𝑄 using just the operations ⊕ and ∨, and no negation.

13. We saw two distributive laws for ∨ and ∧ in § 4.14, and contrasted that with the
situation for ordinary numbers under + and ×, when only one distributive law holds.
Investigate the situation for ⊕ and ∧. Do you have two distributive laws, or one
(and which one?), or none? Give explanations for your answers.

14. Consider the following truth table for a proposition 𝑃 in terms of 𝑋 , 𝑌 and 𝑍.

𝑋 𝑌 𝑍 𝑃
F F F F
F F T F
F T F F
F T T T
T F F F
T F T T
T T F T
T T T T

(a) Express 𝑃 in Disjunctive Normal Form (DNF).

(b) Express ¬𝑃 in DNF.

(c) Using (b) as your starting point, express 𝑃 in Conjunctive Normal Form (CNF).

(d) An alternative way to try to express 𝑃 in CNF might be to start from the DNF
for 𝑃 you found in (a) and then expand the whole thing using the Distributive Laws. If
you start with your expression from (a) and do all this expansion without yet doing any
further simplification, how many parts are combined together in the large conjunction

you construct? How many literals does it have? How does this approach compare with
(c), with respect to ease and efficiency?

15. A meeting about moon mission software is held at NASA in 1969. Participants may
include Judith Cohen (electrical engineer), Margaret Hamilton (computer scientist), and
Katherine Johnson (mathematician). Let Judith, Margaret and Katherine be propositions
with the following meanings.

Judith Judith Cohen is in the meeting.


Margaret Margaret Hamilton is in the meeting.
Katherine Katherine Johnson is in the meeting.

For each of the following statements, write a proposition in Conjunctive Normal


Form with the same meaning.

(a) Judith and Margaret are not both in the meeting.


(b) Either Judith or Margaret, but not both of them, is in the meeting. (This is the
“exclusive-OR”.)
(c) At least one of Judith, Margaret and Katherine is in the meeting.
(d) At most one of Judith, Margaret and Katherine is in the meeting.
(e) Exactly one of Judith, Margaret and Katherine is in the meeting.
(f) At least two of Judith, Margaret and Katherine are in the meeting.
(g) At most two of Judith, Margaret and Katherine are in the meeting.
(h) Exactly two of Judith, Margaret and Katherine are in the meeting.
(i) Exactly three of Judith, Margaret and Katherine are in the meeting.
(j) None of Judith, Margaret and Katherine is in the meeting.

16. Recall the logical expression given on p. 141 in § 4.18 for your dinner party guest list:

(Harry ∨ Ron ∨ Hermione ∨ Ginny)


∧ (¬Hagrid ∨ Norberta)
∧ (¬Fred ∨ George) ∧ (Fred ∨ ¬George)
∧ (¬Voldemort ∨ ¬Bellatrix) ∧ (¬Voldemort ∨ ¬Dolores) ∧ (¬Bellatrix ∨ ¬Dolores)

How long would an equivalent DNF expression be? Specifically, how many disjuncts —
smaller expressions combined using ∨ to make the whole expression — would it have?

17. A Boolean function is in algebraic normal form if it is an exclusive-or of some


number of parts, where each part is a conjunction of unnegated variables or the logical
constant True, and we also allow the entire expression to be just the logical constant
False (but otherwise False is not allowed to appear in the expression).


Figure 4.1: An AND gate. 𝐴 and 𝐵 are the inputs and 𝐶 is the output (which is therefore true
if and only if both 𝐴 and 𝐵 are true).

So, for example, the following expressions are in algebraic normal form:

𝑥 ⊕ 𝑦 ⊕ (𝑥 ∧ 𝑦), True ⊕ 𝑦 ⊕ (𝑥 ∧ 𝑦 ∧ 𝑧), 𝑥, True ⊕ 𝑥, 𝑥 ∧ 𝑧, True, False.

But the following expressions are not in algebraic normal form:

𝑥 ⊕ 𝑦 ⊕ (¬𝑥 ∧ 𝑦), False ⊕ 𝑦 ⊕ (𝑥 ∧ 𝑦 ∧ 𝑧), ¬𝑥, 𝑥 ∨ 𝑧, True ⊕ True.

In Exercise 12(a), you expressed the disjunction of two variables in algebraic normal
form.

Prove that every Boolean function can be written in algebraic normal form. (Use
Exercise 12(a) and what you know about DNF.)

18. Suppose you have an unlimited supply of AND gates, which are logic gates for the
binary operation ∧. Each gate has two inputs, for the two arguments, and one output,
which gives the conjunction of the inputs. The output of one gate can be used for the
input of another gate.
To represent an AND gate in a circuit diagram we will use the symbol in Figure 4.1.

Show how to put AND gates together to compute the conjunction of eight logical
arguments 𝑥1 , 𝑥2 , … , 𝑥8 .
How many AND gates do you need?
Suppose each AND gate computes its output as soon as both its inputs are available,
and that it takes time 𝑡 to compute the output. Assume that all the initial inputs
𝑥1 , 𝑥2 , … , 𝑥8 are available simultaneously, at time 0. How long does your combination of
AND gates take to compute its output?
Try and put your AND gates together to minimise the total time taken to compute
the final output.

19. Prove, by induction on 𝑛, that the conjunction of 2ⁿ inputs can be computed in


time 𝑛𝑡 by a suitable combination of AND gates.

Hence prove that the conjunction of 𝑚 arguments (where 𝑚 is not necessarily a


power of 2) can be computed by a suitable combination of AND gates in time ⌈log₂ 𝑚⌉ 𝑡.

20. A Boolean expression is called affine if it is of one of the following forms:

𝑥1 ⊕ 𝑥 2 ⊕ ⋯ ⊕ 𝑥 𝑛
or
𝑥1 ⊕ 𝑥2 ⊕ ⋯ ⊕ 𝑥𝑛 ⊕ True.

Prove, by induction on 𝑛, that for all 𝑛 ∈ ℕ, the number of satisfying truth assignments
of an affine Boolean expression with 𝑛 variables is 2ⁿ⁻¹.
It follows that half of all truth assignments are satisfying for the expression, and half
are not.
Here, a truth assignment is just an assignment of a truth value to each variable,
and it is satisfying if the assignment makes the whole expression True.

21. Let 𝑥1 𝑥0 be the bits of a two-bit binary number 𝑥, representing an integer in


{0, 1, 2, 3}. So 𝑥1 , 𝑥0 ∈ {0, 1} and we have 𝑥 = 2𝑥1 + 𝑥0 .
Let 𝑦2 𝑦1 𝑦0 be the three-bit binary representation of the number 𝑥 +1 obtained from
𝑥 by incrementing it. So 𝑦 ∈ {1, 2, 3, 4}, and its leading bit is 0 except if 𝑥 = 3, in which
case 𝑦 = 4 which in binary is 100.
In this exercise we use 0 and 1 for the truth values False and True (as mentioned on
p. 122 in § 4.1𝛼 ).
Give Boolean expressions for each of the three bits of 𝑦, in terms of the two bits 𝑥1
and 𝑥0 of 𝑥.

22. Is the operation set {⊕, ¬} universal? Justify your answer.

23. Prove or disprove the claim that there exists a universal operation set containing
just a single operation.
You need not restrict consideration to the three operations ∧, ∨, ¬. You can use
another operation of at most two arguments (in other words, a binary operation) if you
wish.
What type of proof did you use? (Recall the proof types discussed in Chapter 3.)

24. Computer circuits actually perform the very same Boolean logic that we have
studied using logic gates (§ 4.4𝛼 ). The idea is that you have one or more wires coming
into a gate, and usually one output wire. If an input wire has current flowing through,
the variable it represents is set to True, and if not, then False. The logic gate then takes
those signals and gives off an output signal if the result should be True, and no signal if
the result is False. Below are the most common logic gates seen in circuits.

[Figure: standard symbols for the AND, OR, NOT, XOR, NAND, NOR and XNOR gates, each with inputs 𝑎 and 𝑏 (NOT has the single input 𝑎) and one output.]

The AND, OR, and NOT gates function the same as what we have seen previously.
If both signals of an AND gate are on, it will output a signal. For OR, only one input
signal is required to be on, and for NOT, there will only be an output signal if the input
signal is off (False).
XOR is “EXCLUSIVE OR”, which outputs a signal only if a or b are on, but not both.
NAND, NOR, and XNOR are simply the inversions of AND, OR, and XOR respectively
(i.e. NAND is “not both”, NOR is “neither”, and XNOR is “both or neither”). These
gates can also be connected in series, like in the example below, which is equivalent to
the logical expression (¬𝑎 ∨ ¬𝑏).

[Circuit diagram computing (¬𝑎 ∨ ¬𝑏).]

(a) What single logic gate is the above circuit equivalent to?

(b) We saw in § 4.20, that {∨, ∧, ¬} is a universal set of operations, meaning that
any logical expression can be expressed with only these operations. How would
you express the function of an XOR gate as a Boolean expression using only these
operations?

(c) Draw a circuit diagram which performs the same functionality as an XOR gate using
only AND, OR, and NOT gates.

(d) Do we even need three types of gates? Given that {∧, ∨, ¬} is a universal set of
operations, use De Morgan’s Law to prove that {∧, ¬} and {∨, ¬} are also universal
sets of operations. (Hint: Can you express ∨ with the other two operators?)

(e) Draw a circuit diagram that performs an OR operation using only AND and NOT
gates, and also one which performs AND using only OR and NOT gates.

(f) Can you perform the function of a NOT gate, or an AND gate, with only NAND
gates? What does this tell you about NAND gates?
5 PREDICATE LOGIC

Propositional logic, covered in the previous chapter, enables us to do logical reasoning


with propositions, in which everything must be either True or False. Any variables we
used were Boolean variables, and could only take the values True and False.
But we don’t want to only do logical reasoning about Boolean variables and expres-
sions. In computer science and mathematics, we want to reason about a huge variety of
objects, such as numbers, strings, sets, sequences, networks, files, algorithms, programs,
systems, and so on. And we want to be able to reason at a general level, so we can make
statements that cover whole categories of objects without having to make separate state-
ments about each and every individual object in the category. So we need variables, by
which a name can be used to refer to any object of some type, and predicates which are
a way of making assertions about variables and the objects they can represent. We use
sets to specify the values that a variable can represent and allow functions that convert
some objects into other objects. We want to be able to say whether or not a statement
about variables is always true, or sometimes true, or never true.
Predicate logic gives us the logical concepts and machinery to do all these things. It
is powerful and flexible enough that it can be used to express any precise mathemat-
ical statement and then do logical reasoning about it. It builds on the foundation of
propositional logic but extends the realm of logic much further.
Mastery of predicate logic is not only important for people who work with mathe-
matical statements and proofs. It is also used by computer programs that do automated
reasoning about mathematical statements. As such, it is an important tool in Artificial
Intelligence.

5.1𝛼 RELATIONS, PREDICATES, AND TRUTH-VALUED FUNCTIONS

We have already met predicates. We mentioned in § 2.13 and § 2.17 that the term
“predicate” is just a synonym for “relation”. Restating this as a definition, we have:

A predicate is a relation.

So, when discussing predicates, we can draw on all the terminology and theory that we
developed for relations. For example, we can talk about the arguments of a predicate,


and the domain of each of its arguments. We can call it a 𝑘-ary predicate if it has 𝑘
arguments.
Although a predicate is nothing more or less than a relation, the term tends to
indicate an intention to do logic with it. So, we are interested in when things are true
or false, and we might want to combine things using logical operations.
Consider the relation Parent from § 2.13. This is a subset of ℙ×ℙ, where ℙ is the set
of all people who have ever lived. Some ordered pairs of people belong to this relation,
others do not.

(Ada Lovelace , George Gordon Byron) ∈ Parent,


(George Boole , George Everest) ∉ Parent.

For the pairs that belong to it, we may state that fact using prefix notation:

Parent(Ada Lovelace, George Gordon Byron).

This is a true statement. In logic, we need to be able to work with both true and false
statements. So we want to be able to make assertions of both types.

Parent(Ada Lovelace, George Gordon Byron) is True,


Parent(George Boole, George Everest) is False.

In fact, if we give each argument of Parent a specific value from its domain, then we get
a specific statement about those values which is either true or false. In other words, we
get a proposition.
It is a short step from here to treat Parent as returning either True or False, so we
can write

Parent(Ada Lovelace, George Gordon Byron) = True,


Parent(George Boole, George Everest) = False.

In summary, we started with a relation, Parent, treating it as a subset of ℙ×ℙ, and then
treated it as a function that takes any pair of people and returns either True or False.
So, we can view it as a truth-valued function, i.e., a function that returns a truth value.1
In this case, the domain of the function is ℙ × ℙ and its codomain is {True, False}.
These are just different ways of viewing the same thing. When we use the word
“predicate” for a relation, we are not really introducing anything new, but rather just
signalling our intention that we want to use the relation in a logical context, and to use
it to make logical statements, and maybe to build more complex logical statements from
it.

1 Predicates are occasionally called propositional functions, since they yield specific propositions about
specific values of the various arguments.

Since predicates enable us to make true or false statements about their arguments,
we can combine them using our usual logical operations: ¬, ∧, ∨, ⇒, ⇔.
statement                                              truth value
¬ Parent(George Boole, George Everest)                 True
Parent(Plato, Ariston) ∧ Parent(Hypatia, Theon)        True
Parent(Plato, Ariston) ∧ Parent(Ariston, Plato)        False
Parent(Plato, Ariston) ∨ Parent(Ariston, Plato)        True
When defining new predicates, it is conventional to use prefix notation, as we have
done with Parent. But for some predicates based on binary relations, particularly those
relating to ordering, containment or equivalence of some kind, infix notation is used.
So, when using the predicate <, we place the name of the predicate, <, in between its
arguments. (Recall the discussion of prefix and infix notation on p. 47 in § 2.5𝛼 .)
The equality predicate, =, is always considered to be available. This is because you
can’t really say anything about a class of objects if you can’t even tell when two objects
are really the same object! You do not need to be told that you are allowed to use it;
you always can, no matter what type of objects you are working with. It is the only
predicate for which we make this sweeping assumption.
A predicate with one argument (i.e., a unary predicate) is also called a property.
It corresponds naturally to a specific set, namely the set of elements of the domain of
the argument for which the predicate is True.
For example, define the predicate isNegative(𝑋 ) to be a unary predicate whose
variable 𝑋 has domain ℝ. It captures the property of being negative, and corresponds
to the set ℝ− of negative real numbers.
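The correspondence between a property and its set can be sketched in Python. This is an invented illustration: a small finite range stands in for the domain ℝ.

```python
# A unary predicate (property) ...
def is_negative(x):
    return x < 0

# ... and its corresponding set: the elements of (a finite sample of)
# the domain for which the predicate is True.
sample_domain = range(-3, 4)
negatives = {x for x in sample_domain if is_negative(x)}
print(negatives)  # {-3, -2, -1}
```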
Exercise: what could a predicate with no arguments be?

5.2𝛼 VA R i A B L E S A N D C O N S TA N T S

A variable is a name for an object that allows us to refer to the object without specifying
it exactly.
• Often, the name is a letter from some alphabet, like 𝑥 or 𝑃 or 𝜃. It might also
have subscripts, to distinguish it from other variables where we would like to use
the same letter; the subscript is considered part of the name, so that 𝑥1 and 𝑥2
are different variables. We can also use words for names, like myFavouriteColour
or dateToday. Using words for names is usual in programming; it is much less
common in mathematics.2
• Often, we use a name for something when we don’t know it, but still want to reason
about it. This was one reason for introducing variables for unknown quantities
when you first started doing algebra at school. But this need not be the only
reason for introducing variables.

• Variables can be used when we want to talk about any member of some set in a
general way. If we want to say that the logarithm of a product is the sum of the
logarithms, we can let variables 𝑥 and 𝑦 represent any positive real numbers and
write log(𝑥𝑦) = log 𝑥 + log 𝑦.

• Variables in programming and mathematics are similar in nature, but not exactly
the same thing. In programming, a variable is associated with a piece of memory
in which the object is stored. This does not happen in mathematics. In programming,
variables can only refer to finite objects, but there is no such restriction in
mathematics.

2 This is for at least a couple of reasons. Using entire words for variable names can tend to lead to expressions
that are cumbersome or cluttered, potentially obscuring the mathematical structure. Multi-letter names can
sometimes look like products of many variables, due to the mathematical convention of denoting products
of variables by juxtaposition (i.e., placing them next to each other).

We have been using variables throughout this unit. We have done so every time we
introduce a name for some object we are discussing. We often referred to sets 𝐴, 𝐵, 𝐶, …,
functions 𝑓, 𝑔, ℎ, …, and propositions 𝑃, 𝑄, …. In the previous chapter we made extensive
use of Boolean variables, which each refer to a truth value, False or True.
Every variable has an associated domain, which is the set of objects which that
variable might represent. If we say, “Let 𝑥 ∈ ℤ”, then 𝑥 is a variable with domain
ℤ. Every Boolean variable has domain {False, True}, in fact that’s the definition of a
Boolean variable.
We can think of a variable as being able to take values, which are just the objects in
its domain. But, when we use a variable, we are not implying that it has some specific
value. If we define a variable 𝑥 with domain ℤ, then we are not assuming that 𝑥 stands
for some specific integer.
We also use the term constant for an object which belongs to the domain of some
variable. So, if we have a variable 𝑥 with domain ℤ, then any specific integer is a
constant. So, our constants include −273, −1, 0, 1, 8, 1729, and of course every other
integer.

5.3𝛼 P R E D i C AT E S A N D VA R i A B L E S

We saw in § 5.1𝛼 that, for any predicate, you can obtain specific propositions by giving
specific values to all its arguments.
In order to make more general logical statements, we can also give variables to
some, or all, of its arguments. When doing so, we must ensure that the domain of the
variable equals the domain of that argument, keeping in mind that every variable has a
domain (p. 158) and every argument of every predicate has a domain (p. 75).
For example: if 𝑋 is a variable with domain ℙ, then we can write

Parent(Alan Turing, 𝑋 ). (5.1)



The variable 𝑋 in this statement is free, meaning that no value is (yet) given to it and
it is available for values to be given to it. Because it contains a free variable, this statement
does not yet have a truth value, so it is not yet a proposition.
We can, if we wish, assign values to one or more of the free variables in a logical
statement. As usual, any value given to a variable must belong to its domain. If
every free variable has been given a value from its domain, the statement becomes a
proposition. Each combination of values you give to the variables in a statement creates
a different specific proposition, potentially with different truth values.
In the statement (5.1), we can create specific propositions, with truth values, by
giving values from ℙ to the variable 𝑋 .
value of 𝑋        proposition                              truth value
Sara Turing       Parent(Alan Turing, Sara Turing)         True
Julius Turing     Parent(Alan Turing, Julius Turing)       True
Alonzo Church3    Parent(Alan Turing, Alonzo Church)       False
For another example, consider the binary predicate <. We suppose its two arguments
each have domain ℝ. This is just the usual < relation on the real numbers. We can
create propositions by giving values to all its arguments:
statement        truth value
5 < 15           True
−5 < −15         False
√2 < 1.618       True
𝜋 < 𝑒            False
Suppose we have real variables 𝑥, 𝑦 (i.e., variables with domain ℝ). If we plug these into
the < predicate’s two arguments, then we get a logical statement about those variables:

𝑥 < 𝑦.

But the two variables remain free so the statement is not a proposition and has no truth
value.
We can also use a variable for one argument and a value for the other. For exam-
ple, using the real variable 𝑥 and the value −3.7 for the first and second arguments
respectively, we obtain the statement

𝑥 < −3.7.

Although it does include a substitution of a value for an argument, it also still has a free
variable, so it is not a proposition and does not have a truth value.
For another example, consider the predicate designedFirstComputerIn, which has two
arguments, the first having domain ℙ and the second having as its domain the set 𝔸 of
all countries. As a relation, it consists of all pairs (𝑝, 𝑐) such that person 𝑝 designed the
first computer that was designed and built in country 𝑐. We can create some specific
propositions by assigning values to these arguments:

designedFirstComputerIn(Trevor Pearcey, Australia)   True
designedFirstComputerIn(Maston Beard, Australia)     True
designedFirstComputerIn(Xià Péisù, China)            True
designedFirstComputerIn(Blaise Pascal, France)       False
⋮                                                    ⋮

3 Alonzo Church (1903–1995) was Alan Turing’s PhD supervisor at Princeton, 1936–1938. So he might be
described informally as Turing’s academic father! But he is not a parent of Turing’s in the usual sense, so
(Alan Turing, Alonzo Church) ∉ Parent.
If variable 𝑃 has domain ℙ and variables 𝑄 and 𝐶 each have domain 𝔸, we can make
statements like

statement                                       free variables
designedFirstComputerIn(𝑃, 𝑄)                   𝑃, 𝑄
designedFirstComputerIn(Trevor Pearcey, 𝐶)      𝐶
designedFirstComputerIn(𝑃, France)              𝑃
But, in this case, it would be an error to write designedFirstComputerIn(𝑄, 𝑃), with 𝑃
and 𝑄 the other way round, because the domains do not match: variable 𝑃, with domain
ℙ, is given to the second argument, which has domain 𝔸 (and a similar mis-match for
variable 𝑄 in the first argument).

5.4 A R G U M E N T S O F P R E D i C AT E S

When using a predicate, we need to put something into each of its arguments.
We have seen in § 5.1𝛼 and § 5.3𝛼 that we can put constants and/or variables (or a mix
of these) into the arguments of a predicate, provided the domains match appropriately.
But we also want to be able to put more complex expressions into the arguments.
For example, we may want to make statements like
𝑎² + 𝑏² < 𝑐²
𝑒^(𝑖𝜋) + 1 = 0
Parent(Caroline Herschel, MotherOf(William Herschel))
knows(X, MotherOf(FatherOf(Y))).
To do this, the expressions we use for predicate arguments need to be able to use func-
tions as well as constants and variables. But which functions can we use? Usually, we
will work in settings where the available functions are specified up-front. But we still
need to ensure that the expressions that use functions are properly constructed.
Informally, the expressions that can be put into an argument of a predicate can be
any expression that makes sense in the domain of that argument and which only uses
functions that are known to be available.
For example, if we are dealing with the predicate < with real arguments, then the
first statement above, 𝑎² + 𝑏² < 𝑐², is ok provided that
• the squaring operation is taken to be a function from ℝ to ℝ, so that its codomain
matches the domain of the second argument of <;

• the variables 𝑎, 𝑏, 𝑐 each have domain ℝ, which ensures they can be used as argu-
ments for the real squaring function;

• the addition operation, +, is taken to be a function from ℝ × ℝ to ℝ. Again, this
ensures that its codomain matches the domain of the first argument of <. Its
domain being ℝ × ℝ also ensures that we can plug 𝑎² and 𝑏² into its arguments,
assuming again that we are using the squaring function on the reals.

The last example above, knows(𝑋 , MotherOf(FatherOf(𝑌))), is ok provided

• variable 𝑋 has domain ℙ (the set of all people who have ever lived), so that its
domain matches the domain of the first argument of knows;

• the functions MotherOf and FatherOf each have ℙ as their domain and codomain,
to ensure that the composition MotherOf ∘ FatherOf is defined, and to ensure that
the composition’s codomain matches the domain of the second argument of knows
(which it gets plugged into);

• variable 𝑌 also has domain ℙ, to match the domain of the function FatherOf, since
𝑌 is plugged into the argument of FatherOf.

In general, apart from constants and variables from appropriate domains, the other
things we can put into a predicate’s arguments are all expressions that use a function.
In fact, such an expression might contain several uses of functions, like in 𝑎² + 𝑏² or
MotherOf(FatherOf(𝑌)). But there will always be one function in the expression which
was applied last, i.e., which we think of as producing something to be given to the
argument of the predicate.

• In the expression 𝑎² + 𝑏², the addition function is applied last, after both the
squarings have been done. We think of the addition as producing something which
goes into the first argument of the predicate <.

• In the expression MotherOf(FatherOf(𝑌)), the function MotherOf is applied last,
and we think of it as producing a person (in fact, a mother) who is then put into
the second argument of the predicate knows.

But, as we saw from our examples above, it is not enough that the function’s codomain
matches the domain of the predicate’s argument. The function also has arguments,
each of which also has a domain, and anything put there must match the domain of the
argument where it is put. What can we put into a function’s arguments? The same rules
apply here as apply to arguments of predicates. The things we can put into a function’s
arguments are constants from that argument’s domain, variables with the same domain,
and (again) functions whose codomain equals the domain of the argument where it is
put. And those functions, too, have arguments, and the same rule applies to them! And
so on, and so on, …although we are not allowed to go on forever and construct infinite
expressions! So, following these rules, we can construct, from constants, variables and

functions, any expression that is allowed to be put into the argument of any other
function, and ultimately, any expression that is allowed to be put into the argument of
a predicate.
Expressions that are allowed to be put into the arguments of predicates can be
defined more formally, and we do so now for completeness. Such an expression is called
a term.
We suppose, at the outset, that we have the following ingredients:

• a set of variables, each with an associated domain;

• a set of functions.

A term of domain 𝐷 consists of any of the following:

• a constant from 𝐷;

• a variable with domain 𝐷;

• a function with codomain 𝐷, in which each argument has a term whose domain is
the same as the domain of the argument.

We can illustrate this definition using a couple of our recent examples.

• The expression 𝑎² + 𝑏² is a term with domain ℝ because it consists of a function,
namely real addition, with codomain ℝ, whose two arguments (each with domain
ℝ) have each been given a term with domain ℝ. The two terms given to the two
arguments of + are 𝑎² and 𝑏², which are each formed from the squaring function
with codomain ℝ, whose argument has been given a real variable, 𝑎 or 𝑏.

• The expression MotherOf(FatherOf(𝑌)) is a term of domain ℙ because it is formed
from a function, MotherOf, whose codomain is ℙ, and whose argument (with do-
main ℙ) has been given a term of the same domain, FatherOf(𝑌). And that, in
turn, is a term of domain ℙ because FatherOf has codomain ℙ and its argument
has the same domain as the variable 𝑌 it has been given.
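The recursive shape of this definition can be mirrored by a small recursive evaluator. The sketch below is not from the notes; the tuple representation and the function table are invented for illustration.

```python
# Terms represented as: a constant (number), a variable (string), or a
# tuple (function_name, subterm, ...). The available functions are fixed
# up-front, mirroring the rule that only specified functions may be used.
FUNCTIONS = {
    "+":  lambda a, b: a + b,
    "sq": lambda a: a * a,
}

def eval_term(term, env):
    """Evaluate a term, looking up variables in the environment env."""
    if isinstance(term, tuple):
        name, *args = term
        return FUNCTIONS[name](*(eval_term(a, env) for a in args))
    if isinstance(term, str):
        return env[term]   # a variable of the appropriate domain
    return term            # a constant

# The term a² + b²: addition is applied last, to two squaring subterms.
t = ("+", ("sq", "a"), ("sq", "b"))
print(eval_term(t, {"a": 3, "b": 4}))  # 25
```

Note how the nesting of tuples matches the nesting of functions within the term.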

If we are working with real numbers and we have functions +, −, ×, /, √ and real
variables 𝑥 and 𝑦, then some examples of terms are:

−5, 22/7, 𝑦, 3 × 𝑥 + 1, √𝑥 × 𝑥 − 3 × 3, 2 × 𝜋 × 𝑥

But not everything is a term:

• The following six “expressions” are not terms because they have been incorrectly
formed (either by breaking the rules for how to use the provided functions, or
using objects, variables or functions that are not available):

2 + 3+, 𝑥++, log 𝑥, 𝑧, 𝑒^(𝑖𝜋), 𝑥 ∧ 𝑦



• In this scenario, character constants are not considered, so the expression w + 16 is
not a term. (Here, the letter w is not being used as a variable name, but is “just
itself”, i.e., the 23rd letter of the alphabet and nothing more).
We will not usually fuss over these details of the construction of expressions we use
inside predicates. Mostly, we will draw on our prior experience of what constitutes a valid
expression using standard operations in domains we are used to. We know, for example,
how to construct a valid expression involving constants, variables and functions involving
numbers, and we know how to compose functions. But it is important to understand
that expressions are constructed according to formal rules, and also that, in any given
situation, we can only use the functions that are available to us, and usually these will
be clearly specified.
Now that we know the kinds of expressions we can put into the arguments of pred-
icates, we can recognise many instances of this in earlier chapters. For example, The-
orem 1 stated that the complement of 𝐴 ∪ 𝐵 equals the intersection of the complements
of 𝐴 and 𝐵. This uses the equality predicate for sets, with first argument the complement
of 𝐴 ∪ 𝐵 and second argument the intersection of the complements of 𝐴 and 𝐵. Each of
these is an expression in sets, using the set functions complement, union and intersection
and set variables 𝐴 and 𝐵.

5.5 B U i L D i N G L O G i C A L E X P R E S S i O N S W i T H P R E D i C AT E S

Once we have predicates whose arguments are terms with appropriate domains, we
can combine them using all the Boolean operations (logical connectives) we studied in
Chapter 4. So we can construct expressions like those in the left column of the following
table (with free variables listed in the right column).
logical expression                                                  free variables
(1 < 𝑖) ∧ (𝑖 < 𝑛)                                                   𝑖, 𝑛
(1/3 < (1+2)/(3+5)) ∧ ((1+2)/(3+5) < 2/5)                           none
((𝑥 > 0) ∧ (𝑦 > 0)) ⇒ (log(𝑥𝑦) = log 𝑥 + log 𝑦)                     𝑥, 𝑦
knows(𝑃, 𝑄) ∧ knows(𝑄, 𝑅) ∧ ¬knows(𝑃, 𝑅)                            𝑃, 𝑄, 𝑅
Parent(𝑋 , 𝑌) ⇔ (𝑌 = MotherOf(𝑋 )) ∨ (𝑌 = FatherOf(𝑋 ))             𝑋, 𝑌

Formation of predicate logic expressions is done according to the following rules,
noting that this list of rules is incomplete and we will meet more later (in § 5.11).
• The truth values True and False are predicate logic expressions.

• If terms of the required domains are plugged into the arguments of a predicate,
then the result is a predicate logic expression.

• If 𝐸 and 𝐹 are predicate logic expressions, then each of the following is a predicate
logic expression:

(𝐸), ¬𝐸, 𝐸 ∧ 𝐹, 𝐸 ∨ 𝐹, 𝐸 ⇒ 𝐹, 𝐸 ⇔ 𝐹.

We can manipulate predicate logic expressions using the usual rules of Boolean
algebra. This extends the realm of those rules. They were introduced in Chapter 4 in the
context of propositions, but now we are using them for predicate expressions, and these
may have variables and therefore might not be propositions. But every assignment of
values to all the variables makes a predicate expression True or False, so on that basis we
can combine these expressions using logical operations just as we combined propositions
using logical operations in Chapter 4.
For example, consider again the first expression in the above table,

(1 < 𝑖) ∧ (𝑖 < 𝑛),

which is really just a more detailed way of writing 1 < 𝑖 < 𝑛 (in fact, it makes clear
the exact logical meaning of that chain of two inequalities). If we want to say that the
expression is not satisfied, then we can negate it, using De Morgan’s Law:

¬((1 < 𝑖) ∧ (𝑖 < 𝑛)) = ¬(1 < 𝑖) ∨ ¬(𝑖 < 𝑛),

which accords with our understanding that if 𝑖 does not lie between 1 and 𝑛 then it
must be ≤ 1 or ≥ 𝑛.
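This application of De Morgan's Law can be checked mechanically. The sketch below is illustrative only (the sample ranges are invented): it confirms that the negated conjunction and the disjunction of negations agree on every sampled pair.

```python
def between(i, n):
    return (1 < i) and (i < n)              # (1 < i) ∧ (i < n)

def not_between(i, n):
    return (not (1 < i)) or (not (i < n))   # ¬(1 < i) ∨ ¬(i < n)

# Check the equivalence on a small sample of integer values.
assert all(
    not_between(i, n) == (not between(i, n))
    for i in range(-3, 6)
    for n in range(-3, 6)
)
print("De Morgan equivalence holds on the sample")
```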

5.6 EXiSTENTiAL QUANTiFiER

At the start of this chapter, we mentioned that predicate logic enables us to make
statements about variables that are always true, or sometimes true, or never true. We
can do this with quantifiers. We now look at the first of these, which covers “sometimes
true”.
The existential quantifier is written ∃ and read as “there exists”. It is placed before
a variable to mean that there exists some value of that variable, within the variable’s
domain, that makes the subsequent statement True.
For example, consider the statement

There’s a fly in my soup.

This may be written

∃𝑋 ∶ (𝑋 is a fly) ∧ (𝑋 is in my soup).

Suppose we have two unary predicates, Fly and InMySoup. For each of these, suppose
that the domain of its sole argument is the set of everything on Earth, and that the
variable 𝑋 has this domain too. So Fly(𝑋 ) is a predicate logic expression meaning that
“𝑋 is a fly”, and InMySoup(𝑋 ) is a predicate logic expression meaning that “𝑋 is in my
soup”.

The conjunction of any two predicate logic expressions is another predicate logic
expression (§ 5.5). So
Fly(𝑋 ) ∧ InMySoup(𝑋 )
is another predicate logic expression, meaning that “𝑋 is a fly and 𝑋 is in my soup”.
For any specific object on Earth, plugging it into 𝑋 in this expression gives a specific
proposition, which may be True or False. In this sense, this predicate logic expression
may be viewed as representing many possible statements, one for each object on Earth.
Putting ∃𝑋 in front of (i.e., at the very left of) this expression gives the statement

∃𝑋 ∶ Fly(𝑋 ) ∧ InMySoup(𝑋 ). (5.2)

This asserts that

There exists 𝑋 such that 𝑋 is a fly and 𝑋 is in my soup.

This is just a rewording of the statement we started with. And note that it is just one
single statement, rather than representing many possible statements.
The colon after “∃𝑋 ” in (5.2) is merely punctuation. It is usually read as “such that”,
and provides convenient visual separation between “∃𝑋 ” and the condition that 𝑋 must
satisfy. But it is fine to omit the colon, if the expression is still clear. Sometimes a full
stop is used instead of the colon. So either of the following is also correct:

∃𝑋 Fly(𝑋 ) ∧ InMySoup(𝑋 ),
∃𝑋 . Fly(𝑋 ) ∧ InMySoup(𝑋 ).

The domain of each variable is often clear from the context. Alternatively, it might be
specified as part of the written expression by specifying domain membership immediately
after the variable being quantified, as in, ∃𝑌 ∈ ℚ ⋯ (so in this case the domain of 𝑌 is
ℚ). The domain of a variable certainly affects the meaning of the expression it is part
of, in general; changing the domain might change the truth value of the expression.
For example, consider the following expression, written as text on the left and as a
logical expression on the right.

There exists 𝑊 : 𝑊 is negative. ∃𝑊 ∶ 𝑊 < 0.

If the domain of 𝑊 is ℕ, then we could write the expression as ∃𝑊 ∈ ℕ ∶ 𝑊 < 0, and it
is False.
If the domain of 𝑊 is ℤ, then we could write the expression as ∃𝑊 ∈ ℤ ∶ 𝑊 < 0, and it
is True.

There is an analogy between the existential quantifier and disjunction. In each case,
the expression that uses them is True if and only if at least one of its “possibilities” is
True. For a disjunction, we require that at least one of the parts of the disjunction is
True; for an existential quantifier applied to some variable 𝑋 , we require that at least

one value of 𝑋 makes the entire expression True. In other words, at least one member
of the domain of 𝑋 may be assigned to 𝑋 to make the expression True.
For example, suppose we have the statement “Someone did it”. We may write this
as
∃𝑋 ∶ 𝑋 did it. (5.3)
Suppose the domain of 𝑋 is a large set of people,

{… , Annie, Edward, Henrietta, Radhanath, …}.

Then the quantified expression (5.3) is sort-of like a disjunction …

⋯ ⋯ ∨ (Annie did it) ∨ (Edward did it) ∨ (Henrietta did it) ∨ (Radhanath did it) ∨ ⋯ ⋯

However, the existential quantifier is not just a shorthand notation for disjunction.
Firstly, if the domain of a variable is infinite, then existential quantification over that
variable cannot be replaced by a disjunction because a disjunction is only allowed to
have finitely many parts (and, indeed, logical expressions in general must be of finite
size). Secondly, variables and their quantifiers allow us to do some reasoning that cannot
be done in propositional logic.
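Over a finite domain, though, the analogy is exact, and Python's built-in any expresses it directly. In this sketch the finite ranges merely stand in for ℕ and ℤ; they are not the real infinite domains.

```python
# ∃W : W < 0, over two different (finite stand-in) domains.
naturals = range(0, 21)        # stand-in for ℕ
integers = range(-10, 11)      # stand-in for ℤ

print(any(w < 0 for w in naturals))   # False: no natural number is negative
print(any(w < 0 for w in integers))   # True: e.g. W = -1 works
```

Changing the domain changes the truth value, just as in the ∃𝑊 ∶ 𝑊 < 0 example above.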

Once a variable in an expression has had a quantifier applied to it, so that all
occurrences of the variable come after the quantifier and are subject to it, the variable
is said to be bound. You can no longer give specific values to the variable. So you can
no longer create specific propositions by giving specific values to every free variable.
So, for example, in our statement “there’s a fly in my soup”, formalised in (5.2), there
is no free variable. The variable 𝑋 is bound. We now have one specific proposition,
which has a specific truth value. The variable 𝑋 has not been given any value, but that
does not mean it is available to have values plugged into it. Now that 𝑋 is bound, it is
no longer free, and no longer available to receive values.
Normally, we put a quantifier in front of an expression containing the quantified
variable, as in the example
∃𝑤 ∶ 𝑤 < 0.
It’s also legal to put a quantifier, with its variable, in front of an expression that does
not contain that variable. Here are some examples of this.

∃𝑤 ∶ 𝑧 < 0, ∃𝑤 ∶ 1 + 1 = 2, ∃𝑤 ∶ 1 + 1 = 3.

In each case, the quantifier may as well not be there. The variable 𝑤 is irrelevant to the
truth of 𝑧 < 0, so whether this first statement is true or not depends solely on the truth
or otherwise of 𝑧 < 0. In the second and third examples, 𝑤 is also irrelevant, and in
those cases there are no other variables so we can discard the quantifiers and conclude

that the second statement is True and the third is False. So these three statements are
equivalent to
𝑧 < 0, 1 + 1 = 2, 1 + 1 = 3,
respectively. For another couple of examples, where the expression after ∃𝑥 is just a
logical constant:

∃𝑥 True is equivalent to True,


∃𝑥 False is equivalent to False.

Note that quantifiers can only be used with variables. Using them with constant
objects makes no sense. It is an error to write something like ∃5, ∃Annie.

5.7 R E S T R i C T i N G E X i S T E N T i A L LY Q U A N T i F i E D VA R i A B L E S

It is useful to consider how to incorporate a restriction on an existentially-quantified
variable into a predicate logic statement.
Suppose we have the statement

Some computer is human. i.e., There exists a human computer.

To begin with, suppose the domain of 𝑋 is { computers }, and that we have the
predicate human(𝑋 ) which is intended to mean that 𝑋 is human.
Then our statement may be written

∃𝑋 ∶ human(𝑋 ).

But what if the domain of 𝑋 is { everything on Earth } ?


In this case, suppose we have another predicate, computer(𝑋 ), which means that 𝑋
is a computer.
The following shows the correct expression and an incorrect attempt at the expression,
together with different ways of writing each in words.

Correct: ∃𝑋 ∶ computer(𝑋 ) ∧ human(𝑋 )

• “There exists something that is both computer and human.”

• “There exists a human computer.”

• “Some computer is human.”

Incorrect: ∃𝑋 ∶ computer(𝑋 ) ⇒ human(𝑋 )

• “There exists something which is not a computer or is human.”

• “There exists something which is not both a computer and non-human.”

• “Not everything is a nonhuman computer.”

The general principle at work here is as follows. Let 𝑃 be a unary predicate whose
sole argument has domain 𝐷, and let 𝑋 be a variable with the same domain, 𝐷. The
existential statement
∃𝑋 ∶ 𝑃(𝑋 )
says that there is at least one 𝑋 ∈ 𝐷 for which 𝑃(𝑋 ) holds. But suppose we want to
make this assertion only for those 𝑋 that also satisfy 𝑅(𝑋 ) (for some other predicate
𝑅, and with 𝑋 still having domain 𝐷). In other words, we want to say that there is
at least one 𝑋 satisfying 𝑅(𝑋 ) that also satisfies 𝑃(𝑋 ). Then we can do this with the
statement
∃𝑋 ∶ 𝑅(𝑋 ) ∧ 𝑃(𝑋 ).
So, the restriction to those 𝑋 that satisfy 𝑅(𝑋 ) is done by putting 𝑅(𝑋 ) in conjunction
with 𝑃(𝑋 ).
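The difference between the ∧ and ⇒ versions shows up immediately on a toy domain. Everything in this sketch is invented for illustration; over the finite domain, ∃ becomes Python's any.

```python
domain = ["laptop", "abacus", "alan", "grace"]   # a tiny made-up "everything"

def computer(x):
    return x in {"laptop", "abacus"}

def human(x):
    return x in {"alan", "grace"}

# Correct: ∃X : computer(X) ∧ human(X), i.e. is there a human computer?
print(any(computer(x) and human(x) for x in domain))        # False

# Incorrect: ∃X : computer(X) ⇒ human(X). This is already True simply
# because some member of the domain (e.g. "alan") is not a computer.
print(any((not computer(x)) or human(x) for x in domain))   # True
```

The incorrect version is satisfied by any non-computer in the domain, which is why the restriction must use conjunction.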

5.8 UNiVERSAL QUANTiFiER

Now we consider how to assert that a statement about variables is “always true”.
The universal quantifier is written ∀ and read as “for all” (or “for every” or “for
each” 4 ). It is placed before a variable to mean that for all values of that variable, within
the variable’s domain, the subsequent statement is True.
For example, consider the statement

Everyone can pass. i.e., For every 𝑋 : 𝑋 can pass.

This may be written


∀𝑋 ∶ canPass(𝑋 ),
using canPass as a name for the unary predicate we are using. This statement happens
to be True, provided the domain of 𝑋 is the set of students who have the required
prerequisite knowledge and have been allowed to enrol in the subject.
Again, the colon serves as convenient punctuation, but is not compulsory. It can
be omitted, or sometimes a full stop is used instead. So each of the following is also
allowed.
∀𝑋 canPass(𝑋 ),
∀𝑋 . canPass(𝑋 ).
This time, we don’t read out the colon as “such that”, since that wording does not read
naturally in this context.
Here is a famous statement, written in three equivalent ways, with the third using
isInteresting as a name for the predicate we are using:

All numbers are interesting. ∀𝑋 ∶ 𝑋 is interesting. ∀𝑋 ∶ isInteresting(𝑋 ).

This statement is also True, and we proved it in Theorem 16!


4 but we do not read it as “for some”, because that means the same as “there exists”.

The following statement is False, whether the domain of 𝑊 is ℕ or ℤ or ℝ.

For all 𝑊 : 𝑊 is negative.        ∀𝑊 ∶ 𝑊 < 0.

Now that we’ve applied quantifiers to the variables in these expressions, the variables
are all bound. Again, once a variable is bound, we can no longer assign values to it.
As with the existential quantifier, if we use the universal quantifier with a variable
and then follow it with an expression that does not include that variable, then the
quantification is irrelevant and can be dropped. It’s like saying, “for every dog, 𝑛 is prime”,
which is equivalent to just saying that “𝑛 is prime” since the primality or otherwise of 𝑛
has nothing to do with dogs. So the statements

∀𝑤 ∶ 𝑧 < 0, ∀𝑤 ∶ 1 + 1 = 2, ∀𝑤 ∶ 1 + 1 = 3.

are equivalent to
𝑧 < 0, 1 + 1 = 2, 1 + 1 = 3,
respectively.
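As with ∃ and any, the universal quantifier over a finite domain matches Python's built-in all. Again, this is a sketch: the finite ranges are invented stand-ins for the infinite domains in the text.

```python
# ∀W : W < 0, over finite stand-in domains.
print(all(w < 0 for w in range(-5, 0)))   # True:  the domain is {-5, ..., -1}
print(all(w < 0 for w in range(-5, 3)))   # False: 0, 1 and 2 are not negative
```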

5.9 R E S T R i C T i N G U N i V E R S A L LY Q U A N T i F i E D VA R i A B L E S

We now consider how to incorporate a restriction on a universally-quantified variable
into a predicate logic statement.
This time, our statement is “Every computer is human”, and we suppose we have
the predicate human(𝑋 ) with the same meaning as before, in § 5.7.
If the domain of 𝑋 is { computers }, then our statement may be written

∀𝑋 ∶ human(𝑋 ).

But what if the domain of 𝑋 is { everything on Earth } ?


As before (§ 5.7), suppose we have the predicate computer(𝑋 ).
The following shows an incorrect attempt at the expression, followed by the correct
expression. Again, we give a few different ways of writing each in words.

Incorrect: ∀𝑋 ∶ computer(𝑋 ) ∧ human(𝑋 )

• “Everything is both computer and human.”

• “Everything is a human computer.”

Correct: ∀𝑋 ∶ computer(𝑋 ) ⇒ human(𝑋 )

• “For everything, if it’s a computer, then it’s human.”

• “Everything that’s a computer is also human.”

• “Every computer is human.”

The general principle is as follows. Let 𝑃 be a unary predicate whose sole argu-
ment has domain 𝐷, and let 𝑋 be a variable with the same domain, 𝐷. The universal
statement
∀𝑋 ∶ 𝑃(𝑋 )
says that every 𝑋 ∈ 𝐷 satisfies 𝑃(𝑋 ). But suppose we want to make this assertion only
for those 𝑋 that also satisfy 𝑅(𝑋 ) (for some other predicate 𝑅, and with 𝑋 still having
domain 𝐷). In other words, we want to say that every 𝑋 satisfying 𝑅(𝑋 ) also satisfies
𝑃(𝑋 ). Then we can do this with the statement

∀𝑋 ∶ 𝑅(𝑋 ) ⇒ 𝑃(𝑋 ).

So, the restriction to those 𝑋 that satisfy 𝑅(𝑋 ) is done by making 𝑅(𝑋 ) imply 𝑃(𝑋 )
in the expression after the quantified variable.
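Over a finite domain, this principle can be checked mechanically. The following Python sketch (the domain and the predicates — 𝑅 for “multiple of 4” and 𝑃 for “even” — are invented for this illustration) contrasts the implication form with the incorrect conjunction form:

```python
# Finite domain for the variable X (invented for this illustration).
domain = range(10)

def R(x):          # the restricting predicate: "x is a multiple of 4"
    return x % 4 == 0

def P(x):          # the property of interest: "x is even"
    return x % 2 == 0

# Correct: ∀X (R(X) ⇒ P(X)) — "every multiple of 4 is even".
# The implication R(X) ⇒ P(X) is coded as (not R(x)) or P(x).
correct = all((not R(x)) or P(x) for x in domain)

# Incorrect attempt: ∀X (R(X) ∧ P(X)) — claims everything is an even
# multiple of 4, which already fails at x = 1.
wrong = all(R(x) and P(x) for x in domain)

print(correct, wrong)  # True False
```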

5.10 M U LT i P L E Q U A N T i F i E R S

It is permissible to have multiple quantifiers in an expression.


If there are two consecutive quantifiers at the start of an expression, then they must
have different variables. (It makes no sense to write something like ∃𝑋 ∀𝑋 ⋯.)
We illustrate how quantifiers work together using the binary predicate hasVisited,
defined as follows. Its first argument can be any person and its second argument can
be any country. If 𝑃 is a person and 𝐶 is a country, then hasVisited(𝑃, 𝐶) means that
person 𝑃 has visited country 𝐶.
Consider the statement

“Someone has visited every country”,

which may be written using our predicate as

∃𝑃 ∀𝐶 hasVisited(𝑃, 𝐶). (5.4)

After the first quantifier and its variable, ∃𝑃, we have the expression

∀𝐶 hasVisited(𝑃, 𝐶) (5.5)

which asserts that 𝑃 has visited every country. Two variables — 𝑃 and 𝐶 — appear
in (5.5), but they play different roles, since 𝑃 is free in (5.5) but 𝐶 is bound by the ∀𝐶
at the start. So the expression (5.5) has only one free variable. The presence of this free
variable still prevents this expression (5.5) from being a proposition. In fact, we could
use it as the definition of a new unary predicate, hasVisitedEveryCountry: if 𝑃 is any
person, then hasVisitedEveryCountry(𝑃) means that 𝑃 has indeed visited every country:

hasVisitedEveryCountry(𝑃) if and only if ∀𝐶 hasVisited(𝑃, 𝐶).



Then, we put ∃𝑃 in front of (5.5) to make (5.4). This means 𝑃 is now quantified
and therefore bound, so the full expression in (5.4) has no free variable and becomes a
proposition. That full expression has the same meaning as

∃𝑃 hasVisitedEveryCountry(𝑃).

We summarise these remarks as follows.

• In hasVisited(𝑃, 𝐶): 𝑃, 𝐶 both free.

• In ∀𝐶 hasVisited(𝑃, 𝐶), i.e., hasVisitedEveryCountry(𝑃): 𝑃 free, 𝐶 bound.

• In ∃𝑃 ∀𝐶 hasVisited(𝑃, 𝐶): 𝑃, 𝐶 both bound.

The following table shows how we might represent various statements involving
the predicate hasVisited, including the one we have just discussed in detail, with the
statements on the left and the predicate logic expressions on the right.

Someone has visited some country. ∃𝑃 ∃𝐶 ∶ hasVisited(𝑃, 𝐶).

Everyone has visited every country. ∀𝑃 ∀𝐶 ∶ hasVisited(𝑃, 𝐶).

Someone has visited every country. ∃𝑃 ∀𝐶 ∶ hasVisited(𝑃, 𝐶).

Everyone has visited some country. ∀𝑃 ∃𝐶 ∶ hasVisited(𝑃, 𝐶).

Some country has been visited by everyone. ∃𝐶 ∀𝑃 ∶ hasVisited(𝑃, 𝐶).

Every country has been visited by someone. ∀𝐶 ∃𝑃 ∶ hasVisited(𝑃, 𝐶).

You should think about each of these examples carefully. Think about what they
each mean and how they differ from each other. Some are obviously true, some are
obviously false, some seem likely to be true although you may not know that for a fact.
One important issue to think about is whether the order of the quantifiers matters.

• We have given one example with two existential quantifiers, and one with two
universal quantifiers. In each case, we could try putting them the other way
round, e.g., we could try ∃𝑌∃𝑋 ⋯ instead of ∃𝑋 ∃𝑌 ⋯. Would that give us a new
statement, or is it just another way of saying the same thing?

• We have given four examples that use a mix of our two quantifier types. Are they
all logically different, or are some of them equivalent?

When you have two variables in successive identical quantifiers, i.e., ∃𝑋 ∃𝑌 ⋯ or
∀𝑋 ∀𝑌 ⋯, they can be replaced by a single quantifier of the same type, applied to a
pair of variables. So ∃𝑋 ∃𝑌 ⋯ can be replaced by ∃(𝑋 , 𝑌) ⋯, which is sometimes writ-
ten ∃𝑋 , 𝑌 ⋯, and ∀𝑋 ∀𝑌 ⋯ can be replaced by ∀(𝑋 , 𝑌) ⋯, which is sometimes written
∀𝑋 , 𝑌 ⋯. So, in the above example, ∃𝑃 ∃𝐶 ∶ hasVisited(𝑃, 𝐶) can be rewritten as
∃(𝑃, 𝐶) ∶ hasVisited(𝑃, 𝐶), and so on.
For two consecutive uses of the same quantifier, it also does not matter which way
round we put the quantified variables at the start. So, ∃𝑋 ∃𝑌 ⋯ and ∃𝑌∃𝑋 ⋯ amount
to the same thing, and both are the same as ∃(𝑋 , 𝑌) ⋯, and ∃𝑋 , 𝑌 ⋯, and ∃𝑌, 𝑋 ⋯.
Similarly, ∀𝑋 ∀𝑌 ⋯ and ∀𝑌∀𝑋 ⋯ are equivalent, and both are the same as ∀(𝑋 , 𝑌) ⋯,
and ∀𝑋 , 𝑌 ⋯, and ∀𝑌, 𝑋 ⋯.
But the situation is different when the quantifiers are different. In fact, all the four
possibilities
∃𝑋 ∀𝑌 ⋯
∀𝑋 ∃𝑌 ⋯
∃𝑌∀𝑋 ⋯
∀𝑌∃𝑋 ⋯
are different in meaning, in general.
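The difference made by quantifier order can be seen concretely on a small, invented hasVisited relation (all names and data below are made up for illustration):

```python
# A tiny hasVisited relation: (person, country) pairs, invented for illustration.
people = ["ada", "ben"]
countries = ["AU", "NZ", "JP"]
visited = {("ada", "AU"), ("ada", "NZ"), ("ben", "JP")}

def has_visited(p, c):
    return (p, c) in visited

# ∃P ∀C hasVisited(P, C): someone has visited every country.
exists_forall = any(all(has_visited(p, c) for c in countries) for p in people)

# ∀C ∃P hasVisited(P, C): every country has been visited by someone.
forall_exists = all(any(has_visited(p, c) for p in people) for c in countries)

print(exists_forall, forall_exists)  # False True — the order matters

# By contrast, swapping two identical quantifiers changes nothing:
assert any(any(has_visited(p, c) for c in countries) for p in people) == \
       any(any(has_visited(p, c) for p in people) for c in countries)
```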

5.11 P R E D i C AT E L O G i C E X P R E S S i O N S

Now that we have quantifiers, we are at last able to give a complete definition of predicate
logic expressions, extending and finishing the work we began on p. 163 in § 5.5.
A predicate logic expression is any of the following.
• The truth values True and False are predicate logic expressions.

• If terms of the required domains are plugged into the arguments of a predicate,
then the result is a predicate logic expression.

• If 𝐸 and 𝐹 are predicate logic expressions, then each of the following is a
predicate logic expression:

(𝐸), ¬𝐸, 𝐸 ∧ 𝐹, 𝐸 ∨ 𝐹, 𝐸 ⇒ 𝐹, 𝐸 ⇔ 𝐹.

• If 𝐸 is a predicate logic expression and 𝑋 is a free variable of 𝐸, then each of the
following is a predicate logic expression:

∃𝑋 ∶ 𝐸
∀𝑋 ∶ 𝐸

But, in each case, 𝑋 is no longer a free variable in the new expression. So, if 𝑉 is
the set of free variables of the expression 𝐸, then the set of free variables of ∃𝑋 ∶ 𝐸,
and the set of free variables of ∀𝑋 ∶ 𝐸, are each 𝑉 ∖ {𝑋 }.

5.12 DOiNG LOGiC WiTH QUANTiFiERS

All the rules of Boolean algebra (§ 4.15) are available to us when working with predicate
logic. We discussed this at the end of § 5.5.
There are also some other rules for doing logic involving quantifiers.
Let 𝑃(𝑋 ) be a predicate logic expression with a free variable 𝑋 . If we know that

∀𝑋 𝑃(𝑋 )

and obj is any specific object (in the domain of 𝑋 ), then we can deduce that

𝑃(obj).

In other words, if 𝑃(𝑋 ) is True for all 𝑋 , then it’s certainly true for any specific value
from the domain of 𝑋 :
(∀𝑋 𝑃(𝑋 )) ⟹ 𝑃(obj)
In similar vein, if it’s True for a specific value from the domain of 𝑋 , then it’s certainly
True for some 𝑋 .
𝑃(obj) ⟹ (∃𝑋 𝑃(𝑋 )) (5.6)
Universal quantifiers and conjunction mix in a natural way. For any predicates 𝑃
and 𝑄,

∀𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )) is logically equivalent to (∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )) (5.7)

Similarly, existential quantifiers and disjunction mix naturally too.

∃𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )) is logically equivalent to (∃𝑋 𝑃(𝑋 )) ∨ (∃𝑋 𝑄(𝑋 )) (5.8)
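Over a small finite domain, equivalences (5.7) and (5.8) can be verified exhaustively by trying every possible pair of predicates 𝑃, 𝑄 (a predicate on a 3-element domain is just one of 2³ truth assignments). A sketch:

```python
from itertools import product

domain = [0, 1, 2]
# Every possible predicate on the domain, as a tuple of truth values.
assignments = list(product([False, True], repeat=len(domain)))

checked = 0
for p_vals, q_vals in product(assignments, repeat=2):
    P = dict(zip(domain, p_vals)).__getitem__   # predicate as a lookup table
    Q = dict(zip(domain, q_vals)).__getitem__
    # (5.7): ∀X (P(X) ∧ Q(X))  ⇔  (∀X P(X)) ∧ (∀X Q(X))
    assert all(P(x) and Q(x) for x in domain) == \
           (all(P(x) for x in domain) and all(Q(x) for x in domain))
    # (5.8): ∃X (P(X) ∨ Q(X))  ⇔  (∃X P(X)) ∨ (∃X Q(X))
    assert any(P(x) or Q(x) for x in domain) == \
           (any(P(x) for x in domain) or any(Q(x) for x in domain))
    checked += 1

print("checked", checked, "predicate pairs")  # checked 64 predicate pairs
```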

The expressions on the right, in (5.7) and (5.8), raise the important issue of the
scope of variables. These expressions each contain two separate quantifications over 𝑋 .
Specifically, (5.7) contains ∀𝑋 twice, and (5.8) contains ∃𝑋 twice. To which parts of the
entire expression do each of these quantifiers apply? Does the first ∀𝑋 in (5.7) apply
to every appearance of 𝑋 in the rest of the expression? If so, how does that first ∀𝑋
interact with the second ∀𝑋 ? If not, how do we know that?
The scope of a quantified variable in a predicate logic expression is the portion of
the expression (i.e., the “sub-expression”) in which that variable has meaning, and goes
from its quantifier to either
• the end of the innermost pair of enclosing parentheses, if such a pair of parentheses
exists, or
• the end of the entire expression, if there is no enclosing pair of parentheses.
So, in the left expression in (5.7), the scope of 𝑋 is the entire expression, since ∀𝑋 is
not enclosed by any parentheses. (There are parentheses in the expression, but they do

not enclose ∀𝑋 so they have no bearing on the scope of 𝑋 .) But in the right expression
in (5.7), we actually have two variables with separate scopes. The first ∀𝑋 is enclosed
in a pair of parentheses, and its scope is sub-expression ∀𝑋 𝑃(𝑋 ) on the left of ∧. The
second ∀𝑋 is enclosed in a different, and completely separate, pair of parentheses, and its
scope is the sub-expression ∀𝑋 𝑄(𝑋 ) on the right of ∧. These two scopes do not overlap;
there is no appearance of 𝑋 that belongs to both scopes, so there is no ambiguity over
which quantifier governs each appearance of 𝑋 . In effect, the two appearances of 𝑋 ,
each with its own scope separate from the other’s scope, are local to those scopes. It is
up to the reader to see that these two variables are different, even though they have the
same name, and keep track of their different scopes. Such variables in predicate logic
are like local variables in programs, which many programming languages provide for,
including Python.
Summarising for the examples in (5.7) and (5.8), we have:

• In ∀𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )), the scope of 𝑋 is the entire expression.

• In (∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )), the scope of the first 𝑋 is ∀𝑋 𝑃(𝑋 ), and the scope of
the second 𝑋 is ∀𝑋 𝑄(𝑋 ).

• In ∃𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )), the scope of 𝑋 is the entire expression.

• In (∃𝑋 𝑃(𝑋 )) ∨ (∃𝑋 𝑄(𝑋 )), the scope of the first 𝑋 is ∃𝑋 𝑃(𝑋 ), and the scope of
the second 𝑋 is ∃𝑋 𝑄(𝑋 ).
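The analogy with local variables in programs can be made concrete. In the Python sketch below (the function names are ours, not from the text), each function has its own local X, just as each ∀𝑋 in (∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )) governs only its own scope:

```python
def forall_P(domain):
    # This X is local to forall_P: its "scope" is this function body.
    return all(X >= 0 for X in domain)

def forall_Q(domain):
    # A different X with the same name, local to forall_Q; the two never clash.
    return all(X < 10 for X in domain)

# The conjunction of the two quantified statements over the domain 0..4:
print(forall_P(range(5)) and forall_Q(range(5)))  # True
```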

When a variable is bound, we can change its name, provided we do so to every
appearance of that variable within its scope, and to no other variable of the same name.
For example, consider the first 𝑋 in

(∀𝑋 𝑃(𝑋 )) ∧ (∀𝑋 𝑄(𝑋 )).

If we change the first 𝑋 to 𝑊, taking care to do it throughout the scope of that first 𝑋
and nowhere else, then we get the equivalent expression

(∀𝑊 𝑃(𝑊)) ∧ (∀𝑋 𝑄(𝑋 )).

So we can say that

∀𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )) is logically equivalent to (∀𝑊 𝑃(𝑊)) ∧ (∀𝑋 𝑄(𝑋 )).

Having discussed how ∀ mixes well with ∧, and how ∃ mixes well with ∨, it is natural
to ask: how well do the other pairings mix? How does ∀ mix with ∨? How does ∃ mix
with ∧?
In detail, what can we say about the logical relationship between …

∀𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )) and (∀𝑋 𝑃(𝑋 )) ∨ (∀𝑋 𝑄(𝑋 ))

…? Are they equivalent, or is there an implication in one direction, or is there no direct
logical relationship?

Similarly, what is the logical relationship between …

∃𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )) and (∃𝑋 𝑃(𝑋 )) ∧ (∃𝑋 𝑄(𝑋 ))

…?
We consider this question further in Exercise 14.

5.13 DUALiTY BETWEEN QUANTiFiERS

If you have negation immediately to the left of a quantifier, then you may move it to
the right of the quantifier (and its associated variable) provided you “flip” the quantifier
as you do so (∃ ⟷ ∀).
¬ ∀𝑌 means the same as ∃𝑌 ¬
¬ ∃𝑌 means the same as ∀𝑌 ¬
So, for any predicate logic expression 𝑃(𝑋 ) in which the variable 𝑋 is free, we have
the two laws

¬ ∀𝑋 𝑃(𝑋 ) = ∃𝑋 ¬ 𝑃(𝑋 ),
¬ ∃𝑋 𝑃(𝑋 ) = ∀𝑋 ¬ 𝑃(𝑋 ).

These are De Morgan’s Laws for quantifiers. This terminology reflects the relationship
between ∃ and disjunction, and the relationship between ∀ and conjunction.
So, for example, “Not all stars are visible” is the same as “There exists an invisible
star”. To see this using logic, suppose we have unary predicates star and visible:

¬ ∀𝑋 (star(𝑋 ) ⇒ visible(𝑋 )) Not all stars are visible


= ∃𝑋 ¬(star(𝑋 ) ⇒ visible(𝑋 )) (changing ¬ ∀ to ∃ ¬)
= ∃𝑋 ¬(¬star(𝑋 ) ∨ visible(𝑋 )) (representing ⇒ using ¬ and ∨)
= ∃𝑋 (¬¬star(𝑋 ) ∧ ¬visible(𝑋 )) (by De Morgan’s Law)
= ∃𝑋 (star(𝑋 ) ∧ ¬visible(𝑋 )) There exists an invisible star

Similarly,
¬ ∀𝑌 ¬ means the same as ∃𝑌
¬ ∃𝑌 ¬ means the same as ∀𝑌
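Over a finite domain, both De Morgan’s Laws for quantifiers reduce to facts about Python’s all and any. A sketch with an invented star/visibility table:

```python
# Invented data: star name -> whether it is visible.
visible = {"Sirius": True, "Vega": True, "Obscura": False}

stars = list(visible)

# ¬∀X visible(X)  ==  ∃X ¬visible(X):
# "not all stars are visible" is "there exists an invisible star".
not_all_visible = not all(visible[s] for s in stars)
exists_invisible = any(not visible[s] for s in stars)

# ¬∃X ¬visible(X)  ==  ∀X visible(X).
no_invisible = not any(not visible[s] for s in stars)
all_visible = all(visible[s] for s in stars)

print(not_all_visible == exists_invisible, no_invisible == all_visible)  # True True
```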

5.14 S U M M A RY O F R U L E S F O R L O G i C W i T H Q U A N T i F i E R S

When doing predicate logic, we can use all the rules of propositional logic (§ 4.15)
together with the principles for doing logic with quantifiers that we have introduced in
this chapter. We summarise these principles below.

The notation we use in our list is:

• 𝑃 and 𝑄 can be any predicates, and more generally they can be any predicate
logic expressions in which the indicated variables appear (e.g., 𝑃(𝑥) can be any
predicate logic expression in which 𝑥 appears);

• 𝑅 can be any predicate logic expression in which 𝑥 does not appear as a free
variable;

• 𝑥 and 𝑦 are free variables in the predicate expressions containing them;

• “obj” can be any specific object in the domain of 𝑥;

• 𝐴 is any subset of the domain of 𝑥, and Π𝐴 is the predicate which is True if
its argument is in 𝐴 and False otherwise (so Π𝐴 is just the set 𝐴 turned into a
predicate).

Here is the list of principles, with the sections discussing them on the right.
∃𝑥 ∈ 𝐴 𝑃(𝑥) is equivalent to ∃𝑥 (Π𝐴 (𝑥) ∧ 𝑃(𝑥)) § 5.7
∀𝑥 ∈ 𝐴 𝑃(𝑥) is equivalent to ∀𝑥 (Π𝐴 (𝑥) ⇒ 𝑃(𝑥)) § 5.9

∃𝑥 𝑅 is equivalent to 𝑅 § 5.7
∀𝑥 𝑅 is equivalent to 𝑅 § 5.9

∃𝑥∃𝑦 𝑃(𝑥, 𝑦) is equivalent to ∃𝑦∃𝑥 𝑃(𝑥, 𝑦) § 5.10
∀𝑥∀𝑦 𝑃(𝑥, 𝑦) is equivalent to ∀𝑦∀𝑥 𝑃(𝑥, 𝑦) § 5.10

∀𝑥 𝑃(𝑥) implies 𝑃(obj) § 5.12
𝑃(obj) implies ∃𝑥 𝑃(𝑥) § 5.12

∃𝑥 (𝑃(𝑥) ∨ 𝑄(𝑥)) is equivalent to (∃𝑥 𝑃(𝑥)) ∨ (∃𝑥 𝑄(𝑥)) § 5.12
∀𝑥 (𝑃(𝑥) ∧ 𝑄(𝑥)) is equivalent to (∀𝑥 𝑃(𝑥)) ∧ (∀𝑥 𝑄(𝑥)) § 5.12

¬∃𝑥 𝑃(𝑥) is equivalent to ∀𝑥 ¬ 𝑃(𝑥) § 5.13
¬∀𝑥 𝑃(𝑥) is equivalent to ∃𝑥 ¬ 𝑃(𝑥) § 5.13

5.15 SOME EXAMPLES

Suppose we have the property Prime, with domain ℕ, which is True if its argument is a
prime number and False otherwise. Suppose also that we have the binary predicate ≤,
and that any variables we use must have domain ℕ.
How might we use these to state Theorem 17, that there are infinitely many primes?
At first glance, this might look like an existential statement, so we reach for an
existential quantifier. The trouble is, we are asserting the existence of infinitely many

numbers of a particular type, but we are not allowed to use infinitely many quantifiers
(or to do anything else that makes the statement infinitely long). Furthermore, the
ingredients available to us here are quite limited; they do not allow us to describe
arbitrarily long sequences. (There are richer settings in which that can be done, but that’s
a different puzzle!) So, what do we do?
Think about what it means for a set to be infinite. This means it goes on forever,
i.e., it’s unbounded; in other words, no matter what bound you might try to put on the
numbers in this set, they eventually get bigger than the bound.
Let’s focus on this, for the set of primes:
Every bound is exceeded by some prime.
We begin to see hints of quantifiers emerge:
Every bound is exceeded by some prime.
Rewording a bit:
For every bound, there exists a prime that is greater than the bound.
This is getting close enough to precise logical language that we can try writing it sym-
bolically:
∀𝑏 ∃𝑛 ∈ {primes} 𝑛 > 𝑏.
Let us move the condition on ∃𝑛 to later in the expression, so there is no qualification
on ∃𝑛, and so that we can use our predicate Prime rather than using other symbols and
relations we haven’t been given in this scenario. To do this, we use the method of § 5.7
(see the first line of the list in § 5.14):

∀𝑏 ∃𝑛 Prime(𝑛) ∧ (𝑛 > 𝑏).

We’re almost there! The remaining detail is that > has not been given to us as an
available predicate (and nor has <). So we need to find a way of saying the same thing
using ≤, which is one of our ingredients. This is straightforward, because 𝑛 > 𝑏 if and
only if 𝑛 ≰ 𝑏, which may be written using logical negation and ≤. So our final statement
in predicate logic is
∀𝑏 ∃𝑛 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏). (5.9)
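No computation can verify (5.9) outright, since it quantifies over all bounds, but we can test a finite fragment of it. A sketch (the helper function names and the search limit are our own choices, not part of the statement):

```python
def is_prime(n):
    # Trial division; adequate for small n.
    if n < 2:
        return False
    return all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

def exists_prime_above(b, search_limit=10_000):
    # ∃n Prime(n) ∧ ¬(n ≤ b), with the search truncated at a finite limit.
    return any(is_prime(n) for n in range(b + 1, search_limit))

# A finite fragment of ∀b ∃n Prime(n) ∧ ¬(n ≤ b): check bounds 1..100.
holds_up_to_100 = all(exists_prime_above(b) for b in range(1, 101))
print(holds_up_to_100)  # True
```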

Now, think about the order of quantifiers here. We have seen previously that the
order of different quantifiers does matter (§ 5.10). So, in this case, it would be a mistake
to write the two quantifiers the other way round, as in

∃𝑛 ∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏). ✗ (5.10)

We will now look at this second statement in detail, to understand how it differs from
(5.9), and so gain insight into the effect of the different orders of quantifiers. In the
process, we will apply some other principles we have learned.

What does this second statement, (5.10), say? There exists a number 𝑛 such
that, for every number 𝑏, 𝑛 is prime and 𝑛 > 𝑏. This certainly sounds different but its
meaning may not yet be clear.
The rules of predicate logic can help make it clear.
After the existential quantifier in (5.10), we have ∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏), which is
of the form ∀𝑥(𝑃(𝑥) ∧ 𝑄(𝑥)) so we can apply one of our principles of doing logic with
quantifiers to deduce that
∀𝑏 Prime(𝑛) ∧ ¬(𝑛 ≤ 𝑏) is equivalent to (∀𝑏 Prime(𝑛)) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)).
Now, consider
∀𝑏 Prime(𝑛).
The truth value of this depends entirely on the truth value of Prime(𝑛), because Prime(𝑛)
does not depend on 𝑏. This is an instance of the principle that quantifying an expression
over a variable that does not appear in the expression makes no difference to it, logically.
You can include such a quantifier, or not, according to your preference; the expressions
you get are equivalent. (See the end of § 5.9.) So, omitting this one superfluous universal
quantifier from ∀𝑏 Prime(𝑛), we see that
(∀𝑏 Prime(𝑛)) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)) is equivalent to Prime(𝑛) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏)).
Substituting this back into the portion of (5.10) from ∀𝑏 onwards, we see that it is
equivalent to
∃𝑛 (Prime(𝑛) ∧ (∀𝑏 ¬(𝑛 ≤ 𝑏))).
Another way to write this, using the rule about restricting existential quantifiers (but
going backwards), is
∃𝑛 ∈ {primes} ∀𝑏 ¬(𝑛 ≤ 𝑏).
Rewriting ¬(𝑛 ≤ 𝑏) might make the meaning clearer:

∃𝑛 ∈ {primes} ∀𝑏 𝑛 > 𝑏.

So, this is saying that there is a prime number that is greater than every number!
This is clearly false, and in any case we see that the meanings of (5.9) and (5.10) are
very different. This illustrates the fact that the order of two different quantifiers (one
universal, one existential) does matter.

5.16 EXERCiSES

1. What can a predicate with no arguments be?

2. Suppose you have the predicates prolog and elvish, with the following meanings:
prolog(𝑋 ): 𝑋 knows the Prolog language.
elvish(𝑋 ): 𝑋 knows the Elvish language.

(a) Write a universal statement in predicate logic with the meaning:

“Nobody knows both Prolog and Elvish.”

(b) Suppose that the statement in (a) is False. Starting with its negation, derive an
existential statement meaning that someone knows both these languages.

3. The predicate supervised has two arguments, both of which are people. The
meaning of supervised(𝑋 , 𝑌) is that person 𝑋 supervised the PhD of person 𝑌.
Express each of the following sentences in predicate logic.

(a) Alonzo Church was the PhD supervisor of Alan Turing.

(b) No person supervised their own PhD.

(c) Not everyone has supervised someone else’s PhD.

(d) Not every PhD supervisor had a PhD supervisor. Write this one as an existential
statement.

(e) Someone had at least two PhD supervisors.

(f) Alan Turing supervised the PhDs of exactly two people.

4. The predicate designedFirstComputerIn has two arguments. The first argument
is a person, and the second argument is a country. The meaning of
designedFirstComputerIn(𝑋 , 𝑌) is that person 𝑋 was one of the designers of the first
computer in country 𝑌.
Express each of the following sentences in predicate logic.

(a) Trevor Pearcey and Maston Beard designed the first computer in Australia.

(b) No-one designed the first computer in two different countries.

5. (six degrees of separation)


Consider the binary predicate knows from p. 68 in § 2.14. The domain of each of its
arguments is the set of all people.
It has been claimed that, in the human social network, the distance between any
two people is at most 6. (See p. 70 in § 2.15.)

Write this claim in predicate logic, using just the predicate knows.

6. Suppose you have the equality predicate for sets, the symmetric difference function
for sets, and set variables 𝐴 and 𝐵. Write the statement of Theorem 4 in predicate logic.

7. For each of the following predicate logic statements, (i) identify the predicates,
functions, variables (including a suggested domain for each) and constants used; (ii)
state whether or not it is a proposition, and if it is, whether it is true or false.

(a) ∀𝑥 𝑥2 ≥ 0

(b) 𝑥2 < 0

(c) ∀𝑥 2𝑥 = 𝑥2

(d) 𝑥 < 0 ⇒ ∀𝑦 ((0 < 𝑦) ⇒ (𝑥 < 𝑦))

(e) ∀𝑥∀𝑧∃𝑦 (𝑥 < 𝑦) ∧ (𝑦 < 𝑧)

(f) ∃𝑥∃𝑦 ¬(𝑥 ⇒ 𝑦)

8. Consider the following predicate logic statement.

∀𝐴∀𝐵 (𝐴 ⊆ 𝐵) ⇒ (𝒫(𝐴) ⊆ 𝒫(𝐵))

What predicates, functions and variables does this statement use? Restate, in words,
the theorem that is being stated here.

9. Suppose that

• 𝑊 is a variable whose domain is the set of all English words,

• 𝐿 is a variable whose domain is the set of all English letters,

• WordContainsLetter is a binary predicate which takes an English word as its first
argument, an English letter as its second argument, and is True whenever that
word contains that letter. For example,

WordContainsLetter(quizzical, z) is True,
WordContainsLetter(quizzical, e) is False.

(a) Write the statement of Theorem 15 in predicate logic.

(b) Now suppose you also have a unary predicate Vowel whose domain is the set of
English letters and which is True when its argument is a vowel.
Now write the statement of Theorem 15 in predicate logic again, but using this new
predicate to write it more compactly this time.

10. Suppose you have the positive integer properties Even and Prime, which are true
precisely for even numbers and prime numbers respectively, and that you also have the
binary predicate ≤ on ℕ. Write a predicate logic expression for the statement
There are infinitely many odd prime numbers but only finitely many even
prime numbers.

11. The ternary predicate Date is defined for any (𝑑, 𝑚, 𝑦) ∈ ℕ × ℕ × ℕ, and is True
if (𝑑, 𝑚, 𝑦) represents a valid date in day-month-year format in the Gregorian calendar,
and is False otherwise.
Using Date and <, write an expression in predicate logic which is True if and only if
(𝑑1 , 𝑚1 , 𝑦1 ) and (𝑑2 , 𝑚2 , 𝑦2 ) are both valid dates and the first date comes chronologically
before the second.

12. Suppose you have


• the positive integer predicate ≤ ;

• the function 𝑡 ∶ {programs} × 𝐴 ∗ → ℕ defined for all programs 𝑃 and input strings
𝑥 ∈ 𝐴 ∗ (where 𝐴 is an alphabet (§ 1.5)) by

𝑡(𝑃, 𝑥) = the time taken by program 𝑃 when run on input 𝑥;

• the function len ∶ 𝐴 ∗ → ℕ defined for any string 𝑥 ∈ 𝐴 ∗ by

len(𝑥) = the length of the string 𝑥;

• binary functions for multiplication and exponentiation on the positive integers;

• a string variable 𝑥 ∈ 𝐴 ∗ ;

• positive integer variables 𝑐 and 𝑘.


Write the following statement in predicate logic.
There exist integers 𝑐 and 𝑘 such that, for every input string 𝑥, the time
taken by program 𝑃 on input 𝑥 is at most the product of 𝑐 and the 𝑘-th
power of the length of 𝑥.

13. There are many theorems that assert that there is a unique object that satisfies
certain conditions. In this exercise, we look at how to make such statements using
predicate logic.
Suppose you want to say that there is a unique 𝑥 with property 𝑃(𝑥). It’s not
enough to just write
∃𝑥 𝑃(𝑥),

because that only enforces existence without guaranteeing uniqueness. Sometimes, peo-
ple use “∃!” as a shorthand for “there exists a unique”. With that shorthand,

∃!𝑥 𝑃(𝑥)

means

There exists a unique 𝑥 such that 𝑃(𝑥) holds.

How would you write this statement just using the tools available to us in predicate
logic, i.e., without using “!”?

14.
(a) Prove that

∃𝑋 (𝑃(𝑋 ) ∧ 𝑄(𝑋 )) implies (∃𝑋 𝑃(𝑋 )) ∧ (∃𝑋 𝑄(𝑋 ))

(b) Prove that

∀𝑋 (𝑃(𝑋 ) ∨ 𝑄(𝑋 )) is implied by (∀𝑋 𝑃(𝑋 )) ∨ (∀𝑋 𝑄(𝑋 ))

in two ways: (i) reason it through directly, (ii) use the result from part (a) and what
you’ve learned about the relationship between existential and universal quantifiers.
6
SEQUENCES & SERIES

A sequence may be regarded as a list of objects. Its defining characteristic is that the
objects are in some order, so there is a first object, then a second object, then a third
object, and so on. We can use sequences to represent:
• items in a file, ordered by their position in the file, such as the words in a text file,
or the lines in a text file, or the frames in a movie file, or the rows in a spreadsheet;

• items ordered by time, such as the children in a family (by birth order), or a
person’s posts to one of their social media platforms, or the population of a city
in each year, or the dishes served in the successive courses of a banquet, or the
world’s early computers in order of when they first ran a program;

• items ordered in space along a line or curve, such as the houses along one side of
a street, or the floors of a building, or the waterfalls along a river, or the amino
acids along a protein molecule;

• items in some order of rank, such as the top ten songs according to some popularity
poll, or the world’s highest mountains in order, or the planets of the solar system
in order of mean distance from the Sun, or the world’s fastest computers in order
of their speed on some suite of benchmark problems;

• the successive letters of a string; in fact, a string is just a finite sequence whose
members happen to be letters.
Inside a computer, data is always stored in some order, even in cases where the order
may not be important. So, for example, if a set — in which order does not matter —
is to be represented on a computer, then it must ultimately be represented as some
sequence of items somewhere in the computer’s memory.
Computation is a process that takes place over time. Any characteristic of a compu-
tation — such as the amount of memory it uses, or the amount of energy consumed, or
some aspect of the information displayed — gives rise to a sequence that specifies how
that characteristic changes over time.
We are also fundamentally concerned with how the time taken by some program
depends on the size of the input. A sequence of time measures, in order of input size,
shows how the time grows as the input size increases.


Sequences are therefore one of the most fundamental of all abstract models.

6.1𝛼 D E F i N i T i O N S A N D N O TAT i O N

A sequence is a function whose domain is the set of positive integers or some finite
initial portion of it. So the domain of a sequence is either ℕ or [1, 𝑛]ℕ for some 𝑛 ∈ ℕ.
An infinite sequence is a sequence with domain ℕ.
A finite sequence is a sequence with domain [1, 𝑛]ℕ for some 𝑛. This is really the
same as an 𝑛-tuple, and finite sequences are often written in tuple notation:

( first object, second object, third object, … , last object ).

For any set 𝐴, a sequence over 𝐴 is a sequence whose codomain is 𝐴.


If 𝑓 ∶ ℕ → 𝐴 is a sequence over 𝐴, then 𝑓(𝑛) is referred to as the 𝑛-th term of the
sequence. The terms of the sequence 𝑓 are the members of the image of 𝑓 — remembering
the definition of the image of a function from p. 39 in § 2.1.2𝛼 — written in the order
given by the domain ℕ.
The length of a finite sequence is the size of its domain, which is the number of
terms of the sequence.
It is common to write a sequence as a comma-separated list of its terms, in order:

𝑓(1), 𝑓(2), 𝑓(3), … (6.1)

This is fine if the sequence is finite and very short, and it can be useful even for long
sequences and infinite sequences provided the pattern is clear. But this way of writing a
sequence is really just for informal exposition, not for defining the sequence. A sequence
cannot be defined without either listing all its members in order (only practical if the
sequence is very short) or giving a precise rule by which, for each 𝑛 ∈ ℕ, the 𝑛-th term
can be determined.
If 𝑓 is a sequence, then its 𝑛-th term 𝑓(𝑛) is often denoted by 𝑓𝑛 . This can be
thought of as just a variation on the usual notation for function values, but one that is
used more often in the context of sequences. With this notation, we might rewrite (6.1)
as
𝑓1 , 𝑓2 , 𝑓3 , … .
We can always define a sequence over 𝐴 by using the fact that it is simply a function
𝑓 ∶ ℕ → 𝐴 and using one of our ways of defining functions (§ 2.3𝛼 ).
There is another common convention for defining sequences, which has a couple of
variants:

( formula for 𝑛-th term ∶ 𝑛 ∈ ℕ ) or ( formula for 𝑛-th term )∞𝑛=1 .

The first of these is reminiscent of the way we used formulas to define sets, on p. 3 in
§ 1.2𝛼 , except that now we use parentheses rather than curly braces because the order
matters. This convention also applies to finite sequences:

( formula for 𝑛-th term ∶ 𝑛 ∈ ℕ, 1 ≤ 𝑛 ≤ 𝑁 ) or ( formula for 𝑛-th term )𝑁𝑛=1 .

For example, the infinite sequence of squares

1, 4, 9, 16, 25, …

can be written formally as

( 𝑛2 ∶ 𝑛 ∈ ℕ ) or ( 𝑛2 )∞𝑛=1 .
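In Python, a sequence over 𝐴 can be modelled directly as a function on positions, or lazily as a generator for the infinite case. A brief sketch for the squares:

```python
from itertools import count, islice

# The sequence ( n^2 : n ∈ ℕ ) as a function of the position n.
def f(n):
    return n ** 2

first_five = [f(n) for n in range(1, 6)]
print(first_five)  # [1, 4, 9, 16, 25]

# The same infinite sequence, modelled lazily as a generator.
squares = (n ** 2 for n in count(1))
assert list(islice(squares, 5)) == first_five
```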

We will be especially interested in sequences over various sets of numbers, which
we call number sequences. Typical codomains for number sequences include ℕ or
ℤ or ℚ or ℝ. The importance of these sequences is due not only to the fact that a
lot of computation uses numbers, but also due to the use of numbers in measuring
characteristics of computation, such as the time spent, the size of the input or output
data, the amount of memory used, the locations of data in memory, energy consumption,
and so on.

6.2𝛼 RECURSiVE DEFiNiTiONS OF SEQUENCES

We have seen how to define sequences by expressing each term as a function of its
position in the sequence. So, the 𝑛-th term 𝑓𝑛 is given by a formula in 𝑛.
Another way to define sequences is to give an expression that uses a previous term
in the sequence, or possibly several previous terms. For example, we could write

𝑓𝑛 = 𝑓𝑛−1 + 2.

But this alone is not sufficient, because we have to specify how to get started. If we
write
𝑓1 = 1, 𝑓𝑛 = 𝑓𝑛−1 + 2,
then we have defined the sequence of odd numbers

1, 3, 5, 7, … (6.2)

which could also be defined as (2𝑛 − 1 ∶ 𝑛 ∈ ℕ) or (2𝑛 − 1)∞𝑛=1 . Starting with a different
value would give a different sequence, even though the formula for 𝑓𝑛 looks the same.
For example,
𝑓1 = 0, 𝑓𝑛 = 𝑓𝑛−1 + 2, (6.3)

defines the sequence of even numbers

0, 2, 4, 6, … .

For another example, consider the sequence of powers of 2,

2, 4, 8, 16, … .

Previously, we might have defined this sequence as (2𝑛 ∶ 𝑛 ∈ ℕ). Now, we can define it
by
𝑓1 = 2, 𝑓𝑛 = 2𝑓𝑛−1 . (6.4)
In general, a recursive definition of a family of objects consists of:

• the base case: a definition of a finite number of the simplest-possible objects in
the family;

• the general case: a rule that defines a general object in the family in terms of
simpler objects in the family.

The base cases, together with the general rule, must be sufficient so that, used together,
any object in the family is defined uniquely, precisely and clearly.
For sequences, the base case gives some initial terms explicitly, and then the general
case is given as an expression using previous terms in the sequence. A recursive definition
of a number sequence is called a recurrence relation. We have seen three recurrence
relations so far: for the odd numbers in (6.2), for the even numbers in (6.3), and for the
powers of 2 in (6.4).
These sequences are all familiar and could be defined either by a formula in 𝑛 or
by a recurrence relation, according to taste or the needs of the situation in which they
are being used. But there are many situations where a recurrence relation is the most
natural way to define a sequence. For example, consider the factorials

1, 2, 6, 24, 120, … .

This sequence may be defined by the recurrence relation

𝑓1 = 1, 𝑓𝑛 = 𝑛𝑓𝑛−1 . (6.5)

This also has a formulaic definition, (𝑛! ∶ 𝑛 ∈ ℕ), but 𝑛! is really just a standard ab-
breviation for 𝑛(𝑛 − 1)(𝑛 − 2) ⋯ 3 ⋅ 2 ⋅ 1, so the most succinct definition without using
special abbreviations is the recurrence relation. This sequence also illustrates the point
that, in the rule of a recurrence relation, the position 𝑛 does not have to be confined
to the subscripts; it can also be used, in its own right, in the expression, as it is on the
right-hand side of the second equation in (6.5).
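The recurrence relation (6.5) maps directly onto a recursive Python function, with the base case and general case appearing as the two branches:

```python
def f(n):
    if n == 1:              # base case: f_1 = 1
        return 1
    return n * f(n - 1)     # general case: f_n = n * f_{n-1}

factorials = [f(n) for n in range(1, 6)]
print(factorials)  # [1, 2, 6, 24, 120]
```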

Exercise: Give recurrence relations for each of the following sequences: the positive
integers; the negative even integers; the squares; the sequence whose 𝑛-th term is the
sum of the first 𝑛 positive integers; the sequence whose 𝑛-th term is the sum of the
reciprocals of the first 𝑛 positive integers.

So far, all our recurrence relations have rules that only use the previous term in the
sequence. But rules can use terms that appear earlier than that. Consider the recurrence
relation
𝑓1 = 2, 𝑓2 = 4, 𝑓𝑛 = 4𝑓𝑛−2 . (6.6)
With the rule 𝑓𝑛 = 4𝑓𝑛−2 , it is not sufficient to just define one base case, such as 𝑓1 = 2.
If we did that, then we could only compute every second term:

𝑓1 = 2, 𝑓3 = 4 ⋅ 𝑓1 = 4 ⋅ 2 = 8, 𝑓5 = 4 ⋅ 𝑓3 = 4 ⋅ 8 = 32, and so on.

We need another base case to specify 𝑓2 , and then the recurrence will give us values at
all even positions too. If we set 𝑓2 = 4, as in (6.6), then we again get the powers of 2, so
in this case we have only come up with a more complicated way to define that sequence.
But we can do other things too. For example, try

𝑓1 = 2, 𝑓2 = 0, 𝑓𝑛 = 4𝑓𝑛−2 . (6.7)

This defines the sequence

2, 0, 8, 0, 32, 0, 128, 0, …

which can also be defined by

𝑓𝑛 = 2𝑛 , if 𝑛 is odd;
𝑓𝑛 = 0,  if 𝑛 is even.
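A short Python sketch (ours, for illustration) shows why two base cases are needed when the rule reaches back two places, and checks the recurrence (6.7) against the case-by-case closed form noted above:

```python
def f(n: int) -> int:
    """Recurrence (6.7): f_1 = 2, f_2 = 0, f_n = 4 * f_{n-2}."""
    if n == 1:        # first base case: determines all odd positions
        return 2
    if n == 2:        # second base case: determines all even positions
        return 0
    return 4 * f(n - 2)

def closed_form(n: int) -> int:
    return 2 ** n if n % 2 == 1 else 0

assert all(f(n) == closed_form(n) for n in range(1, 11))
print([f(n) for n in range(1, 9)])  # → [2, 0, 8, 0, 32, 0, 128, 0]
```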

6.3𝛼 ARiTHMETiC SEQUENCES

An arithmetic sequence, also called an arithmetic progression, is a number
sequence in which the difference between every pair of consecutive terms is the same.
This means there is some common difference 𝑑 such that every term is obtained by
adding 𝑑 to its predecessor. This, together with its first term which we’ll call 𝑎, gives
the following recurrence relation:

𝑓1 = 𝑎, 𝑓𝑛 = 𝑓𝑛−1 + 𝑑.

The successive terms are


𝑎, 𝑎 + 𝑑, 𝑎 + 2𝑑, …
with the 𝑛-th term being 𝑎 + (𝑛 − 1)𝑑.

An arithmetic sequence is specified by giving its first term 𝑎, its common difference
𝑑, whether it is finite or infinite and, if it is finite, its number of terms 𝑛. In the finite
case, the sequence looks like

𝑎, 𝑎 + 𝑑, 𝑎 + 2𝑑, … , 𝑎 + (𝑛 − 1)𝑑. (6.8)

Here, the 𝑖-th term is 𝑎 + (𝑖 − 1)𝑑, where 1 ≤ 𝑖 ≤ 𝑛.
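The closed form 𝑎 + (𝑖 − 1)𝑑 is easy to check with a few lines of Python (an illustrative sketch; the function name is ours):

```python
def arithmetic_term(a, d, i):
    """i-th term (1-indexed) of an arithmetic sequence: a + (i - 1) * d."""
    return a + (i - 1) * d

# The launch countdown: a = 10, d = -1, 11 terms.
countdown = [arithmetic_term(10, -1, i) for i in range(1, 12)]
print(countdown)  # → [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```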


Here are some old and new examples.

description                                      first few terms                     𝑎     𝑑     length

even numbers                                     0, 2, 4, 6, …                       0     2     ∞
odd numbers                                      1, 3, 5, 7, …                       1     2     ∞
positive integers                                1, 2, 3, …                          1     1     ∞
launch countdown                                 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0    10    −1    11
positive integers with remainder 2               2, 5, 8, 11, 14, …                  2     3     ∞
  after division by 3
years of the 21st century                        2001, 2002, …, 2100                 2001  1     100
leap years this century                          2004, 2008, …, 2096¹                2004  4     24
day of each Monday in May 2025                   5, 12, 19, 26                       5     7     4
an account with one fixed regular bill           220, 120, 20, −80, −180             220   −100  5
annual balances ($) of a 5-year investment       100, 110, 120, 130, 140, 150        100   10    6
  with initial balance $100 and simple
  interest accruing at 10% p.a.

6.4𝛼 GEOMETRiC SEQUENCES

A geometric sequence, also called a geometric progression, is a number sequence
in which the ratio between every pair of consecutive terms is the same. This means
there is some common ratio 𝑟 such that every term is obtained by multiplying its
predecessor by 𝑟.
Again, we call the first term 𝑎. We have the following recurrence relation:

𝑓1 = 𝑎, 𝑓𝑛 = 𝑓𝑛−1 ⋅ 𝑟.

1 The year 2100 does indeed belong to this century, which goes from 2001 to 2100 inclusive. But 2100 is not
a leap year, despite being a multiple of 4.

The successive terms are


𝑎, 𝑎𝑟, 𝑎𝑟 2 , …
with the 𝑛-th term being 𝑎𝑟 𝑛−1 .
A geometric sequence is specified by giving its first term 𝑎, its common ratio 𝑟,
whether it is finite or infinite and, if it is finite, its number of terms 𝑛. In the finite case,
the sequence looks like
𝑎, 𝑎𝑟, 𝑎𝑟 2 , … , 𝑎𝑟 𝑛−1 . (6.9)
The 𝑖-th term is 𝑎𝑟 𝑖−1 , where 1 ≤ 𝑖 ≤ 𝑛.
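The closed form 𝑎𝑟^(𝑖−1) can likewise be checked with a short program (an illustrative Python sketch; the function name is ours):

```python
def geometric_term(a, r, i):
    """i-th term (1-indexed) of a geometric sequence: a * r**(i - 1)."""
    return a * r ** (i - 1)

# Powers of 2: a = 1, r = 2.
assert [geometric_term(1, 2, i) for i in range(1, 6)] == [1, 2, 4, 8, 16]

# Compound interest at 10% p.a. on $100: a = 100, r = 1.1.
balances = [round(geometric_term(100, 1.1, i), 2) for i in range(1, 7)]
print(balances)  # → [100.0, 110.0, 121.0, 133.1, 146.41, 161.05]
```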
Examples:

description                                      first few terms                          𝑎       𝑟     length

powers of 2                                      1, 2, 4, 8, 16, …                        1       2     ∞
reciprocals of powers of 2                       1, 1/2, 1/4, 1/8, 1/16, …                1       1/2   ∞
decimal place values, going leftwards            1, 10, 10², 10³, …                       1       10    ∞
  of the decimal point
alternating ±1                                   1, −1, 1, −1, …                          1       −1    ∞
area (m²) of paper sizes A4, A3, A2, A1          0.0625, 0.125, 0.25, 0.5                 0.0625  2     4
number of transistors on an integrated           100 000, 141 000, 200 000, 282 000, …    10⁵     1.41  ?
  circuit, each year from early 1980s
  (idealised, simplified model based on
  Moore’s Law)
annual balances ($) of a 5-year investment       100, 110, 121, 133.1, 146.41, 161.05     100     1.1   6
  with initial balance $100 and compound
  interest accruing at 10% p.a.
value (in $) each year of an asset bought        100, 90, 81, 72.9, 65.61, 59.05          100     0.9   6
  for $100 which depreciates at 10% p.a.
  over five years

6.5 HARMONiC SEQUENCES

A harmonic sequence is a sequence of terms whose reciprocals form an arithmetic


sequence. So (𝑎𝑛 ∶ 𝑛 ∈ 𝑀 ) is a harmonic sequence if and only if (1/𝑎𝑛 ∶ 𝑛 ∈ 𝑀 ) is an
arithmetic sequence.

The sequence 3, 4, 6, 12 is harmonic, because its sequence of reciprocals, 1/3, 1/4, 1/6, 1/12, is
an arithmetic sequence. To see this, express all these fractions over a common denomi-
nator, and the sequence becomes

4/12, 3/12, 2/12, 1/12,

which is an arithmetic sequence of four terms with first term 𝑎 = 1/3 and common differ-
ence 𝑑 = −1/12.
One particularly important harmonic sequence is the sequence of reciprocals of pos-
itive integers: (1/𝑛 ∶ 𝑛 ∈ ℕ). This is clearly harmonic because the positive integers form
an arithmetic sequence. We discuss its properties further in § 6.15.
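The defining property translates directly into a test (an illustrative Python sketch; the function names are ours; exact rational arithmetic avoids floating-point trouble):

```python
from fractions import Fraction

def is_arithmetic(seq):
    """True iff all consecutive differences are equal."""
    diffs = [b - a for a, b in zip(seq, seq[1:])]
    return all(d == diffs[0] for d in diffs)

def is_harmonic(seq):
    """A sequence is harmonic iff its reciprocals form an arithmetic sequence."""
    return is_arithmetic([Fraction(1) / x for x in seq])

print(is_harmonic([3, 4, 6, 12]))  # → True   (the worked example above)
print(is_harmonic([1, 2, 4]))      # → False  (reciprocals 1, 1/2, 1/4 are not arithmetic)
```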

6.6 FROM RECURSiVE DEFiNiTiONS TO FORMULAS

We introduced recursive definitions of sequences in § 6.2𝛼 . Recursive definitions arise


very naturally, and are often the simplest way of writing down a rule for a sequence. But
it is typically not easy to see, from the recursive rule, just how the sequence behaves as
𝑛 increases. So we would like to find a formula for the 𝑛-th term, meaning an expression
for the 𝑛-th term in the sequence that depends only on 𝑛, and not on any previous terms
of the sequence. Even if we cannot find an exact formula for the 𝑛-th term, maybe we
can find upper and/or lower bounds for it, and this can be very useful in analysing the
behaviour of the sequence.
We studied arithmetic sequences in § 6.3𝛼 ; they have a recursive rule of the form
𝑓𝑛 = 𝑓𝑛−1 + 𝑑. Then we studied geometric sequences in § 6.4𝛼 ; these have a rule of the
form 𝑓𝑛 = 𝑟𝑓𝑛−1 . In each of these cases, we also have a formula for the 𝑛-th term: for
the arithmetic sequence, 𝑓𝑛 = 𝑎 + (𝑛 − 1)𝑑, while for the geometric sequence, 𝑓𝑛 = 𝑎𝑟 𝑛−1 .
But many sequences are not of either of those types. Consider for example the
following recursive definition.

𝑓1 = 1, 𝑓𝑛 = 2𝑓𝑛−1 + 1. (6.10)

This sequence is neither arithmetic nor geometric, although it has both a constant mul-
tiplier and an additive constant, so in a way it seems like a mix of both types. Can we
find, and prove, a formula for it?
We use this example to illustrate a very common approach:

1. explore the sequence by working out the first few terms;

2. study those terms, look for patterns, and try and conjecture an expression that
fits the pattern you have observed so far;

3. prove, by induction on 𝑛, that your formula works for all 𝑛.

Let us apply this explore-conjecture-prove approach to our recursive definition


in (6.10).

1. explore:

𝑓1 = 1 (given in (6.10))
𝑓2 = 2𝑓1 + 1 = 2 ⋅ 1 + 1 = 3
𝑓3 = 2𝑓2 + 1 = 2 ⋅ 3 + 1 = 7
𝑓4 = 2𝑓3 + 1 = 2 ⋅ 7 + 1 = 15

2. conjecture:
We can see that the values of 𝑓𝑛 , for 𝑛 ≤ 4, are each one less than a power of 2. This
trend looks likely to continue! We write this down as a formula, taking care to
get the exponent of 2 correct and that the formula works correctly for the known
initial case 𝑛 = 1. So we propose

𝑓𝑛 = 2𝑛 − 1. (6.11)

3. prove:
Now we prove (6.11) holds for all 𝑛 ∈ ℕ. It is natural to prove this by induction.

Inductive Basis:
If 𝑛 = 1, then we know from the initial condition in (6.10) that 𝑓1 = 1, and for
𝑛 = 1 the formula (6.11) gives 𝑓1 = 21 − 1 = 2 − 1 = 1, so the formula in (6.11) is
correct for 𝑛 = 1.

Inductive step:
Let 𝑘 ≥ 1.
Assume that (6.11) holds for 𝑛 = 𝑘, i.e., that 𝑓𝑘 = 2𝑘 − 1. (This is the Inductive
Hypothesis.)
Now consider 𝑓𝑘+1 . The recursive rule from (6.10) gives

𝑓𝑘+1 = 2𝑓𝑘 + 1 (by (6.10), and using 𝑘 ≥ 1)


𝑘
= 2(2 − 1) + 1 (by the Inductive Hypothesis)
= 2𝑘+1 − 2 + 1
= 2𝑘+1 − 1.

So we have established that (6.11) holds for 𝑛 = 𝑘 + 1 too. This completes the
Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, (6.11) holds for all 𝑛 ∈ ℕ.
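The explore step, and a numerical check of the conjectured formula, can of course be automated. A small Python sketch (ours, for illustration; the induction proof, not the check, is what establishes the formula for all 𝑛):

```python
def f(n: int) -> int:
    """Recurrence (6.10): f_1 = 1, f_n = 2 * f_{n-1} + 1."""
    return 1 if n == 1 else 2 * f(n - 1) + 1

# Confirm the conjecture f_n = 2**n - 1 on the first twenty terms.
assert all(f(n) == 2 ** n - 1 for n in range(1, 21))
print([f(n) for n in range(1, 5)])  # → [1, 3, 7, 15]
```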

6.7 THE FiBONACCi SEQUENCE

Now we consider another recursive definition, this time one where each term depends
on the two previous terms:

𝑓1 = 1, 𝑓2 = 1, 𝑓𝑛 = 𝑓𝑛−1 + 𝑓𝑛−2 (for 𝑛 ≥ 3). (6.12)

This is the famous Fibonacci sequence, whose first ten terms are given in the second
line of the following table.
𝑛 1 2 3 4 5 6 7 8 9 10 ⋯
𝑓𝑛 1 1 2 3 5 8 13 21 34 55 ⋯
In this section we study this sequence at length, showing how the explore-conjecture-
prove methodology can be used to determine upper bounds, lower bounds, estimates of
long-term behaviour, and eventually an exact formula for the 𝑛-th term. This extended
case study illustrates the approaches and kinds of thinking that can be applied to many
other problems of analysing recursively-defined sequences. We explain the process at
some length, because we are illustrating not just the underlying mathematical principles,
but also the process of investigating the sequence and developing the results and proofs.
For completeness, we include a series of proofs by induction which are each given
in full even though they only differ slightly from each other. This takes up space, but
there is not much in each proof that is new, and we indicate exactly what has changed
in each case.
Dealing with sequences whose terms depend on two previous terms is more com-
plex than when the dependence is on just one previous term (as was the case for the
recursively-defined sequences we gave in § 6.2𝛼 and § 6.6), so studying this case shows
how to deal with this extra complexity.
The techniques described in this section can be extended to deal with sequences
whose terms depend on three or more previous terms. We don’t discuss such cases as
they have more technical detail, although the underlying principles are the same.
There is also particular value in gaining a deep understanding of the Fibonacci se-
quence, since it arises throughout computer science and mathematics.

It is difficult to determine the behaviour of the Fibonacci sequence just from its
recursive definition. So, again, we take the explore-conjecture-prove approach. Our
exploration gives the terms shown above, and you can explore further using, say, a

spreadsheet or a program in order to develop a better understanding of its long-term


behaviour. This suggests that the sequence grows quite quickly. But how quickly? The
ratio 𝑓𝑛+1 /𝑓𝑛 between successive terms is not constant, but that ratio does seem to be
settling down. Although an exact formula for 𝑓𝑛 is not yet evident, we can use the
apparent settling-down of the ratio 𝑓𝑛+1 /𝑓𝑛 to conjecture some simple upper and lower
bounds, and this is still very useful information.
The ratio 𝑓8 /𝑓7 is 21/13 = 1.615 …, the ratio 𝑓9 /𝑓8 is 34/21 = 1.619 …, and the ratio
𝑓10 /𝑓9 is 55/34 = 1.617 …. While we do not yet know what happens for 𝑛 > 10, we can
look at the table of values and develop some hypotheses for upper and lower bounds for
that ratio and for the terms of the sequence.
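The exploration suggested above, with a spreadsheet or a program, might look like the following Python sketch (ours, for illustration), which prints the ratios 𝑓𝑛+1/𝑓𝑛:

```python
def fib_terms(count):
    """First `count` Fibonacci numbers via the recurrence (6.12), iteratively."""
    terms = [1, 1]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]

terms = fib_terms(12)
ratios = [terms[i + 1] / terms[i] for i in range(len(terms) - 1)]
# The ratios are not constant, but they appear to settle down near 1.618.
print([round(r, 4) for r in ratios[-4:]])
```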

6.7.1 Upper bounds

The maximum ratio between any two consecutive terms is 𝑓3 /𝑓2 = 2/1 = 2, and our
explorations suggest that it is likely to stay well below that. So we can conjecture the
upper bound
𝑓𝑛 ≤ 2𝑛 . (6.13)
We now prove this by induction on 𝑛.

Inductive Basis:

• When 𝑛 = 1, we have 𝑓1 = 1, and our upper bound is 21 = 2, so we certainly


have 𝑓1 ≤ 21 .

• We need another base case too, for 𝑛 = 2. The reason for this will become
evident shortly. When 𝑛 = 2, we have 𝑓2 = 1, and our upper bound is 22 = 4,
so we have 𝑓2 ≤ 22 (which is an even looser upper bound than for 𝑛 = 1).

Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≤ 2𝑙 for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)

Comment: This is a case of what is sometimes called “strong induction”,


where we assume a claim holds not just for 𝑘, but for all positive integers
≤ 𝑘. This is entirely valid, and in fact it is really still just normal
induction, since it is still a statement just about 𝑘. In the Inductive
Hypothesis, the variable 𝑙 is bound by the universal quantifier, so 𝑙 is
not a free variable (unlike 𝑘), so the Inductive Hypothesis is a statement
about 𝑘 only, not about 𝑙.

Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 +1 ≥ 3, so we can apply the recursive


rule (6.12):

𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 (by (6.12), and using 𝑘 ≥ 2)
     ≤ 2𝑘 + 2𝑘−1 (by the Inductive Hypothesis, twice)
     < 2𝑘+1 .

Comment: We can now see why we needed the Inductive Basis to cover
𝑛 = 2 as well as 𝑛 = 1: our recursive rule only applies for 𝑛 ≥ 3, so smaller
cases need to be dealt with separately.

This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, our upper bound (6.13) holds for all 𝑛.

There are several points worth noting about this upper bound and its proof.
Firstly, we can make the upper bound tighter by putting 𝑛 − 1 in the exponent
instead of 𝑛:
𝑓𝑛 ≤ 2𝑛−1 . (6.14)
The proof would go through again with very little change, in fact the main changes
needed are to the Inductive Basis. Now, 𝑓1 = 1 ≤ 21−1 and 𝑓2 = 1 ≤ 22−1 , so this tighter
upper bound still holds when 𝑛 = 1 and 𝑛 = 2. The rest of the proof, which only uses
the recursive rule, is unchanged except that, when applying the Inductive Hypothesis,
we insert the new upper bounds instead of the old ones.
Secondly, another way to try to refine the upper bound is to put a constant factor
in front. Let’s try
𝑓𝑛 ≤ 𝛼 ⋅ 2𝑛 . (6.15)
What is the best factor 𝛼 to use? We need the Inductive Basis to still work, so we want

𝑓1 = 1 ≤ 𝛼 ⋅ 2¹    and    𝑓2 = 1 ≤ 𝛼 ⋅ 2² .

These tell us that 𝛼 ≥ 1/2, so putting 𝛼 = 1/2 should work. This gives an upper bound

𝑓𝑛 ≤ (1/2) ⋅ 2𝑛 .

But the right-hand side here equals 2𝑛−1 , so this upper bound is really just a restatement
of (6.14).
This use of a constant factor should be kept in mind, as it is a common technique
for refining formulas for bounds for sequences given by recurrence relations.
Thirdly, the only algebraic property of 2𝑛 that we used is

2𝑘 + 2𝑘−1 ≤ 2𝑘+1 . (6.16)



Dividing each side by 2𝑘−1 , we see that this is equivalent to

2 + 1 ≤ 22 . (6.17)

This indicates that the Inductive Step would still work if, instead of using the upper
bound 2𝑛 , we used 𝑟 𝑛 where 𝑟 is a smaller number than 2, provided we still have an
inequality like (6.17) except for 𝑟 rather than 2. So we could choose 𝑟 to satisfy

𝑟 + 1 ≤ 𝑟2. (6.18)

This will ensure that


𝑟 𝑘 + 𝑟 𝑘−1 ≤ 𝑟 𝑘+1 , (6.19)
which is analogous to (6.16), and this should make the Inductive Step work. So we
should be able to get the Inductive Step to work for a tighter upper bound of the form
𝑟 𝑛 for some suitable 𝑟 < 2, namely one that satisfies (6.18).
For the whole proof to work, we also need the Inductive Basis to work. This should
be possible in this case; happily, 𝑓1 = 𝑓2 = 1 are each < 𝑟 𝑛 for any 𝑟 > 1 and 𝑛 ≥ 1.
So suppose we have some 𝑟 that satisfies (6.18), for example you could try 𝑟 = 1.62,
and let us re-do the proof by induction, this time proving that, for all 𝑛 ∈ ℕ,

𝑓𝑛 ≤ 𝑟 𝑛 . (6.20)

In presenting the new proof by induction, we give it in full, even though most of it is
unchanged. The parts that are changed are given in blue; this is really just replacing 2
by 𝑟 throughout. (We have also abbreviated some of the explanations.)

Inductive Basis:

• When 𝑛 = 1, we have 𝑓1 = 1, and our upper bound is 𝑟 1 = 𝑟, so we certainly


have 𝑓1 ≤ 𝑟 1 (since 𝑟 > 1).

• When 𝑛 = 2, we have 𝑓2 = 1, and our upper bound is 𝑟 2 , so we have 𝑓2 ≤ 𝑟 2


(since 𝑟 > 1).

Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≤ 𝑟 𝑙 for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 +1 ≥ 3, so we can apply the recursive
rule (6.12):

𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 (by (6.12), and using 𝑘 ≥ 2)
     ≤ 𝑟 𝑘 + 𝑟 𝑘−1 (by the Inductive Hypothesis, twice)
     ≤ 𝑟 𝑘+1 (by (6.19), which in turn follows from (6.18)).

This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, our upper bound (6.20) holds for all 𝑛.

We can actually do even better by choosing 𝑟 to satisfy (6.18) with equality, i.e., by
requiring it to satisfy
𝑟 + 1 = 𝑟2.
Rearranging, this is equivalent to

𝑟 2 − 𝑟 − 1 = 0. (6.21)

So 𝑟 should be a root of this quadratic equation, and the usual formula for these roots
gives
𝑟 = (1 ± √5)/2 ,

so the two roots are

𝑟1 = (1 + √5)/2 = 1.61803 …    and    𝑟2 = (1 − √5)/2 = −0.61803 … . (6.22)
Either of these would enable the Inductive Step to work, but only 𝑟1𝑛 works as an upper
bound, since it works for the Inductive Basis too whereas 𝑟2𝑛 does not work as an upper
bound for either 𝑛 = 1 or 𝑛 = 2.
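This can be checked numerically: both roots satisfy 𝑟 + 1 = 𝑟², but only 𝑟1 gives a valid Inductive Basis. A small Python sketch (ours, for illustration):

```python
import math

r1 = (1 + math.sqrt(5)) / 2   #  1.61803...
r2 = (1 - math.sqrt(5)) / 2   # -0.61803...

# Both roots satisfy r + 1 = r**2 (up to floating-point error) ...
assert math.isclose(r1 + 1, r1 ** 2)
assert math.isclose(r2 + 1, r2 ** 2)

# ... but only r1**n works for the Inductive Basis f_1 = f_2 = 1:
assert 1 <= r1 ** 1 and 1 <= r1 ** 2
assert not (1 <= r2 ** 1)   # r2 < 0, so r2**n fails as an upper bound at n = 1
```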

6.7.2 Lower bounds

The approach we have taken to upper bounds, in § 6.7.1, is readily adapted to lower
bounds, with a little adjustment.
The minimum ratio 𝑓𝑛+1 /𝑓𝑛 between consecutive terms is 𝑓2 /𝑓1 = 1, but using 1𝑛 as
a lower bound for 𝑓𝑛 is not very useful, even though it is clearly correct! So we look for
a higher ratio and see what we can do with it.
Exploration shows that, from the second term onwards, the ratio 𝑓𝑛+1 /𝑓𝑛 between
consecutive terms is ≥ 1.5. It is tempting to go from this ratio to a proposed lower
bound for 𝑓𝑛 of 1.5𝑛 . After all, we did something similar for upper bounds at the start
of the previous section (§ 6.7.1), when we observed that the ratio of consecutive terms
is always ≤ 2 and then found that an upper bound of 2𝑛 would work.
But bounds on ratios of consecutive terms do not always immediately yield bounds
on the sequence itself. In this case, 1.5𝑛 does not work as a lower bound, because it
fails for 𝑛 = 1 and 𝑛 = 2. We need to get a lower bound that works for those cases too.
Then, when we prove our lower bound later, we will be able to get the Inductive Basis
to work.

So we introduce a constant factor 𝛼, much as we did in (6.15) in § 6.7.1 when seeking


an upper bound:
𝑓𝑛 ≥ 𝛼 ⋅ 1.5𝑛 .
For the Inductive Basis to work, we require

𝑓1 = 1 ≥ 𝛼 ⋅ 1.5¹    and    𝑓2 = 1 ≥ 𝛼 ⋅ 1.5² .

For these to hold, we need


𝛼 ≤ 2/3    and    𝛼 ≤ 4/9.
So we choose 𝛼 = 4/9 (noting that this implies 𝛼 ≤ 2/3), and propose the lower bound

𝑓𝑛 ≥ (4/9) ⋅ 1.5𝑛 . (6.23)

We now prove this by induction on 𝑛. Again, we give the proof in full, with the
new/changed parts in blue.

Inductive Basis:
• 𝑛 = 1: we have 𝑓1 = 1, and our lower bound is (4/9) ⋅ 1.5 = 2/3 ≤ 1, so the lower
bound holds.

• 𝑛 = 2: we have 𝑓2 = 1, and our lower bound is (4/9) ⋅ 1.5² = (4/9) ⋅ (9/4) = 1, so the lower
bound holds (with equality).

Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≥ (4/9) ⋅ 1.5𝑙 for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 +1 ≥ 3, so we can apply the recursive
rule (6.12):

𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 (by (6.12), and using 𝑘 ≥ 2)
     ≥ (4/9) ⋅ 1.5𝑘 + (4/9) ⋅ 1.5𝑘−1 (by the Inductive Hypothesis, twice)
     = (4/9)(1.5𝑘 + 1.5𝑘−1 )
     ≥ (4/9) ⋅ 1.5𝑘+1 .

This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, our lower bound (6.23) holds for all 𝑛.

In the Inductive Step, we used the fact that

1.5𝑘 + 1.5𝑘−1 ≥ 1.5𝑘+1 .



The Inductive Step would still work for any bound of the form

𝑓𝑛 ≥ 𝛼 𝑟 𝑛

provided 𝑟 satisfies
𝑟 𝑘 + 𝑟 𝑘−1 ≥ 𝑟 𝑘+1 .
Dividing each side by 𝑟 𝑘−1 , this is equivalent to

𝑟 + 1 ≥ 𝑟2. (6.24)

If we had equality, we would have the same quadratic equation we had previously, in
(6.21), which has the two roots 𝑟1 and 𝑟2 given in (6.22). To ensure that the inequality
(6.24) is satisfied, we require that 𝑟 lies between those two roots:

𝑟2 ≤ 𝑟 ≤ 𝑟 1 .

We can let 𝑟 be equal to 𝑟1 or 𝑟2 , and still the inequality is satisfied, which means the
Inductive Step should still work, but we must still then choose 𝛼 so that the Inductive
Basis works too.
Suppose we use 𝑟1 = (1 + √5)/2 = 1.61803 … and let us determine 𝛼 so that the
proposed lower bound
𝑓𝑛 ≥ 𝛼 𝑟1𝑛
works. For the Inductive Basis, we require

𝑓1 = 1 ≥ 𝛼 ⋅ 𝑟1¹    and    𝑓2 = 1 ≥ 𝛼 ⋅ 𝑟1² .

For these to hold, we need

𝛼 ≤ 𝑟1−1 and 𝛼 ≤ 𝑟1−2 . (6.25)

Now the fact that 𝑟1 > 1 implies that 𝑟1−2 < 𝑟1−1 , so in fact the second of the two inequal-
ities (6.25) implies the first, so we only need to require that

𝛼 ≤ 𝑟1−2

Now,
𝑟1−2 = ((1 + √5)/2)⁻² = (3 − √5)/2 = 0.381966 … .
2 2

We choose 𝛼 = 𝑟1−2 and propose the lower bound

𝑓𝑛 ≥ 𝑟1−2 𝑟1𝑛

which is just
𝑓𝑛 ≥ 𝑟1𝑛−2 . (6.26)
We now give the proof by induction that this holds for all 𝑛 ∈ ℕ.

Inductive Basis:

• 𝑛 = 1: we have 𝑓1 = 1, and our lower bound is 𝑟1¹⁻² = 𝑟1⁻¹ = 0.61803 …, so the
lower bound holds.

• 𝑛 = 2: we have 𝑓2 = 1, and our lower bound is 𝑟1²⁻² = 1, so the lower bound
holds (with equality).

Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 ≥ 𝑟1𝑙−2 for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothesis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 +1 ≥ 3, so we can apply the recursive
rule (6.12):

𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 (by (6.12), and using 𝑘 ≥ 2)
     ≥ 𝑟1𝑘−2 + 𝑟1𝑘−3 (by the Inductive Hypothesis, twice)
     = 𝑟1𝑘−1 (by (6.21) and (6.22))
     = 𝑟1(𝑘+1)−2 .

This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, our lower bound (6.26) holds for all 𝑛.

6.7.3 Asymptotic behaviour

From § 6.7.1 and § 6.7.2 we have

𝑟1𝑛−2 ≤ 𝑓𝑛 ≤ 𝑟1𝑛 .

Dividing each side by 𝑟1𝑛 , we have

𝑟1−2 ≤ 𝑓𝑛 /𝑟1𝑛 ≤ 1.

This tells us that 𝑓𝑛 always lies within a constant ratio of 𝑟1𝑛 . Although the exact ratio
𝑓𝑛 /𝑟1𝑛 varies, it is constrained to lie between lower and upper bounds that are constant,
i.e., these bounds do not depend on 𝑛.
This tells us that the growth of 𝑓𝑛 , as 𝑛 increases, is very like the growth of 𝑟1𝑛 .
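A quick numerical sketch (ours, for illustration) confirms that the ratio 𝑓𝑛/𝑟1ⁿ stays between 𝑟1⁻² ≈ 0.382 and 1, and suggests what it tends towards:

```python
import math

r1 = (1 + math.sqrt(5)) / 2

fib = [1, 1]
for _ in range(30):
    fib.append(fib[-1] + fib[-2])

# Ratios f_n / r1**n for n = 1, ..., 32.
ratios = [fib[n - 1] / r1 ** n for n in range(1, len(fib) + 1)]

# Constant lower and upper bounds, as proved above (small tolerance for
# floating-point error, since n = 2 meets the lower bound with equality).
assert all(r1 ** -2 - 1e-9 <= r <= 1 + 1e-9 for r in ratios)
print(round(ratios[-1], 6))  # the ratio tends towards 1/sqrt(5) ≈ 0.447214
```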

6.7.4 An exact formula

We have managed to “sandwich” 𝑓𝑛 quite tightly between lower and upper bounds that
grow in the same way as 𝑛 increases. Encouraged by this success, we try now to get the
two bounds to coincide, so that we obtain an exact formula for 𝑓𝑛 .
When we looked for upper and lower bounds of the form 𝑟 𝑛 , we used the quadratic
equation
𝑟 2 − 𝑟 − 1 = 0. (6.27)
The intuition behind this is that any 𝑟 that satisfies this equation also satisfies

𝑟 𝑛 = 𝑟 𝑛−1 + 𝑟 𝑛−2 , (6.28)

so that 𝑟 𝑛 actually satisfies the Fibonacci recurrence (where each term is the sum of the
two previous terms), although dealing with the base case is a separate matter.
We took the largest root 𝑟1 = (1 + √5)/2, which is also the only positive root. For
upper bounds, we chose 𝑟 to be ≥ 𝑟1 , while for lower bounds, we chose 𝑟 to be ≤ 𝑟1 .
(For the lower bound, we also needed 𝑟 ≥ 𝑟2 , but this constraint did not rear its head at
us, as the values we considered were so close to 𝑟1 .) In each case, we observed that we
could actually take 𝑟 to be equal to 𝑟1 , thereby using the same value, 𝑟1 , in both upper
and lower bounds. The distinction between upper and lower bounds came down to our
choice of constant factors, 𝛼, and these were determined by the need for our bounds to
work on the base cases 𝑛 = 1 and 𝑛 = 2.
This suggests that an expression of the form

𝛼1 𝑟1𝑛 ,

with suitable choice of constant factor 𝛼1 , might be a very good estimate for 𝑓𝑛 . But it
won’t be exact, so we need something else too.
We have neglected the smaller root, 𝑟2 , so far. But it does provide us with something
else that satisfies the recurrence relation, because

𝑟2𝑛 = 𝑟2𝑛−1 + 𝑟2𝑛−2 ,

by (6.28). So, if we add some multiple of 𝑟2𝑛 to any expression satisfying the Fibonacci
recurrence, then the new expression will also satisfy the Fibonacci recurrrence (although
not necessarily the base cases).
Suppose, then, that our exact formula has contributions not just of the form 𝑟1𝑛 ,
using the larger root 𝑟1 of (6.31), but also of the form 𝑟2𝑛 , using the smaller root 𝑟2 .
Give each of these contributions its own factor; call these factors 𝛼1 and 𝛼2 , respectively.
Then we seek a formula of the form

𝑓𝑛 = 𝛼1 𝑟1𝑛 + 𝛼2 𝑟2𝑛 .

We now have two constants to vary, namely 𝛼1 and 𝛼2 , in our quest for an exact
expression for 𝑓𝑛 . This seems like progress, because the two base cases give us two
conditions to satisfy. Previously, when we only looked for expressions of the form 𝛼1 𝑟1𝑛 ,
we had only one constant to play with, yet we had two base cases to attend to, for 𝑓1
and 𝑓2 . That was ok when we only wanted bounds (lower or upper); in that situation,
it’s fine if we only get inequality (rather than equality) in one of the base cases, as long
as the inequality goes in the right direction (and we were always able to achieve this).
But if we want an exact formula for 𝑓𝑛 , then we need the formula to work exactly for
both the base cases. That’s two equations (𝑓1 = 1, 𝑓2 = 1), so it’s a good idea to have
two variables to solve for. Let’s see what 𝛼1 and 𝛼2 can do for us.
What should 𝛼1 and 𝛼2 be? Consider the two base cases. When 𝑛 = 1, we have
𝑓1 = 1, so we need
𝛼1 𝑟1 + 𝛼2 𝑟2 = 1.
When 𝑛 = 2, we have 𝑓2 = 1, so we need

𝛼1 𝑟1² + 𝛼2 𝑟2² = 1.

Here we have two linear equations in the two unknowns 𝛼1 and 𝛼2 . We then use our
favourite technique for solving such systems. Once we have done so, we find that
𝛼1 = 1/√5 ,    𝛼2 = −1/√5 .

So we propose the formula


𝑓𝑛 = (1/√5) 𝑟1𝑛 − (1/√5) 𝑟2𝑛 . (6.29)

Given the fame of the Fibonacci numbers, it is worth giving this in full detail too:

𝑓𝑛 = (1/√5) ((1 + √5)/2)𝑛 − (1/√5) ((1 − √5)/2)𝑛 . (6.30)
√5 2 √5 2

We now prove by induction that it works. Once again, the new or modified parts of the
proof are in blue.

Inductive Basis:
• 𝑛 = 1: we have 𝑓1 = 1, and our formula gives (1/√5)𝑟1¹ − (1/√5)𝑟2¹ = (1/√5)(𝑟1 −
𝑟2 ) = (1/√5) ⋅ √5 = 1, so the formula works exactly.

• 𝑛 = 2: we have 𝑓2 = 1, and our formula gives (1/√5)𝑟1² − (1/√5)𝑟2² = (1/√5)(𝑟1² −
𝑟2² ) = (1/√5)(𝑟1 + 𝑟2 )(𝑟1 − 𝑟2 ) = (1/√5) ⋅ 1 ⋅ √5 = 1, so the formula works exactly
again.

Inductive Step:
Let 𝑘 ≥ 2.
Assume that 𝑓𝑙 = (1/√5)𝑟1𝑙 − (1/√5)𝑟2𝑙 for all 𝑙 ≤ 𝑘. (This is the Inductive Hypothe-
sis.)
Now consider 𝑓𝑘+1 . Since 𝑘 ≥ 2, we have 𝑘 + 1 ≥ 3, so we can apply the recursive
rule (6.12):

𝑓𝑘+1 = 𝑓𝑘 + 𝑓𝑘−1 (by (6.12), and using 𝑘 ≥ 2)
     = ((1/√5)𝑟1𝑘 − (1/√5)𝑟2𝑘 ) + ((1/√5)𝑟1𝑘−1 − (1/√5)𝑟2𝑘−1 )
       (by the Inductive Hypothesis, twice)
     = ((1/√5)𝑟1𝑘 + (1/√5)𝑟1𝑘−1 ) − ((1/√5)𝑟2𝑘 + (1/√5)𝑟2𝑘−1 )
       (rearranging, to collect like roots together)
     = (1/√5)(𝑟1𝑘 + 𝑟1𝑘−1 ) − (1/√5)(𝑟2𝑘 + 𝑟2𝑘−1 )
     = (1/√5)𝑟1𝑘+1 − (1/√5)𝑟2𝑘+1 (by (6.21) and (6.22)).

This completes the Inductive Step.

Conclusion:
Therefore, by Mathematical Induction, our formula (6.29) holds for all 𝑛.
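A numerical sanity check of the closed form (6.30) against the recurrence is straightforward (an illustrative Python sketch; floating-point arithmetic, so we compare after rounding):

```python
import math

def fib(n: int) -> int:
    """n-th Fibonacci number via the recurrence (6.12), iteratively."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def binet(n: int) -> float:
    """Closed form (6.30) for the n-th Fibonacci number."""
    s5 = math.sqrt(5)
    r1, r2 = (1 + s5) / 2, (1 - s5) / 2
    return (r1 ** n - r2 ** n) / s5

assert all(round(binet(n)) == fib(n) for n in range(1, 40))
print(round(binet(10)))  # → 55
```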

We now have an exact formula for the 𝑛-th term of the Fibonacci sequence. The
formula itself is certainly not obvious. In fact, if you only know the standard recursive
definition and then this formula is presented to you “out of the blue”, then it would
probably seem mysterious and complicated, and you would wonder how anyone came
up with it. The appearance of the irrational numbers

1 + √5 1 − √5
,
2 2
would probably seem strange, especially as all the terms in the sequence are positive
integers.
Hopefully this section has removed some of the mystery, and equipped you with the
skills to derive formulas for some other recursively-defined sequences. You can see now
that those two strange irrational numbers are just the two roots of the equation

𝑟 2 − 𝑟 − 1 = 0, (6.31)

which we used in (6.21) and (6.27). And this equation comes directly from the recurrence
relation defining the sequence: a simple rearrangement of the recurrence 𝑓𝑛 = 𝑓𝑛−1 +𝑓𝑛−2
gives
𝑓𝑛 − 𝑓𝑛−1 − 𝑓𝑛−2 = 0. (6.32)

Compare (6.31) and (6.32). Although they are different kinds of equations (the
former is a quadratic equation in a single real variable, the latter is a linear equation for
any three successive terms in a sequence), they also have a common pattern. In each
case, the coefficients on the left-hand side are the same: 1, −1, −1, and the right-hand
side is 0. In each case, the three terms may be viewed as being “counted” or “indexed” by
three consecutive decreasing integers: for (6.31), the exponents are 2,1,0, while for (6.32),
the subscripts are 𝑛, 𝑛 − 1, 𝑛 − 2. We say that (6.31) is the characteristic equation
for the recurrence relation (6.32).
This is one instance of a general method for solving recurrences: first convert the
recurrence to its characteristic equation, then find the roots of that equation, and then
try to express the 𝑛-th term as a sum, with suitable coefficients, of 𝑛-th powers of the
roots of the characteristic equation. It does not always work quite like this; in particular,
there are some complicating details to attend to when some root of the characteristic
equation occurs more than once. But this link, between the recurrence and the roots
of the characteristic equation, lies at the heart of a general method that can be used to
solve any linear recurrence relation. We will not give the general method in full here; it
is covered in more advanced courses on discrete mathematics.
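For a recurrence 𝑓𝑛 = 𝑝𝑓𝑛−1 + 𝑞𝑓𝑛−2 whose characteristic equation has two distinct real roots, the method just described can be sketched in a few lines of Python (the function name is ours; this sketch deliberately does not handle repeated or complex roots):

```python
import math

def solve_order2(p, q, f1, f2):
    """Return a function n -> f_n for f_n = p*f_{n-1} + q*f_{n-2},
    assuming the characteristic equation r**2 - p*r - q = 0 has
    two distinct real roots."""
    disc = math.sqrt(p * p + 4 * q)
    r1, r2 = (p + disc) / 2, (p - disc) / 2
    # Solve the 2x2 linear system given by the base cases:
    #   a1*r1    + a2*r2    = f1
    #   a1*r1**2 + a2*r2**2 = f2
    det = r1 * r2 ** 2 - r2 * r1 ** 2
    a1 = (f1 * r2 ** 2 - f2 * r2) / det
    a2 = (f2 * r1 - f1 * r1 ** 2) / det
    return lambda n: a1 * r1 ** n + a2 * r2 ** n

fib = solve_order2(1, 1, 1, 1)   # the Fibonacci recurrence
print(round(fib(10)))  # → 55
```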
An important aspect of the expression for the 𝑛-th Fibonacci number, (6.30), is the
dominant role played by
((1 + √5)/2)𝑛 .

Of the two roots (1 ± √5)/2 of (6.31), this one is the greater in size, so its 𝑛-th power
will grow much larger than the 𝑛-th power of the other root. Furthermore, the size
of that other root is < 1, so its 𝑛-th power rapidly approaches 0, so it really provides
only a small adjustment to the number calculated. So Fibonacci numbers will be well
approximated, for large 𝑛, by
𝑓𝑛 ≈ (1/√5) ((1 + √5)/2)𝑛 .

This quantity is never an integer, but it does make the nature of the growth of 𝑓𝑛 clear.
It could be said that (𝑓𝑛 ∶ 𝑛 ∈ ℕ) is approximately geometric for large 𝑛, and it shows
that the ratio between successive terms, as 𝑛 → ∞, tends to (1 + √5)/2.
These observations align with our discussion in § 6.7.3, where we observed that the
sequence grows like 𝑟1𝑛 , where 𝑟1 = (1 + √5)/2.
This limiting ratio of the Fibonacci numbers, (1 + √5)/2, is the famous quantity
known as the golden ratio, often denoted by 𝜑 or 𝜏. In decimal form, it is

1.6180339887 … .

It is rich in mathematical properties and pops up all over the place, not only in computer
science but also in nature and in art. It has an important geometric interpretation: if
you take a rectangle whose side lengths are in this ratio (i.e., with the golden ratio as its
aspect ratio), and you trim it by one straight cut to create a square based on the shorter
side of the rectangle, then the smaller rectangle you trim off has the same proportions
as the one you started with. Such a rectangle is called a golden rectangle.

sequence                         list of terms                              eventual behaviour

(1/𝑛)∞𝑛=1                        1, 1/2, 1/3, 1/4, 1/5, …                   approaches 0 arbitrarily closely

((−1)𝑛 /𝑛)∞𝑛=1                   −1, 1/2, −1/3, 1/4, −1/5, …                approaches 0 arbitrarily closely

(1 − 1/𝑛)∞𝑛=1                    0, 1/2, 2/3, 3/4, 4/5, …                   approaches 1 arbitrarily closely

(1 − 1/2𝑛 )∞𝑛=1                  1/2, 3/4, 7/8, 15/16, …                    approaches 1 arbitrarily closely

((−1)𝑛 − (−1)𝑛 /𝑛)∞𝑛=1           0, 1/2, −2/3, 3/4, −4/5, 5/6, −6/7, …      odd terms approach −1,
                                                                            even terms approach 1.

Table 6.1: Some sequences and their eventual behaviour.

6.8 LiMiTS OF iNFiNiTE SEQUENCES

Given a number sequence, we can ask, how does it behave in the long run? What can
we say about how it grows (or declines) as 𝑛 gets very large? We have considered this
already for the sequence of reciprocals of positive integers (§ 6.5) and the Fibonacci
sequence (§ 6.7).
Sequences are very diverse, and various kinds of behaviour are possible. Some se-
quences increase without bound, such as (𝑛)∞𝑛=1 or (𝑛²)∞𝑛=1 or (log 𝑛)∞𝑛=1 . Others decrease
without bound, such as (−𝑛)∞𝑛=1 or (−𝑛²)∞𝑛=1 or (− log 𝑛)∞𝑛=1 . Others jump around; these
may be bounded both above and below, like ((−1)𝑛 )∞𝑛=1 , which starts −1, 1, −1, 1, …, or
unbounded above and below, like ((−1)𝑛 𝑛)∞𝑛=1 , which starts −1, 2, −3, 4, …, or bounded
above and unbounded below, or unbounded above and bounded below.
Some sequences eventually “settle down” in such a way that, beyond a certain point,
the terms look like approximations to some specific number, with these approximations
getting arbitrarily close. Consider the examples in Table 6.1.
Intuitively, the limit of a sequence is a number which the sequence’s terms get closer
and closer to, forever, and the terms get arbitrarily close, meaning that however close

you want them to be, they eventually get that close or closer, and stay that close or
closer from some point onwards.
Informally, the limit of a sequence (𝑎𝑛)_{𝑛=1}^∞ is a number ℓ such that, however close
to ℓ you want to get, there is a position in the sequence such that all terms beyond that
position are within that distance of ℓ. You can set your required distance from ℓ, which
we call 𝜀, to be as small as you like (except it can’t be 0), and there is always some
position 𝑁 such that all terms from that position onwards are within that distance 𝜀
of ℓ.
Formally, the sequence (𝑎𝑛)_{𝑛=1}^∞ has limit ℓ, and we write

lim_{𝑛→∞} 𝑎𝑛 = ℓ,

if for all 𝜀 > 0 there exists 𝑁 ∈ ℕ such that for all 𝑛 ≥ 𝑁 we have

|𝑎𝑛 − ℓ| < 𝜀.

We can write this very succinctly and symbolically if we wish:

lim_{𝑛→∞} 𝑎𝑛 = ℓ  ⟺  ∀𝜀 > 0 ∃𝑁 ∈ ℕ ∀𝑛 ≥ 𝑁 |𝑎𝑛 − ℓ| < 𝜀.

It might help to study carefully how the various parts of this formal definition align with
the intuitive description we gave earlier.

lim_{𝑛→∞} 𝑎𝑛 = ℓ means:

in symbols      in words
∀𝜀 > 0          for any desired distance from ℓ, no matter how small,
∃𝑁 ∈ ℕ          there is a point in the sequence such that,
∀𝑛 ≥ 𝑁          for any position beyond that point,
|𝑎𝑛 − ℓ| < 𝜀.   the term at that position is within the desired distance from ℓ.

For example, suppose 𝑎𝑛 = 1/𝑛. (See § 6.5 and, later, § 6.15 for discussion of
this important sequence.) We claim that its limit, as 𝑛 → ∞, is 0. This makes sense
intuitively, as these numbers 1/𝑛 get smaller and smaller, getting as close to 0 as you like.
Let us see how this notion aligns with the definition given in the previous paragraph.
Pick any positive real number 𝜀, as our measure of how close to 0 we want to get. How
far along the sequence do we have to go, to get and stay that close, or closer? This is
the role of 𝑁 , and this depends on 𝜀. In this case, we can pick 𝑁 to be any positive
integer greater than 1/𝜀:
𝑁 > 1/𝜀,  𝑁 ∈ ℕ. (6.33)

This ensures that, for any 𝑛 ≥ 𝑁, we also have

𝑛 > 1/𝜀,

which is equivalent to

1/𝑛 < 𝜀.

Therefore, for any 𝜀, there does indeed exist 𝑁 — namely, any positive integer > 1/𝜀 —
such that

𝑛 ≥ 𝑁  ⇒  𝑛 > 1/𝜀 (because 𝑁 > 1/𝜀, by (6.33))
       ⇒  1/𝑛 < 𝜀
       ⇒  |1/𝑛 − 0| < 𝜀.

The last inequality here says that the distance between the term 𝑎𝑛 = 1/𝑛 and 0 is < 𝜀.
So we have shown that

lim_{𝑛→∞} 1/𝑛 = 0. (6.34)
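The 𝜀–𝑁 argument above can be checked numerically. The sketch below is illustrative and not from the notes; the helper name `witness_N` is ours. It picks 𝑁 as in (6.33) and verifies that every term from position 𝑁 onwards is within 𝜀 of the limit 0:

```python
import math

def witness_N(eps):
    """Return a positive integer N > 1/eps, as in (6.33)."""
    return math.floor(1 / eps) + 1

eps = 0.125
N = witness_N(eps)        # here N = 9, since 1/0.125 = 8
# every term 1/n from position N onwards is within eps of the limit 0
assert all(abs(1 / n - 0) < eps for n in range(N, N + 10000))
```

Trying smaller and smaller values of 𝜀 shows how 𝑁 depends on 𝜀: the closer you demand the terms be to 0, the further along the sequence you must go.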

For another example, suppose 𝑎𝑛 = 𝑛/(𝑛 + 1). What is its limit as 𝑛 → ∞? Consider
the first few terms:

1/2, 2/3, 3/4, 4/5, … .
From these, it looks like the terms approach 1 from below. In general, the numerator
and denominator of 𝑛/(𝑛 + 1) are very similar, differing only by 1, and this difference
should matter less and less as 𝑛 gets larger and larger. So, intuitively, we might expect
the limit as 𝑛 → ∞ to be 1. We now show that this is indeed the case.
Let 𝜀 > 0. We want to make 𝑛 large enough to ensure that

|𝑛/(𝑛 + 1) − 1| < 𝜀. (6.35)

On the left-hand side here, we are taking the absolute value of a negative quantity, since
𝑛/(𝑛 + 1) < 1. This observation helps us remove the absolute value function and then
we can simplify using algebra:

|𝑛/(𝑛 + 1) − 1| = 1 − 𝑛/(𝑛 + 1)
               = ((𝑛 + 1) − 𝑛)/(𝑛 + 1)
               = 1/(𝑛 + 1).

So, to ensure that (6.35) holds, it would be sufficient to ensure that

1/(𝑛 + 1) < 𝜀.

But this is equivalent to

𝑛 + 1 > 1/𝜀,

which in turn is equivalent to

𝑛 > (1/𝜀) − 1.

We can now write our limit proof. Given 𝜀, we define 𝑁 to be a positive integer >
(1/𝜀) − 1. Then, for any 𝑛 ≥ 𝑁, we have 𝑛 > (1/𝜀) − 1 too. But, as we have just seen,
this is equivalent to

1/(𝑛 + 1) < 𝜀,

which in turn is equivalent to

|𝑛/(𝑛 + 1) − 1| < 𝜀.

Since this inequality holds for all 𝑛 ≥ 𝑁, we have completed the proof that

lim_{𝑛→∞} 𝑛/(𝑛 + 1) = 1.
For yet another example, suppose 𝑎𝑛 = 1/2ⁿ. Again, we claim the limit as 𝑛 → ∞
is 0. Pick any 𝜀 > 0. To get 1/2ⁿ to be < 𝜀, we need 1/2ⁿ < 𝜀, which is equivalent to
2ⁿ > 1/𝜀. Taking logarithms of each side, we obtain 𝑛 > log₂(1/𝜀). So we can take any
positive integer 𝑁 satisfying

𝑁 > log₂(1/𝜀),  𝑁 ∈ ℕ. (6.36)

Having chosen this 𝑁, any 𝑛 ≥ 𝑁 satisfies 𝑛 > log₂(1/𝜀), and therefore satisfies 1/2ⁿ < 𝜀.
So

𝑛 ≥ 𝑁  ⇒  |1/2ⁿ − 0| < 𝜀.

So we have

lim_{𝑛→∞} 1/2ⁿ = 0. (6.37)
Now let 𝑟 be any real constant in the range −1 < 𝑟 < 1, and suppose 𝑎𝑛 = 𝑟ⁿ. Now
what is lim_{𝑛→∞} 𝑎𝑛? The previous example is the case 𝑟 = 1/2, and the argument given
there can be adapted to handle any 𝑟 in the range −1 < 𝑟 < 1.
The algebra is neater when 0 < 𝑟 < 1, so we do that first. Given 𝜀 > 0, choose a
positive integer 𝑁 > log_{1/𝑟}(1/𝜀). Then, for any 𝑛 ≥ 𝑁, we have

𝑛 > log_{1/𝑟}(1/𝜀).

Raising 1/𝑟 to the power of each side, we have

(1/𝑟)ⁿ > 1/𝜀.

Taking the reciprocal of each side, which entails reversing the inequality, gives

𝑟ⁿ < 𝜀,

which immediately implies

|𝑟ⁿ − 0| < 𝜀.

Since this holds for all 𝑛 ≥ 𝑁, we have proved that

lim_{𝑛→∞} 𝑟ⁿ = 0. (6.38)

The case 𝑟 = 0 is trivial, since then we have 𝑟ⁿ = 0ⁿ = 0 for all 𝑛, so the limit is
clearly 0 because the value is always 0.
If −1 < 𝑟 < 0, then 0 < −𝑟 < 1, so we can apply our earlier observation (6.38), with
−𝑟 instead of 𝑟, to deduce that

lim_{𝑛→∞} (−𝑟)ⁿ = 0.

This sequence ((−𝑟)ⁿ)_{𝑛=1}^∞ consists of positive terms that decrease towards 0, approaching
0 in the limit as 𝑛 → ∞. It follows that the sequence of its negations, (−(−𝑟)ⁿ)_{𝑛=1}^∞,
consists entirely of negative terms that increase towards 0, with

lim_{𝑛→∞} −(−𝑟)ⁿ = 0.

Now the sequence we are interested in, (𝑟ⁿ)_{𝑛=1}^∞, is “sandwiched” between these two
sequences: each of its terms 𝑟ⁿ satisfies

−(−𝑟)ⁿ ≤ 𝑟ⁿ ≤ (−𝑟)ⁿ.

It follows that the terms 𝑟ⁿ have no choice but to also converge on 0 as 𝑛 → ∞. They
alternate between being positive and negative, but their sizes decrease towards 0 and
approach it in the limit:

lim_{𝑛→∞} 𝑟ⁿ = 0.

In conclusion, we have

Theorem 25. If −1 < 𝑟 < 1 then

lim_{𝑛→∞} 𝑟ⁿ = 0.
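A quick numerical check of Theorem 25 (an illustrative sketch, not part of the notes), including a negative value of 𝑟 where the terms alternate in sign:

```python
# for any r with -1 < r < 1, the powers r**n shrink towards 0 (Theorem 25)
for r in (0.5, -0.8, 0.99):
    n = 1
    while abs(r) ** n >= 1e-6:    # find a position with |r^n| < 10^-6
        n += 1
    # beyond that position, every term stays within 10^-6 of the limit 0
    assert all(abs(r ** m - 0) < 1e-6 for m in range(n, n + 100))
```

Note how much longer it takes 𝑟 = 0.99 to fall below the threshold than 𝑟 = 0.5: the closer |𝑟| is to 1, the slower the convergence.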

We will use this fact later, when we consider sums of infinite geometric sequences
(§ 6.14).

Another important limit, which takes a bit more effort to prove, is the limit of an
𝑛-th root (rather than an 𝑛-th power, as above). If 𝑟 > 0 then we use 𝑟^{1/𝑛} to refer to
the sole positive 𝑛-th root of 𝑟.

Theorem 26. If 𝑟 > 0 then

lim_{𝑛→∞} 𝑟^{1/𝑛} = 1.

The condition 𝑟 > 0 is needed because for 𝑟 < 0 taking 𝑛-th roots can involve com-
plex numbers and there may be no positive root or no real root at all.

Once we have established some basic limits like these, we can use them to derive
other limits using standard principles for combining limits. One of these principles is:

The limit of a sum is the sum of the limits.

If we have two sequences (𝑎𝑛 ∶ 𝑛 ∈ ℕ) and (𝑏𝑛 ∶ 𝑛 ∈ ℕ), and

lim_{𝑛→∞} 𝑎𝑛 = 𝑎,  lim_{𝑛→∞} 𝑏𝑛 = 𝑏,

then the sequence (𝑎𝑛 + 𝑏𝑛 ∶ 𝑛 ∈ ℕ), formed by adding the corresponding terms in those
two sequences together, has limit 𝑎 + 𝑏:

lim_{𝑛→∞} (𝑎𝑛 + 𝑏𝑛) = 𝑎 + 𝑏.

Similarly,

lim_{𝑛→∞} (𝑎𝑛 − 𝑏𝑛) = 𝑎 − 𝑏,
lim_{𝑛→∞} (𝑎𝑛 𝑏𝑛) = 𝑎𝑏,
lim_{𝑛→∞} (𝑎𝑛/𝑏𝑛) = 𝑎/𝑏,

with the last one requiring that 𝑏𝑛 > 0 for all 𝑛 and also 𝑏 > 0.
For example,

lim_{𝑛→∞} (3^{1/𝑛} + 3⁻ⁿ) = lim_{𝑛→∞} (3^{1/𝑛} + (1/3)ⁿ)
                        = lim_{𝑛→∞} 3^{1/𝑛} + lim_{𝑛→∞} (1/3)ⁿ
                        = 1 + 0 (by Theorem 26 and Theorem 25, respectively)
                        = 1.
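The worked limit above can be sanity-checked numerically (an illustrative sketch, not from the notes):

```python
# 3**(1/n) -> 1 and 3**(-n) -> 0, so their sum should tend to 1 + 0 = 1
terms = [3 ** (1 / n) + 3 ** (-n) for n in range(1, 10001)]

# the first term is 3 + 1/3; by n = 10000 the sum is close to 1
assert abs(terms[0] - (3 + 1 / 3)) < 1e-12
assert abs(terms[-1] - 1) < 1e-3
```

The 3^{1/𝑛} part converges quite slowly (its distance from 1 shrinks only like (log 3)/𝑛), which is why even at 𝑛 = 10000 the sum is within 10⁻³ of the limit rather than much closer.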

Some sequences increase without bound. They have no finite limit, and not even
any finite upper bound. In such cases, we can say that their limit, as 𝑛 → ∞, is ∞. For
example,

lim_{𝑛→∞} 𝑛² = ∞.

If a sequence at some point goes below 0 and keeps decreasing without any finite lower
bound, then we can say that its limit, as 𝑛 → ∞, is −∞. For example,

lim_{𝑛→∞} −𝑛² = −∞.

We are often interested in sequences that grow without bound. This is the typical
situation for the running time of a nontrivial algorithm, as a function of some positive
integer measure 𝑛 of the input size. (For example, 𝑛 might be the number of bytes in
an input file, or the number of names in a list to be sorted, or the number of digits in
a number to be factorised.) In this kind of situation, just saying that the limit is ∞
doesn’t say much and isn’t very useful. For example, in different situations you may
have algorithms that take time 𝑛, or 𝑛², or √𝑛, or log 𝑛, or 𝑛 log 𝑛, or 2ⁿ, or 𝑛!, or 2^{2ⁿ}.
These are wildly different running times, but stating their limits does not reflect this:

lim_{𝑛→∞} log 𝑛 = ∞,
lim_{𝑛→∞} √𝑛 = ∞,
lim_{𝑛→∞} 𝑛 = ∞,
lim_{𝑛→∞} 𝑛 log 𝑛 = ∞,
lim_{𝑛→∞} 𝑛² = ∞,
lim_{𝑛→∞} 2ⁿ = ∞,
lim_{𝑛→∞} 𝑛! = ∞,
lim_{𝑛→∞} 2^{2ⁿ} = ∞.

To talk intelligently about the running time of an algorithm as 𝑛 grows, it’s not enough
to just say “it keeps growing without bound”, which is all we are saying here. We need
to be able to say something that indicates how it grows.
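To see how different these growth rates really are, a small table can be printed (an illustrative sketch, not part of the notes; logarithms are taken to base 2 here, an assumption on our part):

```python
import math

# tabulate several growth rates that all "tend to infinity", yet grow
# at wildly different speeds
print(f"{'n':>4} {'log2 n':>8} {'sqrt n':>8} {'n log2 n':>10} {'n^2':>8} {'2^n':>26}")
for n in (10, 20, 40, 80):
    print(f"{n:>4} {math.log2(n):>8.2f} {math.sqrt(n):>8.2f} "
          f"{n * math.log2(n):>10.1f} {n * n:>8} {2 ** n:>26}")
```

Already at 𝑛 = 80, the value of 2ⁿ has 25 digits while 𝑛² is a mere 6400; the bare statement "both tend to ∞" hides this entirely.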

6.9 BiG-O NOTATiON

Many sequences we encounter in computer science describe quantities that grow without
bound as 𝑛 increases. As we saw at the end of the previous section, just stating that
their limit is ∞ doesn’t really capture how they grow. In this section, we describe

standard notation used throughout computer science for the really important part of
the growth behaviour of a sequence.
Functions like 𝑛² and log 𝑛 are simple and well-known, so the growth of sequences
based on them is well understood. But sequences in computer science are often
significantly more complicated. In particular, the exact running time of a program
can be very complicated indeed; in fact, it is often not possible to write down an exact
formula for it. But we still need to quantify how the costs of running the program
grow as the input gets larger and larger.
Suppose we have a sequence (𝑡𝑛 ∶ 𝑛 ∈ ℕ) whose 𝑛-th term is given by

𝑡𝑛 = 100𝑛 + 10𝑛² + 2ⁿ + log 𝑛.

This might conceivably be the running time of a program that has four successive stages,
with these stages taking time 100𝑛, 10𝑛², 2ⁿ and log 𝑛, respectively (where 𝑛 is the input
size). This expression for 𝑡𝑛 has four summands, each contributing to the growth of this
sequence. You can use a spreadsheet or program to study how each summand grows as
𝑛 increases. You’ll find that, once 𝑛 ≥ 10, the summand 2ⁿ is greater than each of the
others. As 𝑛 grows beyond that, 2ⁿ rapidly dwarfs the others. For 𝑛 = 20, this summand
is > 1,000,000, but the other summands are each ≤ 4,000.
So we can say that the growth of 𝑡𝑛 as 𝑛 → ∞ is dominated by the growth of 2ⁿ.
If this summand had been 3 ⋅ 2ⁿ instead of 2ⁿ, then it would be even more dominant.
But the constant factor 3 is relatively unimportant compared with 2ⁿ. In fact, regardless
of the constant, this would still have been the dominant summand. Even if the constant
factor had been 0.01, as in

𝑡𝑛 = 100𝑛 + 10𝑛² + 0.01 ⋅ 2ⁿ + log 𝑛,

that summand 0.01 ⋅ 2ⁿ would eventually dominate the others. (Try 𝑛 ≥ 20.)
For sequences representing running times of programs, it might be infeasible to pin
down the exact running time. It might just be too complicated, or it might depend
on some information we don’t know in advance, like the exact way the various data
structures are laid out in memory. But we still need to be able to make well-founded
quantitative statements about how long programs take.
It is important to be able to make positive statements, like

This program takes at most time 2ⁿ.

This identifies something the program can be guaranteed to do, and in particular it
guarantees that a certain quantity of a key resource — in this case, time — is sufficient
for the program to do its job, even if it might actually use less. Such statements are
useful when budgeting computational resources and when estimating how much more
time the program might take as input sizes grow.
Another issue is that we don’t want to be unduly distracted by behaviour for small
𝑛. It is common for some small inputs to need some kind of special treatment, which

means that the time taken to deal with them is not typical of behaviour on larger inputs.
Sequences often take some time to “settle down” into their eventual pattern of behaviour;
we have seen this already with the Fibonacci sequence (§ 6.7).
So, in summary, given a sequence of growing terms, we would like to make statements
about its growth that

• focus on its eventual behaviour,

• focus on the dominant contribution, and

• give an upper bound.

With this motivation, we make the following definition.


Let (𝑎𝑛 ∶ 𝑛 ∈ ℕ) be any sequence of nonnegative real numbers, and let 𝑓 ∶ ℕ → ℝ. We
write
𝑎𝑛 = 𝑂(𝑓(𝑛)),
and read it as “𝑎𝑛 is big-O 𝑓(𝑛)”, if there is a constant 𝑐 and a positive integer 𝑁 such
that, for all 𝑛 ≥ 𝑁 ,
𝑎𝑛 ≤ 𝑐 ⋅ 𝑓(𝑛).
We can express this using quantifiers:

𝑎𝑛 = 𝑂(𝑓(𝑛)) ⟺ ∃𝑐, 𝑁 ∀𝑛 ≥ 𝑁 𝑎𝑛 ≤ 𝑐 ⋅ 𝑓(𝑛).

For example, consider again

𝑡𝑛 = 100𝑛 + 10𝑛² + 2ⁿ + log 𝑛.

We observed earlier that each of the other summands is ≤ 2ⁿ when 𝑛 ≥ 10. So, for
𝑛 ≥ 10, we have

100𝑛 + 10𝑛² + 2ⁿ + log 𝑛 ≤ 2ⁿ + 2ⁿ + 2ⁿ + 2ⁿ = 4 ⋅ 2ⁿ.

Referring to the definition of big-O notation, we can take 𝑁 = 10 and 𝑐 = 4, because for
all 𝑛 ≥ 10, we have 𝑡𝑛 ≤ 4 ⋅ 2ⁿ. So we can write

𝑡𝑛 = 𝑂(2ⁿ). (6.39)

We could also have written

𝑡𝑛 = 𝑂(4 ⋅ 2ⁿ)

or

𝑡𝑛 = 𝑂(0.001 ⋅ 2ⁿ).

These are mathematically correct, but poor use of big-O notation because they include
constant factors. One key point about big-O notation is that it “swallows up” constant
factors, so there’s no need to state such factors explicitly. In this case, observe that

2ⁿ = 𝑂(0.001 ⋅ 2ⁿ),

because 2ⁿ ≤ 1000 ⋅ (0.001 ⋅ 2ⁿ), so the definition of big-O is satisfied in this case with
𝑐 = 1000. Since any constant factor can be included inside the big-O, i.e.,

𝑡𝑛 = 𝑂(𝑏 ⋅ 2ⁿ)

for any constant 𝑏, it is simplest to just omit the constant factor (or use 𝑏 = 1) and focus
on the part that depends on 𝑛.
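The choice of witnesses 𝑐 = 4 and 𝑁 = 10 made above can be verified numerically over a range of values of 𝑛 (an illustrative sketch, not from the notes; the base of the logarithm is unspecified in the text, so base 2 is assumed here):

```python
import math

def t(n):
    # the four summands of the running-time example; base-2 logarithm
    # is an assumption, since the notes leave the base unspecified
    return 100 * n + 10 * n ** 2 + 2 ** n + math.log2(n)

c, N = 4, 10
# the big-O witnesses: t_n <= c * 2**n for every n >= N
assert all(t(n) <= c * 2 ** n for n in range(N, 200))

# the threshold N matters: for small n the other summands dominate 2**n
assert t(2) > c * 2 ** 2
```

The second assertion shows why the definition only demands the bound from some point 𝑁 onwards: early terms are allowed to break it.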
It is also correct to write

𝑡𝑛 = 𝑂(3ⁿ),

since 2ⁿ ≤ 3ⁿ. This is unnecessarily loose, and in general we prefer tighter big-O bounds
since they give us stronger claims. But it is sometimes very hard to give very tight
upper bounds, so sometimes we have to be content with looser bounds (which is good
for big-O because it is so forgiving towards constant factors).
Although we can replace 2ⁿ by 3ⁿ in (6.39) and still have a true statement (albeit a
weaker one), we cannot replace 2ⁿ by, say, 1.9ⁿ (or use any other base smaller than 2).
This is because there is no constant 𝑐 such that 𝑐 ⋅ 1.9ⁿ eventually gets (and stays) bigger
than 2ⁿ.
Coming up with big-O expressions for sequences is often a matter of picking the
dominant summand and dropping its constant factor. We saw this in our study of 𝑡𝑛
above, (6.39). Some other examples of this type:
(1/8)𝑛³ + (1/4)𝑛² + (1/2)𝑛 + 1 = 𝑂(𝑛³),
3𝑛^{1/3} + 2𝑛^{1/2} = 𝑂(𝑛^{1/2}),
log 𝑛 + log(log 𝑛) + log(log(log 𝑛)) = 𝑂(log 𝑛),
100𝑛² + 2^{𝑛²} = 𝑂(2^{𝑛²}),
5 + 𝑛⁻¹ = 𝑂(1),
1/𝑛⁴ + 1/𝑛³ + 1/𝑛² = 𝑂(1/𝑛²).

If the sequence term is a product of sums, then you can find the dominant term in
each sum and then multiply them, simplifying as appropriate. For example:

(√𝑛 + 3𝑛 + 9^{1/𝑛})(17 + 12 log 𝑛) = 𝑂(𝑛 log 𝑛).



Some useful big-O examples with logarithms:

log(𝑛²) = 𝑂(log 𝑛) (because log(𝑛²) = 2 log 𝑛),
log(𝑛² + 𝑛³ + 𝑛⁴) = 𝑂(log 𝑛),
log(3ⁿ) = 𝑂(𝑛) (because log(3ⁿ) = 𝑛 log 3, and log 3 is a constant).

6.10 SUMS AND SUMMATiON

For number sequences, we are often interested in the sizes and behaviour of sums of
terms, as well as just the individual terms themselves. For example, if the 𝑛-th term 𝑠𝑛
of a sequence gives the energy consumed during the 𝑛-th second of some computation,
then we may also want to study the total energy consumed so far, at each time. This can
be done using the sum, 𝑆𝑛 = 𝑠1 + 𝑠2 + ⋯ + 𝑠𝑛 , of the first 𝑛 terms of the sequence. These
cumulative sums 𝑆𝑛 give us a new sequence, with each term 𝑆𝑛 of the new sequence
giving the total energy used during all the first 𝑛 seconds of the computation.
Let 𝑀 be a set of indices for a sequence, so 𝑀 = ℕ for an infinite sequence and
𝑀 = [1, 𝑘]ℕ for a finite sequence of 𝑘 terms.
If (𝑠𝑛)_{𝑛∈𝑀} is any number sequence, then its sequence of partial sums is the se-
quence (𝑆𝑛)_{𝑛∈𝑀} where, for each 𝑛, the term 𝑆𝑛 is defined by

𝑆𝑛 = 𝑠₁ + ⋯ + 𝑠𝑛. (6.40)

Since we will be working with partial sums like this a lot now, it is time to introduce
summation notation. The expression

∑_{𝑖=1}^{𝑛} 𝑠𝑖

is an abbreviation for the sum on the right-hand side of (6.40). Let us study it closely.

• The ∑ is a large upper-case Greek sigma. It is called the summation sign and
stands for sum.²

• At the base of the summation sign we see “𝑖 = 1”. On the left-hand side of this equa-
tion we have 𝑖, which is the index of summation or variable of summation.
This will be varied as part of a specification of what things are to be added up.
On the right of “𝑖 = 1” we have 1, which gives the initial value of 𝑖, which means
the very first value we give to 𝑖 when we are specifying the things to be added
up.

• At the top of the summation sign we see 𝑛, which is the very last value we give
to 𝑖 when forming our sum.
² This notation was introduced by Leonhard Euler in 1755. The Greek letter sigma is the Greek equivalent
of the English ‘s’, the first letter of “sum”.

• Taken together, this information specifies the range of summation, namely [1, 𝑛]ℕ.
So the understanding is always that the variable of summation is incremented from
the first value, 1, up to the last value, 𝑛, and that every integer value in that range,
1 ≤ 𝑖 ≤ 𝑛, is a value of 𝑖 that is to be used for the sum.

• But how is each of these values of 𝑖 to be used in forming the sum? This is
specified by the summand (i.e., the thing being added), which comes after the
summation sign (i.e., to its right), and in this case is 𝑠𝑖 . The summand includes
the variable of summation, in this case in its subscript.

• By substituting all numbers in the range of summation into the variable of sum-
mation, we obtain different values of the summand, and we add all these different
values up in order to obtain the whole sum.

In this case, the range of summation is {1, 2, … , 𝑛}, so the variable of summation 𝑖 in
the summand 𝑠𝑖 is given each of these values. This gives the summand values

𝑠₁, 𝑠₂, … , 𝑠𝑛,

and these are all added up, yielding

∑_{𝑖=1}^{𝑛} 𝑠𝑖 = 𝑠₁ + 𝑠₂ + ⋯ + 𝑠𝑛.

So we can rewrite (6.40) as

𝑆𝑛 = ∑_{𝑖=1}^{𝑛} 𝑠𝑖.

If summation notation seems new to you, be assured that, to a programmer, it is
not as new as it may seem. You do something quite similar when you write a loop to
add up some numbers in a program. This typically has structure like the following:

sum := 0
for each 𝑖 from 1 to 𝑛:
    sum := sum + 𝑠𝑖
𝑆𝑛 := sum

Just like summation notation, this specifies

• a variable of summation 𝑖,

• a range of summation {1, … , 𝑛}, and

• a summand 𝑠𝑖 .

There are also some differences with summation notation.



• What we have just given is really a short algorithm (or part of a program in some
programming language), so it does more than just define a sum mathematically;
it also specifies how to compute it. In particular, it specifies a particular order in
which the summands are added, and uses a name (sum) for the partial sums, and
initialises the partial sum at the start. By contrast, the summation notation does
not specify an order of addition. Even though it is natural to add the summands
in order of increasing 𝑖, there is no requirement to do it in that order, or any other
order. The summation specifies the mathematical result of doing the sum without
assuming anything about how it is computed. Since addition is associative and
commutative, the sum is the same regardless of the order of the additions.

• Our summation algorithm, if implemented on a computer, would use computer
representations of numbers rather than numbers themselves, and these introduce
approximations and errors. These approximations also mean that the computer’s
version of addition is not necessarily associative, so that the order of addition can
matter.
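The pseudocode above translates directly into Python (an illustrative sketch; the list `s` and its values are made up for the example, and since Python lists are indexed from 0, `s[i - 1]` holds the term 𝑠ᵢ):

```python
s = [5, 3, 8, 1, 4]   # example terms s_1, ..., s_5 (values made up)
n = len(s)

# explicit loop, mirroring the pseudocode
total = 0
for i in range(1, n + 1):
    total = total + s[i - 1]

# the built-in sum computes the same mathematical value; unlike the loop,
# the summation notation itself commits to no particular order of addition
assert total == sum(s) == 21
```

With exact integers the order of addition never matters; as the second bullet above notes, with floating-point numbers it can.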

The name used for the variable of summation is not important, as long as it is
used consistently. The variable used under the summation sign (the 𝑖 in “𝑖 = 1”) and
the variable of that same name in the summand (the subscript 𝑖 in 𝑠𝑖 ) are the same
variable, as you’d expect. If you want to change the variable name, say to 𝑗, then you
can, provided you change it everywhere: both under the summation sign, and in the
summand. So the sum
∑_{𝑖=1}^{𝑛} 𝑠𝑖

could, for example, be rewritten as

∑_{𝑗=1}^{𝑛} 𝑠𝑗   or   ∑_{𝑡=1}^{𝑛} 𝑠𝑡   or   ∑_{𝛼=1}^{𝑛} 𝑠𝛼   or   ∑_{♡=1}^{𝑛} 𝑠♡.

It is common to use 𝑖, 𝑗, 𝑘, 𝑙, 𝑚, 𝑛 as variables of summation, and adherence to conventions
can help readers of your work, just as it does with programming. But such conventions
should not be stuck to rigidly, just for their own sake, and there are certainly times
when other variables make more sense in a particular context.
Variables of summation are local to the summation. In other words, their scope is
restricted to that summation sign and summand, and if they are used outside the sum,
then that outside use is not the same variable and should be avoided because of the
confusion it can cause. So, in the expression
𝑖 + ∑_{𝑖=1}^{𝑛} 𝑠𝑖,

we really have two different variables called 𝑖: the local variable of summation used in
the sum, and the 𝑖 on the very left. In this case, we have not made the meaning of the
first 𝑖 clear, but even if we had, there is a possibility of confusion about what the variables
refer to, and this is to be avoided. There is no shortage of variable names: there are
26 letters of the English alphabet and 24 in the Greek alphabet, each with upper case
versions too!³ The same issue arises if we write

(∑_{𝑖=1}^{𝑛} 𝑠𝑖) + 𝑖. (6.41)

Here, the 𝑖 on the right is not the variable of summation. If for some reason we must
use 𝑖 on the right here, then it would be better to use a different variable of summation,
such as 𝑗, and this is easily done. This particular expression (6.41) also raises another
issue. The parentheses here make clear that the summand is only 𝑠𝑖 . But what if the
parentheses are omitted?
∑_{𝑖=1}^{𝑛} 𝑠𝑖 + 𝑖

Is this the same as (6.41), or is it a different sum in which the summand is now 𝑠𝑖 + 𝑖?
As it is, the expression is ambiguous, so we should use parentheses to make it clear. This
was done in (6.41), imposing one interpretation, and if we wanted to impose the other
possible interpretation, then we could write
∑_{𝑖=1}^{𝑛} (𝑠𝑖 + 𝑖).

In this expression, each successive value of 𝑖 in the range of summation, {1, … , 𝑛}, is
substituted for 𝑖 in both places in the summand where 𝑖 appears. So what we obtain is
the sum

(𝑠₁ + 1) + (𝑠₂ + 2) + ⋯ + (𝑠𝑛 + 𝑛).

6.11 FiNiTE SERiES

A finite series is an expression obtained from a finite sequence by inserting + between
each pair of consecutive terms. We can think of this as replacing the commas in the list
of terms by additions. So a finite sequence

𝑎₁, 𝑎₂, 𝑎₃, … , 𝑎𝑛

gives rise to the finite series

𝑎₁ + 𝑎₂ + 𝑎₃ + ⋯ + 𝑎𝑛.

³ although there is significant overlap between the two upper-case alphabets.



The value of the series is the number given by the sum.

6.12 FiNiTE ARiTHMETiC SERiES

A finite arithmetic series is a finite series obtained from an arithmetic sequence. If
the arithmetic sequence has 𝑛 terms, first term 𝑎, and common difference 𝑑, then, as we
saw in (6.8), its list of terms is

𝑎, 𝑎 + 𝑑, 𝑎 + 2𝑑, … , 𝑎 + (𝑛 − 1)𝑑.

So the corresponding finite arithmetic series is

𝑎 + (𝑎 + 𝑑) + (𝑎 + 2𝑑) + ⋯ + (𝑎 + (𝑛 − 1)𝑑). (6.42)

What is the sum 𝑆𝑛 of this series?


One way to work this out starts with two copies of the series, one above the other,
but with the second copy in reverse order, so high terms are vertically aligned with low
terms:

𝑆𝑛 = 𝑎 + (𝑎 + 𝑑) + ⋯ + (𝑎 + (𝑛 − 2)𝑑) + (𝑎 + (𝑛 − 1)𝑑)
𝑆𝑛 = (𝑎 + (𝑛 − 1)𝑑) + (𝑎 + (𝑛 − 2)𝑑) + ⋯ + (𝑎 + 𝑑) + 𝑎

Now we add these two equations. The sum of the left-hand sides is 2𝑆𝑛 . What is the sum
of the right-hand sides? We have arranged the terms on the right so that each column of
two terms adds to the same sum, namely 2𝑎 +(𝑛 −1)𝑑. For example, in the first column
on the right, the sum is 𝑎 + (𝑎 + (𝑛 − 1)𝑑) = 2𝑎 + (𝑛 − 1)𝑑; in the second column on the
right, the sum is (𝑎 + 𝑑) + (𝑎 + (𝑛 − 2)𝑑) = 2𝑎 + (𝑛 − 1)𝑑; and so on. We do this for each
of the 𝑛 columns on the right. Each of these 𝑛 column sums is 2𝑎 +(𝑛 −1)𝑑, so the sum
of the two right-hand sides of the two equations is 𝑛(2𝑎 + (𝑛 − 1)𝑑). Equating the sum
of the two left-hand sides and the sum of the two right-hand sides, we obtain

2𝑆𝑛 = 𝑛(2𝑎 + (𝑛 − 1)𝑑).

Therefore

𝑆𝑛 = 𝑛 (𝑎 + (𝑛 − 1)𝑑/2) = 𝑛𝑎 + (𝑛(𝑛 − 1)/2) 𝑑. (6.43)

At this point, we should check the sanity of our answer.

• In the special case 𝑛 = 1, when the series just has the term 𝑎, then we should have
𝑆𝑛 = 𝑎, and that is indeed what our expression for 𝑆𝑛 in (6.43) gives.

• In the special case 𝑑 = 0, when the series is constant, with every term equal to 𝑎,
its sum should be 𝑆𝑛 = 𝑛𝑎, and our expression agrees with this too.

• We have already seen an important finite arithmetic series, namely the sum of the
first 𝑛 positive integers, in (3.5) and Theorem 19. In that case, the sum should be
𝑆𝑛 = 𝑛(𝑛 + 1)/2. If we put 𝑎 = 1 and 𝑑 = 1 in (6.43), then we obtain

𝑛 + 𝑛(𝑛 − 1)/2 = 𝑛 (1 + (𝑛 − 1)/2) = 𝑛 ((2 + 𝑛 − 1)/2) = 𝑛(𝑛 + 1)/2.

You should get in the habit of doing checks like these — against very simple special
cases, and particular cases where you already know the answer — whenever you derive
a new expression for something you are trying to compute. Passing these checks doesn’t
prove that your expression is correct, but in practice it often detects errors and helps
you correct them.
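Checks like these can also be automated. The sketch below (illustrative, not from the notes) compares the closed form (6.43) against direct summation of the terms for many small cases:

```python
def arithmetic_sum(a, d, n):
    """Closed form (6.43): S_n = n*a + n*(n-1)/2 * d."""
    return n * a + n * (n - 1) // 2 * d   # n*(n-1) is always even, so // is exact

for a in range(-3, 4):
    for d in range(-3, 4):
        for n in range(1, 20):
            # direct summation of a, a+d, ..., a+(n-1)d
            assert arithmetic_sum(a, d, n) == sum(a + k * d for k in range(n))
```

Exhaustive checking over small parameter values is no proof, but it catches most algebra slips in a derivation like the one above.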
In fact, the expression for the sum of the first 𝑛 positive integers (Theorem 19), while
only one particular case of a finite arithmetic series, can nonetheless be used to derive
the expression for the general case. Look again at (6.42), and consider the coefficients
of 𝑑 throughout the right-hand side. These coefficients are 0, 1, 2, …, 𝑛 − 1. So let us
rearrange the right-hand side of (6.42) and collect all the parts involving 𝑑 together.

𝑆𝑛 = 𝑎 + (𝑎 + 𝑑) + (𝑎 + 2𝑑) + ⋯ + (𝑎 + (𝑛 − 1)𝑑)
   = (𝑎 + 𝑎 + ⋯ + 𝑎) + (𝑑 + 2𝑑 + ⋯ + (𝑛 − 1)𝑑)
     (collecting the 𝑛 copies of 𝑎 together, then the multiples of 𝑑)
   = 𝑛𝑎 + (1 + 2 + ⋯ + (𝑛 − 1))𝑑
   = 𝑛𝑎 + (𝑛(𝑛 − 1)/2) 𝑑 (by Theorem 19).
What happens to the expression for 𝑆𝑛 as the number of terms grows? The two parts
of the expression, 𝑛𝑎 and (𝑛(𝑛 − 1)/2)𝑑, grow at different rates; the first is linear in 𝑛,
while the second is quadratic in 𝑛. For large 𝑛, quadratic functions of 𝑛 grow faster
in size than linear functions of 𝑛. What happens for large 𝑛 depends on whether 𝑑 is
positive or negative. When 𝑑 > 0, the sum 𝑆𝑛 is eventually positive (even if 𝑎 < 0) and
becomes larger and larger, and is unbounded. When 𝑑 < 0, the sum 𝑆𝑛 is eventually
negative (even if 𝑎 > 0) and, although it grows in size, it grows in the negative direction,
going lower and lower, with no lower bound.

6.13 FiNiTE GEOMETRiC SERiES

A finite geometric series is a finite series obtained from a geometric sequence. If the
geometric sequence has 𝑛 terms, first term 𝑎, and common ratio 𝑟, then, as in (6.9), its
terms are

𝑎, 𝑎𝑟, 𝑎𝑟², … , 𝑎𝑟ⁿ⁻², 𝑎𝑟ⁿ⁻¹.

So the corresponding finite geometric series is

𝑎 + 𝑎𝑟 + 𝑎𝑟² + ⋯ + 𝑎𝑟ⁿ⁻² + 𝑎𝑟ⁿ⁻¹.

Again we ask, what is its sum 𝑆𝑛?
We can easily dispose of the case 𝑟 = 1, for then we are just adding up 𝑛 copies of 𝑎,
so we have 𝑆𝑛 = 𝑛𝑎.
So now suppose 𝑟 ≠ 1.
Probably the easiest way to work out 𝑆𝑛 in general is to first notice that, if we
multiply 𝑆𝑛 by 𝑟, we get something that looks very similar:

𝑟𝑆𝑛 = 𝑎𝑟 + 𝑎𝑟² + 𝑎𝑟³ + ⋯ + 𝑎𝑟ⁿ⁻¹ + 𝑎𝑟ⁿ.

The only difference with the earlier series is that we have now lost the first term, 𝑎, and
gained a new last term, 𝑎𝑟ⁿ. Therefore

𝑟𝑆𝑛 = 𝑆𝑛 − 𝑎 + 𝑎𝑟ⁿ.

Collecting the terms in 𝑆𝑛 together on the left, we have

𝑟𝑆𝑛 − 𝑆𝑛 = −𝑎 + 𝑎𝑟ⁿ.

A little algebra now gives

(𝑟 − 1)𝑆𝑛 = 𝑎(𝑟ⁿ − 1).

Therefore

𝑆𝑛 = 𝑎 ⋅ (𝑟ⁿ − 1)/(𝑟 − 1). (6.44)

Once again, having derived this expression, we should do some simple checks.
Once again, having derived this expression, we should do some simple checks.

• If 𝑛 = 1, then the sequence has the single term 𝑎 so 𝑆1 = 𝑎, and our expression for
𝑆𝑛 agrees with this.

• If 𝑎 = 0 then all terms are 0 so 𝑆𝑛 = 0, and our expression agrees with this.

• In Exercise 3.7 we showed that

1 + 2 + 2² + 2³ + ⋯ + 2ⁿ = 2ⁿ⁺¹ − 1.

The sum on the left is a finite geometric series with 𝑛 + 1 terms, first term 𝑎 = 1
and common ratio 𝑟 = 2. Our formula gives

𝑆_{𝑛+1} = 1 ⋅ (2ⁿ⁺¹ − 1)/(2 − 1) = 2ⁿ⁺¹ − 1,

which is correct.
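Formula (6.44) can likewise be checked against a direct sum. The sketch below (illustrative, not from the notes) uses exact `Fraction` arithmetic so that no floating-point error can mask a mistake:

```python
from fractions import Fraction

def geometric_sum(a, r, n):
    """Closed form (6.44): S_n = a*(r**n - 1)/(r - 1), with S_n = n*a for r = 1."""
    if r == 1:
        return a * n
    return a * (r ** n - 1) / (r - 1)

# compare against direct summation of a, a*r, ..., a*r^(n-1)
for r in (Fraction(2), Fraction(1, 2), Fraction(-3), Fraction(1)):
    for n in range(1, 15):
        direct = sum(Fraction(5) * r ** k for k in range(n))
        assert geometric_sum(Fraction(5), r, n) == direct
```

Including 𝑟 = 1 in the test values exercises the special case that the closed form cannot cover (its denominator would be 0 there).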

6.14 iNFiNiTE GEOMETRiC SERiES

Consider our expression in (6.44) for the sum 𝑆𝑛 of a finite geometric series of 𝑛 terms.
What happens to 𝑆𝑛 as 𝑛 → ∞?
If 𝑟 > 1 then 𝑟ⁿ > 0 and 𝑟ⁿ grows without bound, so the quotient inside the paren-
theses in (6.44) is also positive and grows without bound. The sign of 𝑆𝑛 is the same
as the sign of 𝑎. So, if 𝑎 > 0 then 𝑆𝑛 is positive and increases without bound, while if
𝑎 < 0 then 𝑆𝑛 is negative and decreases without bound (giving larger and larger negative
numbers with no lower bound).
If 𝑟 < −1 in (6.44) then 𝑟ⁿ alternates in sign as 𝑛 increases, while increasing in size.
The same happens to the numerator of the quotient in (6.44). But the denominator 𝑟 − 1
is always negative when 𝑟 < −1. For even 𝑛, we have 𝑟ⁿ > 1 and 𝑟ⁿ − 1 > 0, with 𝑟ⁿ
increasing without upper bound as 𝑛 increases, while for odd 𝑛, we have 𝑟ⁿ < −1 and
𝑟ⁿ − 1 < −2, with 𝑟ⁿ decreasing without lower bound as 𝑛 increases. The sign of the
whole expression for each 𝑛 will also depend on the sign of 𝑎, but the general character
of 𝑆𝑛 as 𝑛 increases is clear: alternating between positive and negative numbers, with
each getting larger in size without bound, so 𝑆𝑛 has no upper or lower bound.
We have already seen that, if 𝑟 = 1, then 𝑆𝑛 = 𝑛𝑎. As 𝑛 increases, this increases
without upper bound if 𝑎 > 0 and decreases without lower bound if 𝑎 < 0. If 𝑟 = −1,
we have 𝑆𝑛 = 𝑎 − 𝑎 + 𝑎 − 𝑎 + ⋯ + (−1)ⁿ⁻¹𝑎, which is either 𝑎 or 0 according as 𝑛 is odd or
even, respectively. Putting 𝑟 = −1 in (6.44) gives

𝑆𝑛 = 𝑎 ⋅ ((−1)ⁿ − 1)/(−2).

When 𝑛 is even, this is 0, and when 𝑛 is odd, it is 𝑎, all as expected. So in this case,
the sequence has no limit, although it is now bounded both above and below.
It remains to consider what happens if −1 < 𝑟 < 1.
We saw in Theorem 25 that, in this case, 𝑟ⁿ → 0 as 𝑛 → ∞. So, in our expression for
𝑆𝑛 in (6.44), the 𝑟ⁿ in the numerator vanishes as 𝑛 → ∞. This means that, if −1 < 𝑟 < 1,
then

𝑆∞ = lim_{𝑛→∞} 𝑆𝑛 = 𝑎/(1 − 𝑟), (6.45)

where 𝑆∞ represents the sum of the infinite geometric series.
This expression is of fundamental importance and appears throughout computer
science, mathematics, other sciences, engineering, economics, finance, and indeed every
quantitative discipline.
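The convergence in (6.45) is easy to observe numerically (an illustrative sketch; the values 𝑎 = 3 and 𝑟 = 1/2 are our own example):

```python
a, r = 3.0, 0.5
limit = a / (1 - r)               # (6.45) gives a/(1 - r) = 6

s_n = 0.0
partials = []
for n in range(1, 40):
    s_n += a * r ** (n - 1)       # add the n-th term a*r^(n-1)
    partials.append(s_n)

# the partial sums climb towards, and get arbitrarily close to, 6
assert abs(partials[-1] - limit) < 1e-9
```

The gap between 𝑆𝑛 and the limit is exactly 𝑎𝑟ⁿ/(1 − 𝑟), so it shrinks geometrically, halving with every extra term here.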

6.15 HARMONiC NUMBERS

In § 6.5 we discussed the sequence (1/𝑛 ∶ 𝑛 ∈ ℕ) as an important example of a harmonic
sequence. We now consider this specific sequence in more detail, because it crops up a
lot in computer science.
222 SEQUENCES & SERiES

The terms in this sequence converge down to 0 as 𝑛 → ∞, as we saw in (6.34) in § 6.8.


Now consider the partial sums of this sequence. We call them the harmonic numbers
and denote them by 𝐻𝑛 :
𝐻𝑛 = 1 + 1/2 + 1/3 + ⋯ + 1/𝑛.
Even though the individual terms of the sequence of reciprocals diminish to 0, their
partial sums 𝐻𝑛 grow arbitrarily large. Pick any positive number 𝐵, no matter how
large, and you can find 𝑛 such that
1 + 1/2 + 1/3 + ⋯ + 1/𝑛 > 𝐵.
This property — terms converge to limit 0, but with unbounded partial sums — is
a very notable one.

• There are many sequences that diminish to 0 and whose partial sums are bounded:
no matter how large 𝑛 is, the partial sum stays below some fixed upper bound, so
in fact the sum of all terms in the sequence is finite. For example, the sequences
(1/𝑛2 ∶ 𝑛 ∈ ℕ) and (1/2𝑛 ∶ 𝑛 ∈ ℕ) both go to 0 as 𝑛 → ∞, and both have finite sums
too. The finite-sum claim is not obvious for the former sequence, but we know the
latter sequence has a finite sum from § 6.14.

• On the other hand, if a sequence of positive numbers does not diminish to 0, then
it can be shown that its partial sums are unbounded. For example, the terms in
the sequence (1 + 𝑛−1 ∶ 𝑛 ∈ ℕ) tend to 1 as 𝑛 → ∞, and their partial sums grow
without bound, since the 𝑛-th partial sum is ≥ 𝑛.

But the sequence of reciprocals (1/𝑛 ∶ 𝑛 ∈ ℕ) is not in either of these camps. Its terms
tend to 0, yet it has unbounded partial sums. In fact, it is one of the more rapidly
diminishing positive sequences with this property. (There are some sequences with this
property that diminish even more quickly than (1/𝑛 ∶ 𝑛 ∈ ℕ), but they do not diminish
much more quickly. Can you find some?)
We saw in Exercise 3.11 that the harmonic numbers behave very much like log 𝑛.
We proved that 𝐻𝑛 ≥ log𝑒 (𝑛 + 1), and mentioned in the solutions (as a suggested further
exercise) that 𝐻𝑛 ≤ (log𝑒 𝑛) + 1. It is known (although beyond the scope of these Course
Notes to prove) that the difference between 𝐻𝑛 and log𝑒 𝑛 converges to a constant:

lim𝑛→∞ (𝐻𝑛 − log𝑒 𝑛) = 𝛾, (6.46)

where 𝛾 = 0.5772 … is called Euler’s constant. It is a famous and mysterious number
that arises in a variety of contexts, but we do not even know whether or not it is rational.
The differences 𝐻𝑛 − log𝑒 𝑛 approach their limit 𝛾 from above, so we have

𝐻𝑛 ≥ (log𝑒 𝑛) + 𝛾,

giving a stronger lower bound than the one from Exercise 3.11. For an upper bound, it
is known that
𝐻𝑛 ≤ (log𝑒 𝑛) + 𝛾 + 1/(2𝑛), (6.47)
although we do not prove this. We can see that the difference between these two bounds
diminishes to 0 as 𝑛 → ∞. For very large 𝑛, we have an approximation:

𝐻𝑛 ≈ log𝑒 𝑛 + 𝛾. (6.48)

Harmonic numbers have many applications in computer science, especially in prob-


lems involving the analysis of algorithms and probability, and it is good to be aware of
their basic properties.
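To see the approximation (6.48) and the surrounding bounds in action, the following Python sketch (our own illustration, not part of the notes) computes 𝐻𝑛 directly and compares it with log𝑒 𝑛 + 𝛾:

```python
import math

GAMMA = 0.5772156649  # Euler's constant, truncated to a few places

def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n, computed directly."""
    return sum(1.0 / k for k in range(1, n + 1))

for n in [10, 100, 1000, 10000]:
    h = harmonic(n)
    approx = math.log(n) + GAMMA
    # Bounds from the notes: log n + gamma <= H_n <= log n + gamma + 1/(2n)
    print(n, h, approx, h - approx)
```

The printed difference sits between 0 and 1/(2𝑛), as the bounds around (6.47) predict.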

6.16 EXERCiSES

1. Give recurrence relations for each of the following sequences:

(a) the positive integers;

(b) the negative even integers;

(c) the squares;

(d) the sequence whose 𝑛-th term is the sum of the first 𝑛 positive integers;

(e) the sequence whose 𝑛-th term is the sum of the reciprocals of the first 𝑛 positive
integers.

2. Define the sequence (𝑎𝑛 ∶ 𝑛 ∈ ℕ) recursively by

𝑎1 = 2, 𝑎𝑛 = 2𝑛 𝑎𝑛−1

(a) Find the first five terms of this sequence.

(b) Prove by induction on 𝑛 that 𝑎𝑛 = 2𝑛 𝑛!.

3. Define the sequence (𝑏𝑛 ∶ 𝑛 ∈ ℕ) recursively by


𝑏1 = 3, 𝑏𝑛 = (𝑏𝑛−1)^2

(a) Explore this sequence.

(b) Conjecture a formula for the sequence.

(c) Prove that the formula is correct, by induction.



4. Define the sequence (𝑞𝑛 ∶ 𝑛 ∈ ℕ) recursively by

𝑞1 = 3, 𝑞2 = 5, 𝑞𝑛 = 3𝑞𝑛−1 − 2𝑞𝑛−2 .

(a) Explore this sequence. Find the first seven terms.

(b) Conjecture a formula for the sequence.

(c) Prove that the formula is correct, by induction.

5. Define the sequence (𝑔𝑛 ∶ 𝑛 ∈ ℕ) recursively by

𝑔1 = 0, 𝑔𝑛 = 𝑔𝑛−1 + log 𝑛.

Prove by induction on 𝑛 that

𝑔𝑛 ≤ 𝑛 log 𝑛 − 𝑛 + 𝐻𝑛 ,
where 𝐻𝑛 = 1 + 1/2 + ⋯ + 1/𝑛 (see Exercise 3.11).
To do this, it will help to use the following inequality, which holds for all 𝑥 ≥ −1:

log(1 + 𝑥) ≤ 𝑥

The following technique can be helpful in relating log 𝑛 and log(𝑛 − 1):

log(𝑛 − 1) = log(𝑛 ⋅ (𝑛 − 1)/𝑛) = log(𝑛(1 − 1/𝑛)) = log 𝑛 + log(1 − 1/𝑛).

Throughout this question, logarithms are natural logarithms, i.e., to base 𝑒.

6. Use the explore-conjecture-prove method to develop the best upper bound you
can for the 𝑛-th term 𝑟𝑛 of the sequence defined by the recurrence relation

𝑟1 = 1, 𝑟𝑛 = 3𝑟𝑛−1 − 1,

and prove your bound by induction.


Even better, see if you can find a formula for 𝑟𝑛 , and prove it by induction.

7. The Lucas sequence (𝑙𝑛 ∶ 𝑛 ∈ ℕ) is a close relative of the Fibonacci sequence,


defined by using the same recurrence relation but starting out differently, as follows.

𝑙1 = 1, 𝑙2 = 3, 𝑙𝑛 = 𝑙𝑛−1 + 𝑙𝑛−2 (for 𝑛 ≥ 3). (6.49)



Adapt the approach of § 6.7.4 in order to develop a formula for 𝑙𝑛 and then prove by
induction that it works.

8. Define the sequence (𝑡𝑛 ∶ 𝑛 ∈ ℕ) by the recurrence relation

𝑡1 = 1, 𝑡𝑛 = 3 − (1/2)𝑡𝑛−1 .
(a) Consider the following theorem, which is correct, and its “proof” by induction,
which is incorrect.
Find the errors in this incorrect proof.
The lines of the proof are numbered, so you can refer to them.

Theorem. For all 𝑛 ∈ ℕ,


𝑡𝑛 ≤ 5/2.
“Proof”.

1. For each 𝑛 ∈ ℕ, let 𝑆(𝑛) be the statement that 𝑡𝑛 ≤ 5/2. We must prove that, for
every 𝑛 ∈ ℕ, the statement 𝑆(𝑛) is true.

2. Inductive basis: Let 𝑛 = 1. Then 𝑡1 = 3 − (1/2) ⋅ 1 ≤ 5/2. So 𝑆(1) holds.

3. Inductive step:

4. Assume that for some 𝑘 the statement 𝑆(𝑘) holds. (This is the Inductive Hypoth-
esis.)

5. Consider 𝑆(𝑘 + 1).


6. 𝑡𝑘+1 = 3 − (1/2)𝑡𝑘

7. ≤ 3 − (1/2) ⋅ (5/2) (by the Inductive Hypothesis)

8. = 3 − 5/4

9. = 7/4

10. ≤ 5/2.

11.

“□”

(b) Write a correct proof by induction for the theorem.

There are (at least) two approaches you could try.

(i) Prove, by induction on 𝑛, that for all 𝑛 ∈ ℕ,

1 ≤ 𝑡𝑛 ≤ 5/2.
So this is a case where it’s actually easier to prove a stronger statement.

(ii) Prove firstly that 𝑆(𝑛) holds for all even 𝑛, and then that it holds for all odd 𝑛.

9. Define the sequence (𝜑𝑛 ∶ 𝑛 ∈ ℕ) recursively by

𝜑1 = 1, 𝜑𝑛 = 1 + 1/𝜑𝑛−1 .

(a) Explore this sequence. Find the first 20 terms, using a spreadsheet or program.

(b) Conjecture a value for the limit 𝜑 of the sequence as 𝑛 → ∞.

(c) Roughly speaking, if 𝜑𝑛 ≈ 𝜑𝑛−1 , then we would expect that they are close to the
limit, and so the limit 𝜑 would be expected to satisfy
𝜑 = 1 + 1/𝜑.

Solve this equation, in order to derive a conjectured exact value for the limit.

(d) Suppose you want to prove that, for all sufficiently large 𝑛, the 𝑛-th term is within 0.001
of this limit. How large is “sufficiently large”? In other words, determine 𝑁 so that,
for all 𝑛 ≥ 𝑁 ,
|𝜑𝑛 − 𝜑| < 0.001.
It’s fine to do this computationally.

(e) Prove that, if


𝜑𝑛 < 𝜑 + 0.001,
then
𝜑𝑛+1 > 𝜑 − 0.001.
Note the change in direction of the inequality.
The following two points may help.

• When studying how a quantity changes under a small addition, it can some-
times be useful to convert that additive change to a multiplicative change. For
example,
𝑤 + 0.001 = 𝑤 (1 + 0.001/𝑤).

• The following inequality may help. For all 𝑥 > 0,


1/(1 + 𝑥) > 1 − 𝑥.

(f) Prove that, if


𝜑𝑛 > 𝜑 − 0.001,
then
𝜑𝑛+1 < 𝜑 + 0.001.
Note, again, the change in direction of the inequality.
The following inequality may help. For all 𝑥 satisfying 0 < 𝑥 < 1/3,

1/(1 − 𝑥) < 1 + (3/2)𝑥.

(g) Deduce that, if


𝜑 − 0.001 < 𝜑𝑛 < 𝜑 + 0.001,
then
𝜑 − 0.001 < 𝜑𝑛+1 < 𝜑 + 0.001.

(h) Now use (g) to prove, by induction on 𝑛, that for all 𝑛 ≥ 𝑁 ,

𝜑 − 0.001 < 𝜑𝑛 < 𝜑 + 0.001.

Here, 𝑁 is the “threshold of sufficient largeness” from part (d).

10. The Limit Game is a simple two-player game that can be played on any sequence.
We call the two players Lim and Una. Let (𝑎𝑛 )∞ 𝑛=1 be a sequence. Roughly speaking,
Lim’s aim is to approximate a limit using a term from the sequence, while Una tries to
ensure that the approximation is not as good as it should be. Lim and Una play the
game as follows, with Lim having the first turn, and the two players then taking turns
as follows.

1. Lim’s first turn. Lim chooses a number 𝑎, which is intended to specify a limit.

2. Una’s first turn. Una chooses a real number 𝜀 > 0, which is intended to specify
how closely 𝑎 must be approximated by terms in the sequence.

3. Lim’s second turn. Lim chooses a positive integer 𝑁 , which is intended to specify
some position in the sequence.

4. Una’s second turn. Una chooses a positive integer 𝑛 ≥ 𝑁 , which is intended to


specify some later position in the sequence, beyond position 𝑁 .

The game then ends, and the winner is determined as follows.

• If |𝑎𝑛 − 𝑎| < 𝜀 then Lim wins.



• If |𝑎𝑛 − 𝑎| ≥ 𝜀 then Una wins.



For example, suppose the sequence is ((−1)𝑛+1 /𝑛 ∶ 𝑛 ∈ ℕ). Here is one possible play of
the Limit Game for this sequence.

1. Lim chooses 𝑎 = 1/2.

2. Una chooses 𝜀 = 1/4.

3. Lim chooses 𝑁 = 3.

4. Una chooses 𝑛 = 5. (This satisfies 𝑛 ≥ 𝑁 .)

Here, Una wins, because

|𝑎𝑛 − 𝑎| = |𝑎5 − 𝑎| = |1/5 − 1/2| = 3/10 ≥ 1/4 = 𝜀.

(a) In this specific play of the game, could Una have possibly lost on her last move? In
other words, is there a different choice she could have made at that stage, which
would have lost her the game?

(b) Now let’s go back one move. So we just assume that Lim’s and Una’s first turns are
as above. Now consider Lim’s second turn. Could Lim have possibly chosen a value
of 𝑁 which would have guaranteed that he would win the game, no matter what
Una did next?

(c) Now let’s go back further. Assume that Lim has just made his first move, as above.
Consider Una’s options. Are all her options equally good? Is she certain to win, no
matter what 𝜀 she chooses? What advice would you give her about choosing 𝜀?

(d) Now let’s go back to the very first move. What should Lim choose, and why?

(e) Play the game, with a classmate or someone else you know. You can pick whatever
sequences you like.

(f) In general, what property does a sequence need to have in order for Lim to have a
winning strategy in the Limit Game?

(g) Propose a sequence for which Una has a winning strategy in the Limit Game, and
describe her winning strategy.

(h) In general, what property does a sequence need to have in order for Una to have a
winning strategy in the Limit Game?

11. The Big-O Game is another, shorter two-player game that can be played on
any sequence. For this game, the two players are Oh-Yes and Oh-No. Let (𝑎𝑛 )∞𝑛=1 be a
sequence and let 𝑓 ∶ ℕ → ℝ₀⁺ be a function. The aim of Oh-Yes is to help establish that

𝑎𝑛 = 𝑂(𝑓(𝑛)),

while the aim of Oh-No is to frustrate Oh-Yes. They play the game as follows, with
Oh-Yes having the first turn, and each player having one turn as follows.

1. Oh-Yes’s turn. Oh-Yes chooses two things:


• a constant factor 𝑐 ∈ ℝ+ ;
• a positive integer threshold 𝑁 .

2. Oh-No’s turn. Oh-No chooses a positive integer 𝑛 ≥ 𝑁 , intended to specify some


position at least this far along the sequence.

The game then ends, and the winner is determined as follows.

• If 𝑎𝑛 ≤ 𝑐 𝑓(𝑛) then Oh-Yes wins.



• If 𝑎𝑛 > 𝑐 𝑓(𝑛) then Oh-No wins.



For example, suppose the sequence is (𝑛2 ∶ 𝑛 ∈ ℕ) and the function is 𝑓(𝑛) = 𝑛1.5 . Here
is one possible play of the Big-O Game for this sequence.

1. Oh-Yes chooses constant factor 𝑐 = 100 and threshold 𝑁 = 4.

2. Oh-No chooses 𝑛 = 1 000 000.

Here, Oh-No wins, because

𝑎𝑛 = 𝑎1 000 000 = (1 000 000)^2 = (10^6)^2 = 10^12 > 10^11 = 10^2 ⋅ (10^6)^1.5 = 100 ⋅ (1 000 000)^1.5 = 𝑐 𝑓(𝑛).

(a) In this play of the game, could Oh-No have possibly lost on their move?

(b) Now let’s go back to the first move. What could Oh-Yes have chosen in order to win
the game, and why?

(c) Play the game, with a classmate or someone else you know. You can pick whatever
sequences and functions you like.

(d) In general, what relationship must hold between the sequence and the function in
order for Oh-Yes to have a winning strategy in the Big-O Game?

(e) Propose a sequence and function for which Oh-No has a winning strategy in the
Big-O Game, and describe their winning strategy.

(f) In general, how do the sequence and function have to be related in order for Oh-No
to have a winning strategy in the Big-O Game?

12. Give the simplest big-O expression you can for each of the following.

(a) 1 + 4𝑛 + 6𝑛^2 + 4𝑛^3 + 𝑛^4


(b) the binomial coefficient (𝑛 choose 2)

(c) the binomial coefficient (𝑛 choose 3)
(d) the sum of the first 𝑛 terms in an arithmetic series with first term 1 and common
difference 𝑑

(e) 2𝑛 + 2𝑛−1 + ⋯ + 2 + 1

(f) (𝑛!)^(1/𝑛)

(g) 𝑛^(1/𝑛)

13.
(a) How can the number
999 … 99 (𝑛 digits)

be interpreted as the sum of a finite geometric series? For that series, what are 𝑎, 𝑟 and
𝑛? What does the formula for 𝑆𝑛 give in this case? Does this make sense?
(b) What about the number
0.999999 …

14.
(a) Using (6.47), show that
𝐻𝑛 = 𝑂(log 𝑛).
(b) Hence prove that
𝑛/1 + 𝑛/2 + 𝑛/3 + ⋯ + 𝑛/(𝑛 − 1) + 𝑛/𝑛 = 𝑂(𝑛 log 𝑛).

15. Recall the sequence (𝑔𝑛 ∶ 𝑛 ∈ ℕ) from Exercise 5. Its 𝑛-th term has the closed-form
formula
𝑔𝑛 = log 1 + log 2 + ⋯ + log 𝑛.

(a) Use Exercise 5 and (6.47) to show that

𝑔𝑛 ≤ 𝑛 log 𝑛 − 𝑛 + log 𝑛 + 𝑐

for some positive constant 𝑐.

(b) Hence show that


𝑔𝑛 = 𝑂(𝑛 log 𝑛).
(c) Use (a) to show that
𝑒^(𝑔𝑛) ≤ (𝑛/𝑒)^𝑛 ⋅ 𝑛𝑒^𝑐 .
(d) Prove that
𝑒^(𝑔𝑛) = 𝑛!.
(e) Hence show that
𝑛! = 𝑂((𝑛/𝑒)^𝑛 𝑛).
7 NUMBER THEORY

Numbers were arguably the first abstract objects on which people did computations.
They are so central to computation that, for centuries, the term “computation” was
assumed to refer to numerical computation. Although we now compute with many other
abstract objects too, numbers remain fundamental. They appear in most algorithms and
data structures in one way or another. They are essential to the design and analysis
of algorithms and data structures, even those that don’t themselves contain numbers.
Computational problems on numbers span the full range of difficulty, from very easy
to very difficult or even totally intractable. For some hard problems, their difficulty is
actually an asset, enabling them to be used to help keep information secure.
In this chapter we study the basics of number theory, both in a general way and as
a specific tool used in modern cryptography.

7.1𝛼 MULTiPLES AND DiViSORS

An integer 𝑛 is a multiple of a positive integer 𝑑 if there is another integer 𝑞 such that


𝑛 = 𝑞𝑑. The set of all multiples of 𝑑 is denoted by 𝑑ℤ:

𝑑ℤ = {… , −3𝑑, −2𝑑, −𝑑, 0, 𝑑, 2𝑑, 3𝑑, …}.

So, for example, the set of even integers is denoted by 2ℤ.


When 𝑛 is a multiple of 𝑑, we also say that 𝑑 is a divisor or factor of 𝑛. We write

𝑑 ∣𝑛

to say, symbolically, that 𝑑 is a divisor of 𝑛.


For example, we can write that, for any integer 𝑛,

𝑛 is even ⟺ 2 ∣ 𝑛.

You would have seen these ideas previously, except perhaps for the notation 𝑑ℤ and
𝑑 ∣ 𝑛, and would have seen it illustrated using a picture of the number line, with the
length representing 𝑛 being divided into 𝑞 equal segments each of length 𝑑:

[Number line: 0, 𝑑, 2𝑑, …, (𝑞 − 1)𝑑, 𝑞𝑑 = 𝑛]

If 𝑑 is not a divisor of 𝑛, then we can write 𝑑 ∤ 𝑛.


If 𝑑 is a divisor of integers 𝑚 and 𝑛, then it is also a divisor of their sum:

𝑑 ∣𝑚 ∧ 𝑑 ∣𝑛 ⟹ 𝑑 ∣ (𝑚 + 𝑛). (7.1)

This can be illustrated as follows.

[Number line: 0, 𝑑, 2𝑑, …, 𝑚, 𝑚 + 𝑑, …, 𝑚 + 𝑛]

The converse does not hold. For example, 2 ∣ (3 + 5) but 2 ∤ 3 and 2 ∤ 5.


A similar observation to (7.1) applies to differences:

𝑑 ∣𝑚 ∧ 𝑑 ∣𝑛 ⟹ 𝑑 ∣ (𝑚 − 𝑛). (7.2)

What about the product? If 𝑑 ∣ 𝑚 and 𝑑 ∣ 𝑛, does it follow that 𝑑 ∣ 𝑚𝑛?


One technical point about divisors is that every integer 𝑑 is a divisor of 0. This is
because 0 = 0𝑑, so the definition is satisfied even though it seems slightly bizarre. So,
at times, when dealing with divisors and multiples, we have to treat 0 as a special case.

For some positive integers 𝑑, there are simple tests you can use to determine if a
number 𝑛 is a multiple of 𝑑, using the decimal representation of 𝑛. For example

10 ∣ 𝑛 ⟺ the decimal representation of 𝑛 ends in 0.

You don’t even need to look at the rest of 𝑛; you only need to look at its rightmost (i.e.,
least significant) digit. Similarly,

5 ∣ 𝑛 ⟺ the decimal representation of 𝑛 ends in 0 or 5,


2 ∣ 𝑛 ⟺ the decimal representation of 𝑛 ends in one of 0,2,4,6,8
⟺ the last digit of the decimal representation of 𝑛 gives a multiple of 2,
4 ∣ 𝑛 ⟺ the last two digits of the decimal representation of 𝑛 give a multiple of 4.

We study these and other divisibility tests further in Exercise 9.



There are other divisibility tests that require looking at all digits but which still save
time over actually doing the division. Some of these are considered in Exercise 10.
Divisibility tests along these lines are not restricted to the decimal number system.

2∣𝑛 ⟺ the binary representation of 𝑛 ends in 0.

Other divisibility tests for binary numbers are possible (Exercise 10).
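These last-digit tests are easy to verify mechanically. The following Python sketch (our own illustration; the helper name is ours) checks the tests for 2, 4, 5 and 10 against direct division:

```python
def ends_in(n, digits):
    """True if the decimal representation of n ends in one of the given digits."""
    return str(n)[-1] in digits

for n in range(1, 1000):
    # Test for 10: last digit is 0.
    assert (n % 10 == 0) == ends_in(n, "0")
    # Test for 5: last digit is 0 or 5.
    assert (n % 5 == 0) == ends_in(n, "05")
    # Test for 2: last digit is 0, 2, 4, 6 or 8.
    assert (n % 2 == 0) == ends_in(n, "02468")
    # Test for 4: the number formed by the last two digits is a multiple of 4.
    assert (n % 4 == 0) == (int(str(n)[-2:]) % 4 == 0)
print("all last-digit tests agree with direct division")
```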

7.2𝛼 PRiME NUMBERS

Every 𝑛 ∈ ℕ is a multiple of 1 and of itself. A divisor of 𝑛 is proper if it is neither 1


nor 𝑛. A prime number is a positive integer that is greater than 1 and has no proper
divisors. So the first few prime numbers are

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, …

A positive integer greater than 1 that is not prime is composite.
We have already seen that there are infinitely many prime numbers: see Theorem 17.
When considering multiplication of positive integers, prime numbers are like “atoms”
in that they are indivisible (i.e., cannot be broken down further into a product of smaller
positive integers) and every positive integer can be “put together” from these atoms.

Theorem 27. Every positive integer except 1 is a product of primes.

Proof. We prove, by induction on 𝑛, that every positive integer 𝑛 ≥ 2 can be expressed
as a product of primes.

Inductive basis:
If 𝑛 = 2 then 𝑛 is itself prime so it is a product of one prime, namely 2. So the claim
holds in this case.

Inductive Step:
Let 𝑛 > 2.
Assume that every integer 𝑘 with 2 ≤ 𝑘 ≤ 𝑛 is a product of primes. (This is our Inductive Hypothesis.)
Now consider 𝑛 + 1. We consider two cases, according to whether 𝑛 + 1 is prime or
composite.
Case 1: If 𝑛 + 1 is prime, then it is already a product of primes (with just one prime
in the product, namely 𝑛 + 1 itself).
Case 2: If 𝑛 + 1 is composite, then it has a factor 𝑎 which is neither 1 nor 𝑛 + 1.
Therefore 𝑏 = (𝑛 + 1)/𝑎 is also a positive integer, so it is also an integer factor of 𝑛 + 1,
so 𝑛 +1 = 𝑎𝑏 is the product of these two positive integers. Furthermore, 𝑏 is also neither
1 nor 𝑛 + 1 (because if 𝑏 = 1 then 𝑎 = 𝑛 + 1, and if 𝑏 = 𝑛 + 1 then 𝑎 = 1, so either way we
get a contradiction with what we already know about 𝑎).

Since 𝑎 ≤ 𝑛, we can apply the Inductive Hypothesis to it and deduce that 𝑎 is a


product of primes.
Similarly, since 𝑏 ≤ 𝑛, the Inductive Hypothesis tells us that 𝑏 is also product of
primes.
So, both 𝑎 and 𝑏 are products of primes. Therefore their product 𝑛 + 1 = 𝑎𝑏 is also a
product of primes.
This completes the Inductive Step.

Conclusion: Therefore, by Mathematical Induction, the claim holds for all 𝑛.

So far, we have not ruled out the possibility that some positive integers can be
expressed as a product of primes in more than one way. But, in fact, that is impossible.
Every positive integer can be expressed uniquely as a product of primes. We prove this
later, in Theorem 33 in § 7.9.
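The proof of Theorem 27 suggests a procedure: repeatedly split off a factor until only primes remain. The following trial-division sketch in Python is our own illustration (not an algorithm from the notes, and not an efficient one):

```python
def prime_factors(n):
    """Return a prime factorisation of n >= 2 as a list, with repetition."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:  # d divides n, so split it off
            factors.append(d)
            n //= d
        d += 1
    if n > 1:              # whatever remains has no divisor <= its square root, so is prime
        factors.append(n)
    return factors

print(prime_factors(60))   # [2, 2, 3, 5]
print(prime_factors(97))   # [97], since 97 is prime
```

Because the loop always removes the smallest remaining divisor, which is necessarily prime, the returned list multiplies back to 𝑛.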
The theory of prime numbers has been developed over thousands of years. It has
become one of the deepest and richest fields of mathematics. For most of its history,
it was regarded as being among the “purest” areas of pure mathematics, fascinating
and beautiful but very far removed from practical applications. The great Cambridge
mathematician G.H. Hardy, a major contributor to the theory of prime numbers in the
early 20th century, delighted in how “useless” his work was.1 Yet today prime numbers
are widely used in computer science and are central to information security. Whenever
you do a secure electronic transaction, its security is probably based on the theory of
prime numbers.

7.3 REMAiNDERS AND THE MOD OPERATiON

If 𝑛 is a multiple of 𝑑, then we can divide 𝑛 by 𝑑 exactly, with nothing left over; the
remainder is 0. In that case, the quotient
𝑛
𝑞=
𝑑
is an integer, and we have
𝑛 = 𝑞𝑑.
If, however, 𝑛 is not a multiple of 𝑑, then dividing 𝑛 by 𝑑 does not give an integer.
In that case, there is a nonzero remainder, which is the part of 𝑛 that is left over after
subtracting from 𝑛 the highest multiple of 𝑑 we can, without going over.
Formally, the remainder after division of 𝑛 by 𝑑 is the nonnegative integer 𝑟 that, for
some integer 𝑞, satisfies
𝑛 = 𝑞𝑑 + 𝑟, 0 ≤ 𝑟 ≤ 𝑑 − 1. (7.3)

1 G. H. Hardy, A Mathematician’s Apology, Cambridge University Press, 1940.



[Number line: 0, 𝑑, 2𝑑, …, (𝑞 − 1)𝑑, 𝑞𝑑, (𝑞 + 1)𝑑, with 𝑛 = 𝑞𝑑 + 𝑟 lying between 𝑞𝑑 and (𝑞 + 1)𝑑]

The computation of the remainder, 𝑟, of 𝑛 after division by 𝑑 is sufficiently important


that it has its own binary operation notation. It is written

𝑛 mod 𝑑

and we read it as “𝑛 mod 𝑑” or “𝑛 modulo 𝑑”. Note that, just like other arithmetic
operations +, −, ×, /, this operation uses infix notation, with “mod” placed in between
its arguments.
We have, for example:

6 mod 4 = 2
7 mod 4 = 3
6 mod 2 = 0
7 mod 2 = 1
6 mod 5 = 1
144 mod 100 = 44
174 mod 48 = 30

This operation extends to negative 𝑛 too. We still require 0 ≤ 𝑟 ≤ 𝑑 − 1, and 𝑞 has


the property that it is as large as possible while still having 𝑞𝑑 ≤ 𝑛. So, for example,

−6 mod 5 = 4,

since the largest multiple of 5 which is ≤ −6 is −2 × 5 = −10 (with 𝑞 = −2), which gives
𝑟 = 4, since −10 + 4 = −6.
Although we allow negative 𝑛, we always require 𝑑 to be positive.
Programming languages have their own conventions for operations like mod. In
many languages, % is used for applying mod to positive integers, so that, for example,
6 % 4 returns 2. But the meaning of % when one or both arguments is negative can
vary across different languages or even different implementations of the same language.
So it cannot be assumed to always mean the same thing as the mathematical remainder
operation “mod”.
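As a concrete illustration of this point, specific to Python: Python's % does agree with the mathematical mod whenever the modulus 𝑑 is positive, even for negative 𝑛, but this is a fact about Python and should not be assumed for other languages (C, for instance, truncates toward zero instead). The helper function below is our own sketch of a language-independent definition:

```python
# Python's % follows the mathematical convention when the modulus is positive:
# the result r always satisfies 0 <= r <= d - 1.
print(6 % 4)     # 2
print(7 % 4)     # 3
print(174 % 48)  # 30
print(-6 % 5)    # 4  (not -1: the remainder is never negative for d > 0)

def mod(n, d):
    """Mathematical remainder: n = q*d + r with 0 <= r <= d - 1, for d > 0."""
    r = n - (n // d) * d  # floor division gives the right q, even for n < 0
    return r

print(mod(-6, 5))  # 4
```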

If we calculate 𝑛/𝑑 exactly, it has integer part 𝑞 and fractional part 𝑟/𝑑 ∈ [0, 1):

𝑛/𝑑 = 𝑞 + 𝑟/𝑑 = 𝑞 + (𝑛 mod 𝑑)/𝑑.

7.4 PARiTY

Remainders modulo 2 are particularly simple and also particularly important. For any
𝑥 ∈ ℤ, its remainder mod 2 indicates whether 𝑥 is even or odd:

𝑥 mod 2 = 1 if 𝑥 is odd, and 0 if 𝑥 is even. (7.4)

In fact, we can regard it as the indicator function of the set of odd numbers.
The remainder 𝑥 mod 2 is called the parity of 𝑥. The term “parity” is also used
more descriptively, to mean the identification of whether an integer is even or odd.
The parity of the sum, difference, product or quotient of two integers is completely
determined by the parity of those two integers. The precise behaviour of arithmetic
operations with respect to parity is given by the following tables:
+    | even  odd          ×    | even  odd
even | even  odd          even | even  even
odd  | odd   even         odd  | even  odd
For example, the sum (or difference) of two odd numbers is even, while their product
is odd. The table for subtraction has the same entries as for addition, while a table for
division would look the same as that for multiplication with the caveat that division by
zero is undefined.
Let us rewrite those tables with parities represented by 0 or 1, as in (7.4).
+ | 0 1          × | 0 1
0 | 0 1          0 | 0 0
1 | 1 0          1 | 0 1
Addition here is the same as ordinary addition except that 1 + 1 is not 2; instead, it is
2 mod 2, i.e., 0. Multiplication is exactly the same as ordinary integer multiplication
restricted to {0, 1}.
Let us lay these tables out differently, with headings along the top and each quan-
tity having its own column, to help make links with some concepts we have studied
previously.
𝑥 𝑦 | (𝑥 + 𝑦) mod 2          𝑥 𝑦 | 𝑥𝑦 mod 2
0 0 | 0                      0 0 | 0
0 1 | 1                      0 1 | 0
1 0 | 1                      1 0 | 0
1 1 | 0                      1 1 | 1

Do either of these tables look familiar? See if you can find tables in earlier chapters that
have the same patterns of entries, even if the entries themselves are different.
It is worth developing the skill of recognising when two structures are the same
except that things have been renamed. This skill is used throughout computer science
and mathematics, and has been formalised as the concept of isomorphism. We will
define and discuss isomorphism in one specific context later.
Now compare the left table above (for mod 2 addition) with the table on p. 129
(§ 4.11) for exclusive-or, ⊕. Compare the right table (for mod 2 multiplication) with
the table on p. 124 (§ 4.6) for conjunction, ∧.
We see, then, that the logical operations ⊕ and ∧, using truth values False and True,
may be treated as just addition and multiplication, respectively, modulo 2, using 0 and
1 to represent False and True respectively. This resolves a question we raised on p. 122
in § 4.1𝛼 . It enables us to relate logic to arithmetic, indeed it underpins the use of logic
to perform arithmetic in computers.
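Using 0 and 1 for False and True, Python's bitwise operators ^ (exclusive-or) and & (and) make this correspondence concrete. The check below is our own illustration:

```python
# XOR behaves as addition mod 2, AND as multiplication mod 2.
for x in (0, 1):
    for y in (0, 1):
        assert (x + y) % 2 == x ^ y   # exclusive-or = addition modulo 2
        assert (x * y) % 2 == x & y   # conjunction  = multiplication modulo 2
print("xor is addition mod 2; and is multiplication mod 2")
```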

7.5 THE GREATEST COMMON DiViSOR

The greatest common divisor (gcd) of two integers 𝑎 and 𝑏, written gcd(𝑎, 𝑏), is the
greatest integer 𝑑 such that 𝑑 ∣ 𝑎 and 𝑑 ∣ 𝑏.2 This is well defined provided 𝑎 and 𝑏 are
not both 0.3
If 𝑏 = 1 then clearly we have gcd(𝑎, 1) = 1.
More generally, if 𝑏 ∣ 𝑎 then gcd(𝑎, 𝑏) = 𝑏.
If 𝑝 is prime, then gcd(𝑎, 𝑝) = 1 unless 𝑎 is a multiple of 𝑝, say 𝑎 = 𝑘𝑝, in which case
we have gcd(𝑘𝑝, 𝑝) = 𝑝.
What if neither 𝑎 nor 𝑏 is prime and neither is a multiple of the other?
Consider 4 and 6. Neither is a multiple of the other, and each is a multiple of 2.
Also, no number > 2 is a divisor of either of them. So gcd(4, 6) = gcd(6, 4) = 2.
Now can we work out the gcd, in general, in a systematic way?
The key observation is the following.

Theorem 28. For any integers 𝑎, 𝑏 that are not both 0,

gcd(𝑎, 𝑏) = gcd(𝑏, 𝑎 − 𝑏).

Proof. For any integers 𝑥, 𝑦, let CD(𝑥, 𝑦) be the set of their common divisors:

CD(𝑥, 𝑦) = {𝑑 ∶ 𝑑|𝑥 ∧ 𝑑|𝑦}.

2 This is also called the highest common factor (hcf), but that term is less common these days. The gcd
of 𝑎 and 𝑏 is often just denoted by (𝑎, 𝑏), but this seems to be unnecessary overloading of standard ordered
pair notation, so we use gcd(𝑎, 𝑏).
3 If 𝑎 = 𝑏 = 0, then every integer is a divisor of 𝑎 and 𝑏, so they have no greatest common divisor.

We show that CD(𝑎, 𝑏) = CD(𝑏, 𝑎−𝑏), from which the Theorem will follow, since identical
sets of integers have identical greatest elements (provided they do indeed have a greatest
element, which they do in this case).
First, we show CD(𝑎, 𝑏) ⊆ CD(𝑏, 𝑎 − 𝑏). Suppose 𝑑 ∈ CD(𝑎, 𝑏). Then 𝑑 ∣ 𝑎 and 𝑑 ∣ 𝑏.
By (7.2), this implies that 𝑑 ∣ (𝑎−𝑏). So 𝑑 ∣ 𝑏 and 𝑑 ∣ (𝑎−𝑏). So 𝑑 ∈ CD(𝑏, 𝑎−𝑏). So every
element of CD(𝑎, 𝑏) is also an element of CD(𝑏, 𝑎 −𝑏). Therefore CD(𝑎, 𝑏) ⊆ CD(𝑏, 𝑎 −𝑏).
Second, we show CD(𝑏, 𝑎 − 𝑏) ⊆ CD(𝑎, 𝑏). Suppose 𝑑 ∈ CD(𝑏, 𝑎 − 𝑏). Then 𝑑 ∣ 𝑎 − 𝑏
and 𝑑 ∣ 𝑏. By (7.1), this implies that 𝑑 ∣ ((𝑎 − 𝑏) + 𝑏). But (𝑎 − 𝑏) + 𝑏 = 𝑎. So 𝑑 ∣ 𝑎 and
𝑑 ∣ 𝑏. So 𝑑 ∈ CD(𝑎, 𝑏). So every element of CD(𝑏, 𝑎 − 𝑏) is also an element of CD(𝑎, 𝑏).
Therefore CD(𝑏, 𝑎 − 𝑏) ⊆ CD(𝑎, 𝑏).
We established the subset relation in both directions. So, in fact,

CD(𝑎, 𝑏) = CD(𝑏, 𝑎 − 𝑏).

The two sets are nonempty, since every integer has 1 as a divisor, so 1 belongs to
each set. They also have an upper bound, since divisors are bounded above by the sizes
of the numbers they are divisors of, i.e., 𝑑 ∣ 𝑛 ⟹ 𝑑 ≤ |𝑛|. The two sets therefore each
have a greatest element.4
Since these two sets are identical, and since they each do have a greatest element, we
deduce that their greatest elements are identical. Therefore gcd(𝑎, 𝑏) = gcd(𝑏, 𝑎 −𝑏).

It does not matter which way round we write 𝑎 and 𝑏 in gcd(𝑎, 𝑏), because

gcd(𝑎, 𝑏) = gcd(𝑏, 𝑎).

So the equation of Theorem 28 could have been written

gcd(𝑎, 𝑏) = gcd(𝑎 − 𝑏, 𝑏).

We will mostly adopt the habit of putting the larger number first, because that makes
the description of some of our algorithms neater. But the value of the gcd does not care
about the order of its arguments.
We illustrate the use of Theorem 28 in computing the gcd.

gcd(12, 20) = gcd(20, 12)


= gcd(12, 20 − 12) (by Theorem 28)
= gcd(12, 8)
= gcd(8, 12 − 8) (by Theorem 28)
= gcd(8, 4)
= 4.
4 Here we are using the fact that every set of integers that has some upper bound has a unique greatest
element. Note, though, that it is not the case that every set of real numbers that has some upper bound
has a greatest element. Nor does it hold for the rational numbers.

Here is gcd(12, 20) pictured on a number line.

[Number line: 0, 4, 12, 20, with 𝑑 = gcd(𝑚, 𝑛) = 4 dividing both 𝑚 = 12 and 𝑛 = 20]

7.6 THE EUCLiDEAN ALGORiTHM

If 𝑎, 𝑏 ∈ ℕ and 𝑎 > 𝑏 then Theorem 28 expresses gcd(𝑎, 𝑏) in terms of the gcd of a smaller
pair of positive integers. This can be used in a recursive algorithm for computing gcd.
We also need a base case, but this can be provided by some of the special cases we gave
just after defining the gcd on p. 239. So we have the following algorithm.

1. Input: 𝑎, 𝑏 ∈ ℕ

2. If 𝑎 < 𝑏 then swap them around (to ensure 𝑎 ≥ 𝑏):

new 𝑎 ∶= old 𝑏,
new 𝑏 ∶= old 𝑎.

3. If 𝑏 ∣ 𝑎 then Output 𝑏.

4. Output gcd(𝑏, 𝑎 − 𝑏).
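The steps above translate directly into a recursive Python function. This is our own transcription, for illustration (the function name is ours):

```python
def gcd_by_subtraction(a, b):
    """gcd of positive integers a and b, applying Theorem 28 repeatedly."""
    if a < b:                 # Step 2: ensure a >= b
        a, b = b, a
    if a % b == 0:            # Step 3: base case, b divides a
        return b
    return gcd_by_subtraction(b, a - b)   # Step 4

print(gcd_by_subtraction(12, 20))   # 4
print(gcd_by_subtraction(48, 174))  # 6
```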



Trying it on gcd(12, 20) should give the calculation at the end of the previous section.
Then try a larger pair. Here is the calculation for gcd(48, 174) (glossing over some of the
swaps).

gcd(48, 174) = gcd(174, 48) (swapping, Step 2)


= gcd(48, 174 − 48) (using Step 4, which is based on Theorem 28)
= gcd(126, 48)
= gcd(126 − 48, 48) (Step 4)
= gcd(78, 48)
= gcd(48, 78 − 48) (Step 4)
= gcd(48, 30)
= gcd(30, 48 − 30) (Step 4)
= gcd(30, 18)
= gcd(18, 30 − 18) (Step 4)
= gcd(18, 12)
= gcd(12, 18 − 12) (Step 4)
= gcd(12, 6)
= 6 (finally reaching the base case, Step 3).

Consider the first few steps in this process. We repeatedly subtracted 48 from 174 until
the amount left over was < 48. But this is precisely the process of dividing by 48 a whole
number of times and determining the remainder. In this case, after three subtractions
of 48, we found that the remainder is 30. We can speed the algorithm up by doing this
remainder calculation directly instead of by repeated subtraction. We now revise the
algorithm to do this, with the modified part shown in blue.

Euclidean Algorithm (EA)

1. Input: 𝑎, 𝑏 ∈ ℕ

2. If 𝑎 < 𝑏 then swap them around (to ensure 𝑎 ≥ 𝑏):

new 𝑎 ∶= old 𝑏,
new 𝑏 ∶= old 𝑎.

3. If 𝑏 ∣ 𝑎 then Output 𝑏.

4. 𝑞 ∶= ⌊𝑎/𝑏⌋.

5. Output gcd(𝑏, 𝑎 − 𝑞𝑏).



Note that, in the last step,

𝑎 − 𝑞𝑏 = 𝑎 mod 𝑏,

and 𝑞 plays no role except to work out 𝑎 mod 𝑏. So, in the last step, we could write the
output as gcd(𝑏, 𝑎 mod 𝑏). But we have made the role of 𝑞 explicit because it will play
a role later on, in an extension of this algorithm.
Here is the new computation for (48, 174).

gcd(48, 174) = gcd(174, 48)


= gcd(48, 174 mod 48) (Step 4, using remainder now)
= gcd(48, 30)
= gcd(30, 48 mod 30) (Step 4)
= gcd(30, 18)
= gcd(18, 30 mod 18) (Step 4)
= gcd(18, 12)
= gcd(12, 18 mod 12) (Step 4)
= gcd(12, 6)
= 6 (base case, Step 3).

The Euclidean Algorithm is over two thousand years old and has been described as
one of the oldest written mathematical algorithms.
When you meet a new algorithm, you should think about how many steps it takes, as
a function of the input. Play around with some examples and see if you can conjecture
some kind of bound on the number of steps required by the Euclidean Algorithm.
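One way to play around is to instrument the algorithm to count how many times the remainder step is applied. The following Python sketch is our own illustration (names and the step-counting idea are ours):

```python
def gcd_steps(a, b, steps=0):
    """Euclidean Algorithm, also returning how many remainder steps were taken."""
    if a < b:
        a, b = b, a            # ensure a >= b
    if a % b == 0:
        return b, steps        # base case: b divides a
    return gcd_steps(b, a % b, steps + 1)

print(gcd_steps(48, 174))      # (6, 4): matches the worked computation above
for a, b in [(12, 20), (48, 174), (123456, 7890)]:
    g, k = gcd_steps(a, b)
    print(a, b, "gcd =", g, "steps =", k)
```

Trying progressively larger inputs suggests the step count grows only very slowly with the size of the inputs; making that conjecture precise is the point of the exercise the notes propose.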

7.7 THE GCD AND iNTEGER LiNEAR COMBiNATiONS

We saw in (7.1) and (7.2) that, if 𝑑 ∣ 𝑚 and 𝑑 ∣ 𝑛 then 𝑑 ∣ 𝑚 + 𝑛 and 𝑑 ∣ 𝑚 − 𝑛. It follows


that, for all 𝑥, 𝑦 ∈ ℤ, we have
𝑑 ∣ 𝑥𝑚 + 𝑦𝑛.
The quantities 𝑥𝑚 + 𝑦𝑛, with 𝑥, 𝑦 ∈ ℤ, are called integer linear combinations of 𝑚
and 𝑛: “integer” because the coefficients 𝑥 and 𝑦 are integers, and “linear” because we
are just taking multiples of 𝑚 and 𝑛 without raising them to powers, taking roots, or
doing anything more exotic to them. We write 𝑚ℤ + 𝑛ℤ for the set of all integer linear
combinations of 𝑚 and 𝑛.
Conversely, if 𝑑 ∣ 𝑥𝑚 + 𝑦𝑛 for all 𝑥, 𝑦 ∈ ℤ, then it certainly holds for each of the
following special cases:

• 𝑥 = 1 and 𝑦 = 0, which means 𝑑 ∣ 1𝑚 + 0𝑛, i.e., 𝑑 ∣ 𝑚;



• 𝑥 = 0 and 𝑦 = 1, which means 𝑑 ∣ 0𝑚 + 1𝑛, i.e., 𝑑 ∣ 𝑛.

So the set of common divisors of 𝑚 and 𝑛 is the same as the set of common divisors
of all combinations 𝑥𝑚 + 𝑦𝑛 with 𝑥, 𝑦 ∈ ℤ:

{𝑑 ∶ 𝑑|𝑚 ∧ 𝑑|𝑛} = {𝑑 ∶ ∀𝑥, 𝑦 ∈ ℤ 𝑑 ∣ 𝑥𝑚 + 𝑦𝑛}.

Since these two sets are identical, so are their largest elements. The largest element of
the left set is just gcd(𝑚, 𝑛), while the largest element of the right set is the gcd of all
integers 𝑥𝑚 + 𝑦𝑛 with 𝑥, 𝑦 ∈ ℤ. So these two gcds — one being just the gcd of two
numbers, the other being the gcd of the infinite set 𝑚ℤ + 𝑛ℤ — are equal:

gcd(𝑚, 𝑛) = gcd(𝑚ℤ + 𝑛ℤ).

Let us see how this pans out for an example. Consider the case 𝑚 = 12, 𝑛 = 20. At
the end of § 7.5, on p. 240, we worked out that gcd(12, 20) = 4 and pictured it on the
number line. In this case, what does the set of all integer linear combinations of 12 and
20 look like? For a start, we have

1 ⋅ 12 + 0 ⋅ 20 = 12,
0 ⋅ 12 + 1 ⋅ 20 = 20.

Other coefficients give other values, e.g.,

1 ⋅ 12 + 1 ⋅ 20 = 32,
−1 ⋅ 12 + 1 ⋅ 20 = 8,
2 ⋅ 12 − 1 ⋅ 20 = 4,
−2 ⋅ 12 + 1 ⋅ 20 = −4,
−5 ⋅ 12 + 3 ⋅ 20 = 0,
4 ⋅ 12 − 2 ⋅ 20 = 8,
−3 ⋅ 12 + 1 ⋅ 20 = −16.

In fact, in this case, the set 12ℤ + 20ℤ of all integer linear combinations of 12 and 20 is
the set of all multiples of 4 (including both positive and negative multiples, and zero).
Symbolically, we can write
12ℤ + 20ℤ = 4ℤ.
We illustrate this on the number line.

[Number line showing 0, 4, 8, 12, 16, 20: each multiple of 4 is marked with an integer linear combination of 12 and 20, namely −5⋅12 + 3⋅20 = 0, 2⋅12 − 1⋅20 = 4, −1⋅12 + 1⋅20 = 8, 1⋅12 + 0⋅20 = 12, 3⋅12 − 1⋅20 = 16, 0⋅12 + 1⋅20 = 20.]

In fact, every multiple of 4 arises as an integer linear combination of 12 and 20 in


multiple ways. We saw this above, with

−1 ⋅ 12 + 1 ⋅ 20 = 4 ⋅ 12 − 2 ⋅ 20 = 8.

This is actually an understatement: every multiple of 4 arises in infinitely many differ-


ent ways as an integer linear combination of 12 and 20. This follows from the fact that
0 itself is an integer linear combination of 12 and 20, as we saw above:

−5 ⋅ 12 + 3 ⋅ 20 = 0. (7.5)

We can take any equation expressing a multiple of 4 (like 8) as an integer linear combi-
nation of 12 and 20, such as
−1 ⋅ 12 + 1 ⋅ 20 = 8, (7.6)
and then add our equation for 0, namely (7.5), to get another integer linear combinations
for 8:
equation (7.6): −1 ⋅ 12 + 1 ⋅ 20 = 8
plus equation (7.5): −5 ⋅ 12 + 3 ⋅ 20 = 0
equals: −6 ⋅ 12 + 4 ⋅ 20 = 8
We can do this as many times as we like, to generate an infinite family of equations
expressing 8 as an integer linear combination of 12 and 20:

−1 ⋅ 12 + 1 ⋅ 20 = 8,
4 ⋅ 12 − 2 ⋅ 20 = 8,
9 ⋅ 12 − 5 ⋅ 20 = 8,
14 ⋅ 12 − 8 ⋅ 20 = 8,
⋮ ⋮ ⋮
−6 ⋅ 12 + 4 ⋅ 20 = 8,
−11 ⋅ 12 + 7 ⋅ 20 = 8,
⋮ ⋮ ⋮
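This generation process is easy to mechanise. A small Python sketch (entirely our own illustration): it starts from the known solution −1⋅12 + 1⋅20 = 8 and repeatedly adds the zero combination 5⋅12 − 3⋅20 = 0 (the negation of equation (7.5)):

```python
def combinations_for_eight(count=4):
    """Coefficient pairs (x, y) with x*12 + y*20 == 8, starting from the
    known solution (-1, 1) and repeatedly adding the zero combination
    5*12 - 3*20 == 0 (the negation of equation (7.5))."""
    x, y = -1, 1
    pairs = []
    for _ in range(count):
        pairs.append((x, y))
        x, y = x + 5, y - 3   # add the zero combination
    return pairs
```

The first four pairs produced are (−1, 1), (4, −2), (9, −5), (14, −8), matching the start of the list above; subtracting (5, −3) instead walks down the other branch of the list.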

The phenomena we have observed here are general, and not just specific to 12 and
20. Let 𝑚, 𝑛 be any positive integers, and consider the set 𝑚ℤ + 𝑛ℤ of all integers of
the form 𝑥𝑚 + 𝑦𝑛 for 𝑥, 𝑦 ∈ ℤ.
Firstly, 0 ∈ 𝑚ℤ + 𝑛ℤ. This can be seen by putting 𝑥 = 𝑛 and 𝑦 = −𝑚 so that

𝑥𝑚 + 𝑦𝑛 = 𝑛𝑚 − 𝑚𝑛 = 0.

Secondly, the difference between any two members of 𝑚ℤ + 𝑛ℤ is also in 𝑚ℤ + 𝑛ℤ.


To see this, consider any two numbers in 𝑚ℤ + 𝑛ℤ, say 𝑥1 𝑚 + 𝑦1 𝑛 and 𝑥2 𝑚 + 𝑦2 𝑛, and
find their difference:

(𝑥1 𝑚 + 𝑦1 𝑛) − (𝑥2 𝑚 + 𝑦2 𝑛) = (𝑥1 − 𝑥2 )𝑚 + (𝑦1 − 𝑦2 )𝑛.

The coefficients 𝑥1 − 𝑥2 and 𝑦1 − 𝑦2 are both integers too, so the expression on the right,
(𝑥1 − 𝑥2 )𝑚 + (𝑦1 − 𝑦2 )𝑛, is of the same form, and is also a member of 𝑚ℤ + 𝑛ℤ.
Let Δ be the smallest positive difference between members of the set:

Δ ∶= min{𝑧1 − 𝑧2 ∶ 𝑧1 − 𝑧2 > 0, 𝑧1 , 𝑧2 ∈ 𝑚ℤ + 𝑛ℤ}.

Then we claim that 𝑚ℤ + 𝑛ℤ consists precisely of all the multiples of Δ.

Theorem 29. 𝑚ℤ + 𝑛ℤ = Δℤ.

Proof. This is an assertion of equality of two sets. So we divide the proof into two parts:
first, we show that 𝑚ℤ + 𝑛ℤ ⊇ Δℤ, and then we show that 𝑚ℤ + 𝑛ℤ ⊆ Δℤ.

(⊇)
To prove this superset relationship, we show that the right set, Δℤ, is a subset of
the left set, 𝑚ℤ + 𝑛ℤ. To do this, we take a general member of Δℤ and show that it
also belongs to 𝑚ℤ + 𝑛ℤ.
Let 𝑘Δ be any multiple of Δ, where 𝑘 ∈ ℤ.
Since Δ is an integer linear combination of 𝑚 and 𝑛 (as explained above), there exist
𝑥0 , 𝑦0 ∈ ℤ such that
Δ = 𝑥0 𝑚 + 𝑦0 𝑛.
Multiplying each side by 𝑘, we have

𝑘Δ = 𝑘𝑥0 𝑚 + 𝑘𝑦0 𝑛.

Since 𝑘𝑥0 and 𝑘𝑦0 are integers, this establishes that 𝑘Δ, too, is an integer linear combi-
nation of 𝑚 and 𝑛.

(⊆)
Now we take a general member of 𝑚ℤ + 𝑛ℤ and show that it also belongs to Δℤ.
Let 𝑥𝑚 + 𝑦𝑛 be an integer linear combination of 𝑚 and 𝑛.
Consider what happens when we divide 𝑥𝑚 + 𝑦𝑛 by Δ. The quotient 𝑞 and remainder 𝑟 are

𝑞 = ⌊(𝑥𝑚 + 𝑦𝑛)/Δ⌋,
𝑟 = (𝑥𝑚 + 𝑦𝑛) mod Δ.

• If 𝑟 = 0 then 𝑥𝑚 + 𝑦𝑛 is a multiple of Δ and we are done.

• If 𝑟 > 0 then 0 < 𝑟 < Δ and

𝑞Δ < 𝑥𝑚 + 𝑦𝑛 < 𝑞Δ + Δ.

Then 𝑞Δ and 𝑥𝑚 + 𝑦𝑛 differ by < Δ, yet they both belong to 𝑚ℤ + 𝑛ℤ. (We saw
earlier that Δ ∈ 𝑚ℤ + 𝑛ℤ and that therefore any integer multiple of it is also in
𝑚ℤ+𝑛ℤ.) So we have a contradiction, because Δ is the smallest possible difference
between two members of 𝑚ℤ + 𝑛ℤ. So this case, where 𝑥𝑚 + 𝑦𝑛 is not a multiple
of Δ, cannot arise.

We see therefore that Δ is a divisor of every member of 𝑚ℤ + 𝑛ℤ. It must also be


the greatest of all such divisors, because any number 𝐷 > Δ cannot be a divisor of Δ,
which is a member of 𝑚ℤ + 𝑛ℤ, and therefore 𝐷 cannot be a common divisor of the
entire set 𝑚ℤ + 𝑛ℤ.
This establishes that Δ is actually the gcd of 𝑚 and 𝑛. In other words, the gcd is
the smallest integer linear combination of 𝑚 and 𝑛 that is greater than 0.

Theorem 30. gcd(𝑚, 𝑛) = the smallest positive value of 𝑥𝑚 + 𝑦𝑛 with 𝑥, 𝑦 ∈ ℤ.

As we will soon see, it is very important to be able to determine, for given 𝑚 and 𝑛,
two integers 𝑥, 𝑦 such that
gcd(𝑚, 𝑛) = 𝑥𝑚 + 𝑦𝑛. (7.7)
To help see how to do this, consider that, at the outset, we have 𝑚 = 1 ⋅ 𝑚 + 0 ⋅ 𝑛 and
𝑛 = 0 ⋅ 𝑚 + 1 ⋅ 𝑛. So, at the very beginning, we already have simple equations expressing
𝑚 and 𝑛 as integer linear combinations of 𝑚 and 𝑛. So if we can maintain equations
of this type throughout the Euclidean algorithm, then hopefully we can end up with an
appropriate equation (7.7). At this point, see if you can repeat one of our earlier gcd
calculations — gcd(12, 20), at the end of § 7.5 on p. 240, or gcd(48, 174) at the end of
§ 7.6 on p. 243 — while keeping track of equations of this type as you go.
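Theorem 30 can be sanity-checked numerically by searching a finite window of coefficients. This brute-force sketch (function name and window bound are ours) is only a spot check over that window, not a proof:

```python
from math import gcd

def smallest_positive_combination(m, n, bound=50):
    """Smallest positive value of x*m + y*n over the finite window of
    coefficients -bound..bound (a brute-force check of Theorem 30)."""
    return min(x * m + y * n
               for x in range(-bound, bound + 1)
               for y in range(-bound, bound + 1)
               if x * m + y * n > 0)
```

For the examples in the text, `smallest_positive_combination(12, 20)` gives 4 = gcd(12, 20) and `smallest_positive_combination(48, 174)` gives 6 = gcd(48, 174).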

7.8 THE EXTENDED EUCLiDEAN ALGORiTHM

To find such 𝑥 and 𝑦 such that

gcd(𝑚, 𝑛) = 𝑥𝑚 + 𝑦𝑛,

it turns out to be enough to extend the Euclidean Algorithm so that it keeps track of
some extra information.

Extended Euclidean Algorithm (EEA)

1. Input: 𝑚, 𝑛 ∈ ℕ

2. If 𝑚 < 𝑛 then swap them around (to ensure 𝑚 ≥ 𝑛):

new 𝑚 ∶= old 𝑛,
new 𝑛 ∶= old 𝑚.

3. Initialise triples:

(𝑎, 𝑥, 𝑦) ∶= (𝑚, 1, 0),


(𝑏, 𝑧, 𝑤) ∶= (𝑛, 0, 1).

4. If 𝑏 = 0, Output (𝑎, 𝑥, 𝑦).

5. 𝑞 ∶= ⌊𝑎/𝑏⌋.

6. Update the triples:

new (𝑎, 𝑥, 𝑦) ∶= (𝑏, 𝑧, 𝑤),


new (𝑏, 𝑧, 𝑤) ∶= (𝑎, 𝑥, 𝑦) − 𝑞 ⋅ (𝑏, 𝑧, 𝑤),

where the right-hand sides here use the old values of (𝑎, 𝑥, 𝑦) and (𝑏, 𝑧, 𝑤).

7. Go back to Step 4.

In Step 6, we work out (𝑎, 𝑥, 𝑦) − 𝑞 ⋅ (𝑏, 𝑧, 𝑤) using vector algebra. In this case, this
means that we first multiply each member of (𝑏, 𝑧, 𝑤) by 𝑞 and then subtract each
member of the resulting triple from the corresponding member of (𝑎, 𝑥, 𝑦). The result
is (𝑎 − 𝑞𝑏, 𝑥 − 𝑞𝑧, 𝑦 − 𝑞𝑤).
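In code, the whole algorithm is only a few lines. A Python sketch (the function name and the swap-bookkeeping flag are ours), maintaining the two triples exactly as in Steps 3–7:

```python
def extended_gcd(m, n):
    """Extended Euclidean Algorithm (EEA), following Steps 1-7.

    Returns (g, x, y) with g = gcd(m, n) and g == x*m + y*n
    for the original inputs m and n."""
    swapped = m < n
    if swapped:                 # Step 2: ensure m >= n
        m, n = n, m
    a, x, y = m, 1, 0           # Step 3: invariant a == x*m + y*n
    b, z, w = n, 0, 1           #         invariant b == z*m + w*n
    while b != 0:               # Step 4: stop when b == 0
        q = a // b              # Step 5
        # Step 6: update both triples using vector algebra
        (a, x, y), (b, z, w) = (b, z, w), (a - q * b, x - q * z, y - q * w)
    if swapped:                 # report coefficients for the original order
        x, y = y, x
    return a, x, y
```

For instance, `extended_gcd(27, 40)` returns `(1, 3, -2)`, expressing 1 = 3⋅27 − 2⋅40.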
At each step of the Extended Euclidean Algorithm, the triples (𝑎, 𝑥, 𝑦) and (𝑏, 𝑧, 𝑤)
satisfy 𝑎 = 𝑥𝑚 + 𝑦𝑛 and 𝑏 = 𝑧𝑚 + 𝑤𝑛. It is easy to see that this holds for the initial
triples (𝑚, 1, 0) and (𝑛, 0, 1). It is not difficult to show that the property is preserved
by the updating in Step 6. It therefore holds for all triples used in the algorithm. It is
intended that, when the algorithm stops, 𝑎 = gcd(𝑚, 𝑛), so the final 𝑎-triple will give us
all the information we seek.
The final 𝑏-triple will have 𝑏 = 0, so it does not give the gcd; we already have that
from the final 𝑎-triple. But the final 𝑏-triple should still be calculated, when doing this
manually, as a check. It is usually easy to check that the final 𝑏-triple satisfies

0 = 𝑧𝑚 + 𝑤𝑛.

If this does not hold, then the earlier steps should be re-checked to identify the mistake.
Although there is more to do in the Extended Euclidean Algorithm than in the
original Euclidean Algorithm, the extra work is really just bookkeeping. We keep track,
in the triples, of exactly how the first member of each triple can be made up as an integer
linear combination of 𝑚 and 𝑛. But the decisions that we make, in the EEA, and the
calculations that we do with the first member of each triple, are exactly the same in the
two algorithms. So the EEA is really just the ordinary Euclidean Algorithm with extra
accounting tasks.
Here is an example of using the Extended Euclidean Algorithm to compute gcd(27, 40)
and also express it as an integer linear combination of its arguments. As we work out the
triples, we write them underneath each other, forming a table with three columns. The
left column has the numbers driving the calculation, the middle column has the value
of 𝑥, and the right column has the value of 𝑦. Each row (𝑡, 𝑥, 𝑦) satisfies 𝑡 = 40𝑥 + 27𝑦.
At each step, we do the appropriate integer division of the left numbers in the previous
two rows to work out which multiple of the previous row has to be subtracted from the
row above it.5

40 1 0
27 0 1
13 1 −1 (take previous row from the one above it)
1 −2 3 (take twice previous row from the one above it, since ⌊27/13⌋ = 2)
0 27 −40 (take 13× previous row from the one above it)

We could have stopped this calculation as soon as we obtained a row starting with 1,
since no gcd can be < 1. We continued the calculation one row further as a check, since
if our calculations are correct, we end up with a row consisting of 0 followed by the two
numbers 𝑥, 𝑦 such that 40𝑥 + 27𝑦 = 0, and this is usually easily checked. In the special
case when the gcd is 1 (as here), the row starting with 0 contains the two numbers we
started with, but with one of those negated. Note also the zig-zag pattern of negative
numbers going down the last two columns. Checking that these patterns are followed is
a handy way to pick up errors in manual calculations.
In this case, the calculation is correct and we find that

gcd(27, 40) = 1,

and the algorithm also expresses this as an integer linear combination (from the row
starting with 1):
1 = −2 ⋅ 40 + 3 ⋅ 27.

5 If you’ve done row operations in matrices, then you’ll recognise that this is a similar process.

7.9 COPRiMALiTY

Two integers 𝑎 and 𝑏 are coprime, or relatively prime, if gcd(𝑎, 𝑏) = 1. In other words,
they are coprime if they have no factors in common apart from 1.
Note that coprimality is actually quite a different concept to primality, despite the
similarity in name. Two integers can be coprime without either of them being prime.
For example, 21 and 25 are coprime, but neither is prime. On the other hand, if 𝑎 is
prime, then 𝑎 and 𝑏 are coprime unless 𝑏 is a multiple of 𝑎.
To determine if two integers are coprime, use Euclid’s algorithm to determine if their
gcd is 1. Again, this is different to the situation with ordinary primality, which is not
so easy to test.
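In Python this test is immediate, since the standard library already provides a gcd (the helper name `coprime` is our own):

```python
from math import gcd

def coprime(a, b):
    """Integers a and b are coprime iff gcd(a, b) == 1."""
    return gcd(a, b) == 1
```

For example, `coprime(21, 25)` is `True` even though neither 21 nor 25 is prime, while `coprime(12, 20)` is `False` since gcd(12, 20) = 4.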
We saw in § 7.7, and in particular in Theorem 30, that the gcd has a characterisation
in terms of integer linear combinations. In particular, gcd(𝑚, 𝑛) is the smallest positive
member of 𝑚ℤ + 𝑛ℤ. This enables a characterisation of coprimality.

Theorem 31. Integers 𝑚 and 𝑛 are coprime if and only if there exist 𝑥, 𝑦 ∈ ℤ such that

𝑥𝑚 + 𝑦𝑛 = 1.

Proof. Let 𝑚, 𝑛 ∈ ℤ. By definition, they are coprime if and only if their gcd is 1. By
Theorem 30, this in turn is true if and only if 1 is the smallest positive member of
𝑚ℤ + 𝑛ℤ. But there is no smaller positive integer than 1, so 1 is the smallest positive
member of 𝑚ℤ + 𝑛ℤ if and only if 1 ∈ 𝑚ℤ + 𝑛ℤ, which is another way of saying that
there exist 𝑥, 𝑦 ∈ ℤ such that 𝑥𝑚 + 𝑦𝑛 = 1.

For example, consider 21 and 25, which are coprime, as we noted above. So there
must exist 𝑥, 𝑦 ∈ ℤ such that 𝑥 ⋅ 21 + 𝑦 ⋅ 25 = 1. We could use 𝑥 = 6 and 𝑦 = −5, since

6 ⋅ 21 − 5 ⋅ 25 = 126 − 125 = 1.

Theorem 31 has many consequences.


Firstly, when a prime divides a product, then it must divide at least one of the
factors in the product.

Theorem 32. Let 𝑝 be a prime and let 𝑎 and 𝑏 be integers. Then

𝑝 ∣ 𝑎𝑏 ⟹ 𝑝 ∣ 𝑎 ∨ 𝑝 ∣ 𝑏.

Proof. Let 𝑝, 𝑎 and 𝑏 be as in the statement of the theorem. Assume that 𝑝 ∣ 𝑎𝑏.
If 𝑝 ∣ 𝑎 then we are done already. So, suppose that 𝑝 ∤ 𝑎. Since 𝑝 is prime, this means
that 𝑝 and 𝑎 must be coprime (since a prime is coprime to every positive integer except
its own multiples).
Since 𝑝 and 𝑎 are coprime, there must exist 𝑥, 𝑦 ∈ ℤ such that

𝑥𝑝 + 𝑦𝑎 = 1,

by Theorem 31. Multiplying each side by 𝑏, we have

𝑥𝑝𝑏 + 𝑦𝑎𝑏 = 𝑏.

Consider the two summands on the left. The first summand, 𝑥𝑝𝑏, is clearly a multiple
of 𝑝 because it has 𝑝 as a factor. The second summand, 𝑦𝑎𝑏, is a multiple of 𝑎𝑏, but by
our assumption, 𝑝 ∣ 𝑎𝑏, so 𝑎𝑏 is also a multiple of 𝑝, so in fact the second summand is a
multiple of 𝑝 too. So, both summands on the left are multiples of 𝑝. So their sum is a
multiple of 𝑝 too. So the equation shows that 𝑏 equals a multiple of 𝑝; that is, 𝑝 ∣ 𝑏, as required.

For example, the fact that 3 ∣ (8×9) implies that 3 ∣ 8 or 3 ∣ 9; in this case, it happens
that 3 ∣ 9 (but 3 ∤ 8).
The theorem won’t work, in general, if 𝑝 is not prime, though. For example, we
know that 6 ∣ (8 × 9), because 8 × 9 = 72 and we know that 6 ∣ 72 (because 6 × 12 = 72).
But we do not have 6 ∣ 8 or 6 ∣ 9; in fact, 6 ∤ 8 and 6 ∤ 9.
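Both the prime case and the composite counterexample are easy to check numerically; a small sketch of ours:

```python
def divides(d, n):
    """True iff d divides n."""
    return n % d == 0

# prime p = 3: 3 | (8*9), and indeed 3 divides one of the factors
prime_case = divides(3, 8 * 9) and (divides(3, 8) or divides(3, 9))

# composite 6: 6 | (8*9) = 72, yet 6 divides neither 8 nor 9,
# so Theorem 32 genuinely needs p to be prime
composite_case = divides(6, 8 * 9) and not divides(6, 8) and not divides(6, 9)
```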
We are now in a position to keep a promise made on p. 236, after proving Theorem 27,
where we showed that every integer can be expressed as a product of primes. There, we
stated that we would later show that this product of primes is always unique. We now
prove this.

Theorem 33 (Fundamental Theorem of Arithmetic). Every positive integer has a unique
factorisation as a product of primes.

Proof. We know from Theorem 27 that every positive integer can be written as a product
of primes. It remains to show that this can be done in only one way.
Assume, by way of contradiction, that there exists a positive integer 𝑛 which can be
written in two different ways as products of primes.
Among all positive integers 𝑛 that can be written as a product of primes in two
different ways, let’s choose the smallest.
Let the primes that appear in one or both of these two products be 𝑝1 , 𝑝2 , … , 𝑝𝑘 .
Then we may suppose that the two different ways of writing 𝑛 as a product of primes
are
𝑛 = 𝑝1^𝑒1 𝑝2^𝑒2 ⋯ 𝑝𝑘^𝑒𝑘 , (7.8)
𝑛 = 𝑝1^𝑓1 𝑝2^𝑓2 ⋯ 𝑝𝑘^𝑓𝑘 , (7.9)

where 𝑒𝑖 ∈ ℕ0 and 𝑓𝑖 ∈ ℕ0 for each 𝑖 ∈ {1, 2, … , 𝑘}. We may assume that, for each 𝑖, the
two exponents 𝑒𝑖 and 𝑓𝑖 are not both 0, since if they were then 𝑝𝑖^𝑒𝑖 = 𝑝𝑖^𝑓𝑖 = 1, so the prime
𝑝𝑖 appears in neither of the two products, so we shouldn’t have included it in the list of
primes appearing in the products.
Since the two products are different, there must be at least one 𝑖 ∈ {1, 2, … , 𝑘} such
that 𝑒𝑖 ≠ 𝑓𝑖 .
Now, suppose one of the primes, say 𝑝𝑗 , appears with positive exponents in each
product: 𝑒𝑗 > 0 and 𝑓𝑗 > 0. (We may have 𝑗 = 𝑖 or 𝑗 ≠ 𝑖; that doesn’t matter.) We

will use this prime 𝑝𝑗 to obtain, from 𝑛, a smaller number that can be expressed as a
product of primes in two different ways, contradicting our earlier assumption that 𝑛 is
the smallest number of this type.
Define
𝑔𝑗 ∶= min{𝑒𝑗 , 𝑓𝑗 },
and note that 𝑔𝑗 > 0 too (since 𝑒𝑗 , 𝑓𝑗 are both > 0).
We can divide each side of (7.8) by 𝑝𝑗^𝑔𝑗 to get an expression for 𝑛/𝑝𝑗^𝑔𝑗 as a product
of all the primes in the same list. Similarly, we can divide each side of (7.9) by 𝑝𝑗^𝑔𝑗 to
get another expression for 𝑛/𝑝𝑗^𝑔𝑗 as a product using the same primes.

• Now, if 𝑒𝑗 ≠ 𝑓𝑗 , then the prime 𝑝𝑗 still has different exponents in the two products
(but the two exponents of 𝑝𝑗 have each been reduced by 𝑔𝑗 ).

• On the other hand, if 𝑒𝑗 = 𝑓𝑗 , then we know there is another prime 𝑝𝑖 whose
exponents are different (𝑒𝑖 ≠ 𝑓𝑖 ), and the two products still have 𝑝𝑖^𝑒𝑖 and 𝑝𝑖^𝑓𝑖 , so
again we have a prime with different exponents in the two products.

So, either way, the two expressions for 𝑛/𝑝𝑗^𝑔𝑗 each have some prime 𝑝𝑖 with different
exponents in the two expressions. So we have two different expressions for 𝑛/𝑝𝑗^𝑔𝑗 as
products of primes. Also, 𝑔𝑗 > 0 implies 𝑝𝑗^𝑔𝑗 > 1, which in turn implies 𝑛/𝑝𝑗^𝑔𝑗 < 𝑛.
The upshot of this is that we now have a smaller positive integer that can be written
as a product of primes in two different ways. This contradicts our choice of 𝑛 as the
smallest positive integer with this property.
So our assumption, that one of the primes appears with a positive exponent in both
the products in (7.8) and (7.9), is wrong. This, together with the fact that we earlier
ruled out a prime having exponent 0 in both products, implies that every prime in our
list appears with a positive exponent in one product and zero exponent in the other
product. In other words, the two products use entirely separate sets of primes.
Let 𝑝 be any prime appearing (with positive exponent) in the first expression for 𝑛,
in (7.8). Let us write 𝑞1 , 𝑞2 , … , 𝑞𝑙 for the primes that appear (with positive exponent) in
the second product, and let their exponents be ℎ1 , ℎ2 , … , ℎ𝑙 , respectively, so
𝑛 = 𝑞1^ℎ1 𝑞2^ℎ2 ⋯ 𝑞𝑙^ℎ𝑙 .

We know now, from the reasoning in the previous paragraph, that 𝑝 ∉ {𝑞1 , 𝑞2 , … , 𝑞𝑙 },
(because every prime appearing in the first product, in (7.8), does not appear at all in
the second product, in (7.9)).
But, since 𝑝 appears in the first product, we know that 𝑝 ∣ 𝑛. By Theorem 32, this
implies that 𝑝 divides at least one of 𝑞1 , 𝑞2 , … , 𝑞𝑙 . But one prime cannot divide another
prime unless they are the same prime. So 𝑝 must equal one of 𝑞1 , 𝑞2 , … , 𝑞𝑙 . This is a
contradiction (see the end of the previous paragraph).

So our initial assumption, that there exists a positive integer that can be written as
a product of primes in two different ways, was wrong. Therefore every positive integer can only
be written as a product of primes in one way.
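The existence half of the theorem is constructive: trial division finds a factorisation, and by the uniqueness just proved it is the only one. A Python sketch (function name ours):

```python
def prime_factorisation(n):
    """Return the primes (with repetition, in increasing order) whose
    product is n, found by trial division."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide out d as many times as possible
            factors.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever remains is itself prime
        factors.append(n)
    return factors
```

For example, `prime_factorisation(72)` returns `[2, 2, 2, 3, 3]`, i.e. 72 = 2³ ⋅ 3².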

We will see further consequences of Theorem 31 in later sections, including applica-


tions in cryptography.

7.10 MODULAR ARiTHMETiC

We discussed remainders and the mod operation in § 7.3. There are many situations
where we are mainly interested in remainders, and all our calculations are done with
them. For example:
• The seven days of the week can be regarded as names for the remainders, modulo
7, of the number of days since some reference date in the past.

• The hours in 24-hour clock time are remainders, modulo 24, of the number of
hours since some reference time.

• When you press a toggle switch (which turns it on if it was off, and turns it off if
it was already on), then the state of the switch (using 0 for “off” and 1 for “on”)
is the remainder, modulo 2, of the number of times the switch has been pressed
since it was last known to be off.

• If you are playing Monopoly, then (ignoring special situations that cause sudden
jumps to other parts of the board, such as going to Jail or certain Chance and
Community Chest cards) your position on the board (which is a square circuit of
40 positions) is the remainder modulo 40 of the sum of your dice throws so far.

• If you are playing a game where you move around a rectangular display with
“wrap-around”, then whenever you change your coordinates by some amount, the
new coordinates are obtained from the old ones by taking remainders modulo the
appropriate dimensions of the display.

• The last digit of a nonnegative decimal number is its remainder modulo 10. The
last bit of a binary number is its remainder modulo 2.

• Sometimes, calculations with very large numbers can be checked using calculations
with some appropriate remainders of those numbers. Remainders are smaller, and
in general do not contain all the information that the original numbers contain,
so calculations with them cannot tell you everything about the larger numbers,
and they won’t detect every possible error. But, used well, they can detect some
common errors at modest cost.

• As we shall see, calculations with remainders underpin many cryptosystems includ-


ing the most widely used ones today.

Let 𝑛 be a positive integer. Two integers 𝑎 and 𝑏 are said to be congruent modulo
𝑛, or congruent mod 𝑛 for short, if they differ by a multiple of 𝑛:

∃𝑘 ∈ ℤ 𝑎 − 𝑏 = 𝑘𝑛.

We can characterise congruence modulo 𝑛 in terms of remainders.


Theorem 34. 𝑎 and 𝑏 are congruent modulo 𝑛 if and only if their remainders modulo
𝑛 are identical.
Proof. Let us find the quotients and remainders of 𝑎 and 𝑏 after division by 𝑛:

𝑎 = 𝑞𝑎 𝑛 + 𝑟 𝑎 , 𝑞𝑎 ∈ ℤ, 0 ≤ 𝑟𝑎 ≤ 𝑛 − 1;
𝑏 = 𝑞𝑏 𝑛 + 𝑟 𝑏 , 𝑞𝑏 ∈ ℤ, 0 ≤ 𝑟𝑏 ≤ 𝑛 − 1.

Then the difference 𝑎 − 𝑏 is given by

𝑎 − 𝑏 = (𝑞𝑎 − 𝑞𝑏 )𝑛 + (𝑟𝑎 − 𝑟𝑏 ).

Since (𝑞𝑎 − 𝑞𝑏 )𝑛 is already a multiple of 𝑛, this shows that 𝑎 − 𝑏 is a multiple of 𝑛 if and


only if 𝑟𝑎 − 𝑟𝑏 is a multiple of 𝑛. But 𝑟𝑎 − 𝑟𝑏 is a difference between two nonnegative
numbers that are each < 𝑛, so it lies strictly between −𝑛 and 𝑛. So the only way 𝑟𝑎 −𝑟𝑏
can be a multiple of 𝑛 is if it is actually 0. So the difference 𝑎 − 𝑏 is a multiple of 𝑛 if
and only if their remainders modulo 𝑛 are identical.

When 𝑎 and 𝑏 are congruent modulo 𝑛, we write

𝑎≡𝑏 (mod 𝑛).

Note that the use of “mod” here is different (but related) to the way we used it in § 7.3.
There, it was a binary operation, with 𝑎 mod 𝑛 giving the remainder of 𝑎 after division
by 𝑛. Now, used in parenthesis after an equation, it is no longer a binary operation;
rather, it signifies that ≡ means congruence mod 𝑛 (for the 𝑛 specified after “mod”).
But the two uses of “mod” are closely related, since Theorem 34 tells us that

𝑎 ≡ 𝑏 (mod 𝑛) if and only if 𝑎 mod 𝑛 = 𝑏 mod 𝑛.
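Both sides of this equivalence are easy to compute. Python's `%` operator returns the nonnegative remainder when the modulus is positive, matching the convention of § 7.3 (the function name below is ours):

```python
def congruent(a, b, n):
    """a ≡ b (mod n): a and b differ by a multiple of n."""
    return (a - b) % n == 0
```

As a spot check of Theorem 34, `congruent(a, b, n)` agrees with `a % n == b % n` across a range of positive and negative inputs.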

Every integer 𝑎 belongs to a family of all those integers that differ from 𝑎 by a
multiple of 𝑛, or in other words, all those integers that have the same remainder as 𝑎
after division by 𝑛. This family may be denoted by [𝑎], provided the modulus 𝑛 is clear
from the context. It may also be denoted by 𝑎 + 𝑛ℤ, meaning the set of all numbers
that can be obtained from 𝑎 by adding an integer multiple of 𝑛. So

[𝑎] = 𝑎 + 𝑛ℤ = {𝑎 + 𝑘𝑛 ∶ 𝑘 ∈ ℤ} = {… , 𝑎 − 3𝑛, 𝑎 − 2𝑛, 𝑎 − 𝑛, 𝑎, 𝑎 + 𝑛, 𝑎 + 2𝑛, 𝑎 + 3𝑛, …}.

It is useful to describe these families using our knowledge of relations from Chapter 2.

Congruence modulo 𝑛 is a binary relation on ℤ, so the discussion of § 2.13–§ 2.14 is


applicable. Now recall our discussion of equivalence relations in § 2.16.
Theorem 35. For every 𝑛 ∈ ℕ, congruence modulo 𝑛 is an equivalence relation on ℤ.
Proof. We show that congruence modulo 𝑛 is reflexive, symmetric and transitive.

Reflexive:
For any 𝑎 ∈ ℤ, we have
𝑎 − 𝑎 = 0 = 0𝑛,
so
𝑎 ≡ 𝑎 (mod 𝑛).
Symmetric:
Suppose 𝑎, 𝑏 ∈ ℤ are congruent modulo 𝑛:

𝑎 ≡ 𝑏 (mod 𝑛).

Then, by definition of congruence, there exists 𝑘 ∈ ℤ such that

𝑎 − 𝑏 = 𝑘𝑛.

Negating each side, we have


𝑏 − 𝑎 = −𝑘𝑛.
Since −𝑘 ∈ ℤ, this tells us that

𝑏 ≡ 𝑎 (mod 𝑛).

Transitive:
Suppose 𝑎, 𝑏, 𝑐 ∈ ℤ satisfy

𝑎 ≡ 𝑏 (mod 𝑛),
𝑏 ≡ 𝑐 (mod 𝑛).

Then, by definition, there exist 𝑘, 𝑙 ∈ ℤ such that

𝑎 − 𝑏 = 𝑘𝑛,
𝑏 − 𝑐 = 𝑙𝑛.

Adding these equations, we have

(𝑎 − 𝑏) + (𝑏 − 𝑐) = 𝑘𝑛 + 𝑙𝑛.

Simplifying gives
𝑎 − 𝑐 = (𝑘 + 𝑙)𝑛.

Since 𝑘 + 𝑙 ∈ ℤ, this tells us that

𝑎 ≡ 𝑐 (mod 𝑛).

Since congruence modulo 𝑛 is an equivalence relation, it has equivalence classes.


These are precisely the families [𝑎] which we introduced above. We call them equivalence
classes modulo 𝑛, dropping the “modulo 𝑛” when that is clear from the context, or
congruence classes, or residue classes.
These equivalence classes partition the integers (as we would expect from Theo-
rem 12). Since there are 𝑛 different remainders modulo 𝑛 (namely 0, 1, … , 𝑛 − 1), there
are exactly 𝑛 different equivalence classes, namely [0], [1], … , [𝑛 −1]. Each of these equiv-
alence classes can be written using others of their members, for example [0] is the same
class as [𝑛] and [2𝑛] and [−𝑛], and so on. But it is especially convenient to have each
class represented, in this notation, by its unique member in the range 0, 1, … , 𝑛 − 1.
We can do arithmetic with these equivalence classes. Suppose 𝑎, 𝑏 ∈ ℤ. Then the
sum [𝑎] + [𝑏] is the set containing all sums formed by adding a member of [𝑎] to a member
of [𝑏]. This may sound complicated, because we have infinitely many sums to form! But
it turns out that this is in fact another equivalence class:

[𝑎] + [𝑏] = {𝑎 + 𝑘𝑛 ∶ 𝑘 ∈ ℤ} + {𝑏 + 𝑙𝑛 ∶ 𝑙 ∈ ℤ}
= {𝑎 + 𝑘𝑛 + 𝑏 + 𝑙𝑛 ∶ 𝑘, 𝑙 ∈ ℤ}
= {𝑎 + 𝑏 + (𝑘 + 𝑙)𝑛 ∶ 𝑘, 𝑙 ∈ ℤ}
= {𝑎 + 𝑏 + ℎ𝑛 ∶ ℎ ∈ ℤ}, (since 𝑘 + 𝑙 ranges over all integers)
= [𝑎 + 𝑏].

We see, in fact, that when adding two equivalence classes [𝑎] and [𝑏] modulo 𝑛, we don’t
actually need to work out all possible sums; instead, we can take just one representative
from each class, say 𝑎 and 𝑏, and just add those two together (doing just one addition,
instead of infinitely many), and the resulting equivalence class is just the class [𝑎 + 𝑏]
that the sum 𝑎 + 𝑏 belongs to.
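We can spot-check this with finite windows of two classes (the names, modulus, and window size below are our own choices):

```python
n = 4

def window(a, k=3):
    """A finite window of the equivalence class [a] modulo n."""
    return {a + j * n for j in range(-k, k + 1)}

# every sum of a member of [2] and a member of [3] lies in [2 + 3] = [1]
all_in_class = all((u + v) % n == (2 + 3) % n
                   for u in window(2) for v in window(3))
```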
Similarly, it can be shown that

[𝑎] − [𝑏] = [𝑎 − 𝑏].

For multiplication, we can’t say quite as much, but we do have

[𝑎] ⋅ [𝑏] ⊆ [𝑎𝑏].

This is enough, for our purposes, since it means that, if we take any representatives
𝑥 ∈ [𝑎] and 𝑦 ∈ [𝑏], then their product 𝑥𝑦 belongs to [𝑎𝑏], so 𝑥𝑦 ≡ 𝑎𝑏 (mod 𝑛).

The question of division is more complex, partly because the integers are not closed
under division. We return to this later.
Our observations so far show that, when doing arithmetic mod 𝑛, we can use any
representative of each number’s equivalence class (at least for addition, subtraction and
multiplication). It is often most convenient to work with the representatives 0, 1, … , 𝑛−1,
since they seem to be the simplest possible representatives and they are the possible
values of remainders modulo 𝑛.
We write ℤ𝑛 for the set {0, 1, … , 𝑛 − 1} endowed with addition, subtraction and
multiplication but with all operations modified so that, whenever a number outside this
set is produced, its remainder modulo 𝑛 is used instead.
For example, consider ℤ4 , which uses the set {0,1,2,3}. To add 2 and 3, we start by
doing so in the usual way, obtaining 5. But this is not in the set, so we replace it by its
remainder modulo 4, which is 1. So,

2+3 = 1 in ℤ4 . (7.10)

This mirrors the fact that


2+3 ≡ 1 (mod 4). (7.11)
But these two statements are not saying exactly the same thing. The first statement,
(7.10), is an equation in ℤ4 , and 2+3 cannot equal anything else in that system! But
the second statement, (7.11), is about congruence modulo 4, and states that 2 + 3 and 1
belong to the same equivalence class, but that equivalence class has many other mem-
bers too. We could write (for example) 2 + 3 ≡ 37 (mod 4), but we would not write
“2 + 3 = 37 in ℤ4 ” because 37 is not even a member of ℤ4 .

The main message of this section is that, if we want to do a calculation with integers
involving only addition, subtraction and multiplication, and if we only want a remainder
at the end, then we can use remainders throughout. This simplifies the calculations a
lot.
For example, suppose you want to calculate

(2360679774 − (7320508 + 41421356) × 109) mod 10 (7.12)

Then, instead of working this out using ordinary arithmetic with these quite large
integers and then finding the remainder at the end, we can take remainders as we go.
First, we take the remainders mod 10 of each number in the expression (7.12):

2360679774 mod 10 = 4,
7320508 mod 10 = 8,
41421356 mod 10 = 6,
109 mod 10 = 9.

Replacing each number in the expression (7.12) by its remainder mod 10 gives the
expression
(4 − (8 + 6) × 9) mod 10
The first calculation is 8 + 6 = 14, but we take remainders as we go, so we actually
calculate
(8 + 6) mod 10 = 14 mod 10 = 4.
We keep doing these reductions. The full calculation is

(2360679774 − (7320508 + 41421356) × 109) mod 10 = (4 − (8 + 6) × 9) mod 10
= (4 − ((8 + 6) mod 10) × 9) mod 10
= (4 − (14 mod 10) × 9) mod 10
= (4 − 4 × 9) mod 10
= (4 − ((4 × 9) mod 10)) mod 10
= (4 − (36 mod 10)) mod 10
= (4 − 6) mod 10
= −2 mod 10
= 8.
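Python's `%` follows the same convention (−2 mod 10 = 8), so both the direct calculation and the reduce-as-you-go version can be checked in one line each (variable names ours):

```python
# direct calculation with the large integers, reduced only at the end
direct = (2360679774 - (7320508 + 41421356) * 109) % 10

# reduce-as-you-go version, mirroring the derivation step by step:
# (8 + 6) mod 10 = 4, then 4 * 9 mod 10 = 6, then (4 - 6) mod 10 = 8
stepwise = (4 - ((8 + 6) % 10) * 9 % 10) % 10
```

Both evaluate to 8, agreeing with the derivation above.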

We used remainders mod 10 in this example so that the remainders were clear throughout,
using the fact that the remainder mod 10 is always just the final decimal digit. But the
same principle holds for other moduli. For example,

((705 + 48) × 150 − 22) mod 7 = ((5 + 6) × 3 − 1) mod 7
(replacing each number by its remainder mod 7)
= (((5 + 6) mod 7) × 3 − 1) mod 7
= ((11 mod 7) × 3 − 1) mod 7
= (4 × 3 − 1) mod 7
= (((4 × 3) mod 7) − 1) mod 7
= ((12 mod 7) − 1) mod 7
= (5 − 1) mod 7
= 4.

One application of doing calculation with remainders is as a check on ordinary integer


calculations. If the final integer from a calculation does not have the same remainder
as the one you get from doing the calculation entirely with remainders, then there must
have been a mistake in the calculation. For example, the calculation “84 × 32 = 2678”
cannot be right, because

(84×32) mod 9 = ((84 mod 9)×(32 mod 9)) mod 9 = (3×5) mod 9 = 15 mod 9 = 6,

but
2678 mod 9 = 5.
This check is one-sided in the sense that a failure of the check (as in this example)
indicates an error in the calculation but the converse does not hold. Not all errors in
integer calculations will be detected by this check. Furthermore, if the check detects an
error, it does not tell you how to fix it.
Our choice of 9 as the modulus here was not random! The use of calculations mod 9
to check integer calculations is known as “casting out nines” and has a long history. We
study this method further in Exercise 11.
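Casting out nines is a one-line check in code; a sketch of ours that flags the bogus product above:

```python
def passes_cast_out_nines(a, b, claimed_product):
    """Necessary (but not sufficient) check: a correct product must have
    the same remainder mod 9 as (a mod 9) * (b mod 9)."""
    return ((a % 9) * (b % 9)) % 9 == claimed_product % 9
```

Here `passes_cast_out_nines(84, 32, 2678)` is `False`, exposing the error, while the true product 84 × 32 = 2688 passes the check.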

Although modular arithmetic works very smoothly for addition, subtraction and mul-
tiplication, there are some complications when doing division and when using exponents.
We consider division in the next section (§ 7.11) and exponents in § 7.14.

7.11 MODULAR iNVERSES

We omitted division from our list of operations that you can do in ℤ𝑛 . Division is
the opposite of multiplication. In ordinary arithmetic, finding 𝑥/𝑦 is the same as finding
the product 𝑥𝑦^−1 . Here 𝑦^−1 is the multiplicative inverse of 𝑦, which is defined by the
equation
𝑦𝑦^−1 = 1.
When dealing with ordinary real numbers, we call 𝑦^−1 the reciprocal of 𝑦, and it exists
for any nonzero real number.
Now consider inverses in ℤ𝑛 . Suppose firstly that 𝑛 = 7: ℤ7 = {0, 1, 2, 3, 4, 5, 6}. What
is the inverse of, for example, 2? Since 2 × 4 = 8 ≡ 1 (mod 7), we have 2^−1 = 4 in ℤ7 . It
can also be seen that: 1^−1 = 1, 3^−1 = 5, 4^−1 = 2, 5^−1 = 3, 6^−1 = 6. In fact, everything in
ℤ7 except 0 has an inverse.
Things are not always so convenient.
Consider ℤ6 = {0, 1, 2, 3, 4, 5}. Which of these elements have inverses? Zero, of course,
never does, and 1 is always its own inverse. What are the inverses for 2, 3, 4, 5? Suppose
we try to find the inverse of 2. We want a number 𝑧 = 2^−1 such that 2𝑧 ≡ 1 (mod 6)
— in other words, such that 2𝑧 is one plus a multiple of 6, i.e., one of 1, 7, 13, 19, … But
2𝑧 is even while all of these are odd, so this is impossible. So 2 has no inverse in ℤ6 . Neither does 3 or 4. It can be seen that 5 is its
own inverse, since 5 × 5 = 25 ≡ 1 (mod 6).
In fact, we can characterise those members of ℤ𝑛 that have inverses.
Theorem 36. A positive integer 𝑥 ∈ ℤ𝑛 has an inverse in ℤ𝑛 if and only if 𝑥 and 𝑛 are
coprime.
Proof. Let 𝑥 ∈ ℤ𝑛 .
If 𝑥 has an inverse 𝑥^{−1} , then 𝑥𝑥^{−1} = 1 in ℤ𝑛 . Therefore

∃𝑘 ∈ ℤ 𝑥𝑥^{−1} + 𝑘𝑛 = 1.

By Theorem 31, this means that 𝑥 and 𝑛 are coprime.


Conversely, if 𝑥 and 𝑛 are coprime, then by Theorem 31 there exist 𝑦, 𝑧 ∈ ℤ such
that
𝑦𝑥 + 𝑧𝑛 = 1.
Let 𝑦 ′ be the remainder of 𝑦 modulo 𝑛. So 𝑦 = 𝑙𝑛 + 𝑦 ′ for some 𝑙 ∈ ℤ (recalling the
definition of remainder with (7.3), but here 𝑛 is the divisor). So we have

(𝑙𝑛 + 𝑦 ′ )𝑥 + 𝑧𝑛 = 1,

which means
𝑦 ′ 𝑥 + (𝑙𝑥 + 𝑧)𝑛 = 1.
Since both 𝑥 and 𝑦 ′ are in ℤ𝑛 , this means that

𝑦′𝑥 = 1 in ℤ𝑛 .

But this is precisely the statement that 𝑦 ′ is the multiplicative inverse of 𝑥 in ℤ𝑛 . □

Theorem 36 shows one reason for the importance of coprimality: it determines which
elements of ℤ𝑛 have inverses. The Euclidean Algorithm can be used to determine whether
or not a member 𝑥 of ℤ𝑛 is coprime to 𝑛, and therefore whether or not it has an inverse.
Furthermore, if we want to determine the inverse 𝑥^{−1} of 𝑥, then applying the Extended
Euclidean Algorithm to 𝑥 and 𝑛 gives 𝑦, 𝑧 ∈ ℤ such that

𝑦𝑥 + 𝑧𝑛 = 1.

Then 𝑦 (or its remainder modulo 𝑛) is the inverse of 𝑥 in ℤ𝑛 .


This is the main reason we use the Extended Euclidean Algorithm: to find inverses
in ℤ𝑛 .
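This use of the Extended Euclidean Algorithm translates directly into Python. The sketch below is ours (the function names are not from the text); note that Python 3.8+ also offers `pow(x, -1, n)` for the same purpose.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def mod_inverse(x, n):
    """Inverse of x in Z_n, or None if x and n are not coprime (Theorem 36)."""
    g, y, _ = extended_gcd(x, n)
    if g != 1:
        return None       # no inverse exists
    return y % n          # reduce y to its remainder mod n

print(mod_inverse(2, 7))  # 4, since 2 × 4 = 8 ≡ 1 (mod 7)
print(mod_inverse(2, 6))  # None: 2 and 6 are not coprime
```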
We write ℤ∗𝑛 for the set of members of ℤ𝑛 that have multiplicative inverses. For
example:

ℤ∗6 = {1, 5},


ℤ∗7 = {1, 2, 3, 4, 5, 6}.

7.12 THE EULER TOTiENT FUNCTiON

The number of positive integers less than 𝑛 that are coprime to 𝑛 is denoted by 𝜑(𝑛),
and this function 𝜑 is called the Euler totient function. Putting it another way:

𝜑(𝑛) = |{𝑥 ∶ 0 < 𝑥 < 𝑛, gcd(𝑥, 𝑛) = 1}|.

Exercise: determine 𝜑(6) and 𝜑(7).



From the definitions of the Euler totient function and ℤ∗𝑛 , together with Theorem 36,
we have
𝜑(𝑛) = |ℤ∗𝑛 |.
The Euler totient function of a prime number 𝑝 is given by

𝜑(𝑝) = 𝑝 − 1. (7.13)

This is because every positive integer less than a prime must be coprime to it. So the
maximum possible value of 𝜑(𝑛) is 𝑛 − 1, and this is attained if and only if 𝑛 is
prime.
We can extend (7.13) to prime powers.

Theorem 37. For any prime 𝑝 and any 𝑚 ∈ ℕ,

𝜑(𝑝^𝑚) = 𝑝^{𝑚−1} (𝑝 − 1). (7.14)

Proof. Let 𝑝^𝑚 be a power of a prime 𝑝, where 𝑚 ∈ ℕ. The total number of positive


integers < 𝑝^𝑚 is 𝑝^𝑚 − 1. How many of these are coprime to 𝑝^𝑚 ? The only proper
divisors of 𝑝^𝑚 greater than 1 are 𝑝, 𝑝^2 , 𝑝^3 , … , 𝑝^{𝑚−1} , which are all powers of 𝑝. So,
if a positive integer 𝑥 is not coprime to 𝑝^𝑚 , then its gcd with 𝑝^𝑚 must be a multiple of 𝑝
(which includes the possibility of being a multiple of some higher power of 𝑝 such as 𝑝^2 ,
𝑝^3 , etc.). How many multiples of 𝑝 are there in the set {1, 2, … , 𝑝^𝑚 − 1}? The multiples
of 𝑝 in this set are 𝑝, 2𝑝, 3𝑝, … , 𝑝^𝑚 − 𝑝, noting that the last member of this sequence
equals (𝑝^{𝑚−1} − 1)𝑝. There are 𝑝^{𝑚−1} − 1 of these. So we have

𝜑(𝑝^𝑚) = # positive integers < 𝑝^𝑚 that are coprime to 𝑝^𝑚
       = # positive integers < 𝑝^𝑚 that are not multiples of 𝑝
       = (# positive integers < 𝑝^𝑚) − (# multiples of 𝑝 that are < 𝑝^𝑚)
       = (𝑝^𝑚 − 1) − (𝑝^{𝑚−1} − 1)
       = 𝑝^𝑚 − 𝑝^{𝑚−1}
       = 𝑝^{𝑚−1} (𝑝 − 1). □

To be able to compute 𝜑(𝑛) when 𝑛 is not a prime power, we need something more.
Fortunately, we have

𝑎, 𝑏 coprime ⟹ 𝜑(𝑎𝑏) = 𝜑(𝑎)𝜑(𝑏). (7.15)

(In number theory, a function 𝑓 is said to be multiplicative if 𝑓(𝑎𝑏) = 𝑓(𝑎)𝑓(𝑏) when-


ever 𝑎 and 𝑏 are coprime. So we have just stated that the Euler totient function is
multiplicative.)

It is worth noting in passing that, if 𝑛 = 𝑝𝑞 is the product of two distinct primes 𝑝 and 𝑞,


then
𝜑(𝑛) = (𝑝 − 1)(𝑞 − 1). (7.16)
This fact will be used later on, when we consider the RSA cryptosystem.
Properties (7.13)–(7.15) enable us to compute 𝜑(𝑛) for any positive integer 𝑛 whose
complete prime factorisation is known. Here are its values for 𝑛 ≤ 13:

𝑛: 2 3 4 5 6 7 8 9 10 11 12 13
𝜑(𝑛): 1 2 2 4 2 6 4 6 4 10 4 12
In fact, if you have a method for factorising any integer, then you can use it to
compute the Euler totient function of any integer. Given a positive integer 𝑛, you first
factorise it into a product of prime powers, say 𝑝1^{𝑚1} 𝑝2^{𝑚2} ⋯ 𝑝𝑘^{𝑚𝑘} , where 𝑝1 , 𝑝2 , … , 𝑝𝑘 are
primes. Powers 𝑝𝑖^{𝑚𝑖} and 𝑝𝑗^{𝑚𝑗} of distinct primes 𝑝𝑖 , 𝑝𝑗 (𝑖 ≠ 𝑗) are coprime, so

𝜑(𝑛) = 𝜑(𝑝1^{𝑚1} 𝑝2^{𝑚2} ⋯ 𝑝𝑘^{𝑚𝑘}) = 𝜑(𝑝1^{𝑚1}) 𝜑(𝑝2^{𝑚2}) ⋯ 𝜑(𝑝𝑘^{𝑚𝑘}),

by (7.15). Then we can use (7.14) to work out each 𝜑(𝑝𝑖^{𝑚𝑖}). This gives

𝜑(𝑛) = (𝑝1^{𝑚1−1} (𝑝1 − 1)) (𝑝2^{𝑚2−1} (𝑝2 − 1)) ⋯ (𝑝𝑘^{𝑚𝑘−1} (𝑝𝑘 − 1)).

This can also be written as

𝜑(𝑛) = 𝑝1^{𝑚1} (1 − 1/𝑝1) 𝑝2^{𝑚2} (1 − 1/𝑝2) ⋯ 𝑝𝑘^{𝑚𝑘} (1 − 1/𝑝𝑘)
     = 𝑝1^{𝑚1} 𝑝2^{𝑚2} ⋯ 𝑝𝑘^{𝑚𝑘} (1 − 1/𝑝1)(1 − 1/𝑝2) ⋯ (1 − 1/𝑝𝑘)
     = 𝑛 (1 − 1/𝑝1)(1 − 1/𝑝2) ⋯ (1 − 1/𝑝𝑘).

It is in general difficult to compute 𝜑(𝑛) from scratch (i.e., if the prime factorisation
of 𝑛 is unknown).
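The computation just described can be sketched in Python. The trial-division factorisation below is only suitable for small 𝑛, and the function name `euler_phi` is ours:

```python
def euler_phi(n):
    """Compute φ(n) by factorising n and applying φ(n) = n · Π(1 − 1/p)."""
    result = n
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:     # strip out all copies of the prime p
                m //= p
            result -= result // p # multiply result by (1 − 1/p), exactly
        p += 1
    if m > 1:                     # one prime factor larger than sqrt(n) may remain
        result -= result // m
    return result

print([euler_phi(n) for n in range(2, 14)])
# [1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4, 12] — matching the table above
```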

7.13 FA S T E X P O N E N T i AT i O N

In ordinary arithmetic, raising a number 𝑎 to a power 𝑚 means forming the product of


𝑚 copies of 𝑎:
𝑎^𝑚 = 𝑎 × 𝑎 × ⋯ × 𝑎    (𝑚 factors).

On the face of it, this means doing 𝑚 − 1 multiplications, which if 𝑚 is large means
a lot of work. Fortunately, there are much more efficient methods. If we express the
exponent as a sum of powers of 2 (which is given by its binary representation), then we

can compute 𝑎^𝑚 in a way that does not use too many multiplications. For example, to
calculate 3^36 , observe that 36 = 2^5 + 2^2 , so

3^36 = 3^{2^5 + 2^2} = 3^{2^5} × 3^{2^2} = ((((3^2)^2)^2)^2)^2 × (3^2)^2 .

Careful accounting should show that the number of multiplications required to compute
3^36 is 6. These are:

quantity to work out                  calculation                result
3^2 :                                 3 × 3                    = 9
(3^2)^2 :                             9 × 9                    = 81
((3^2)^2)^2 :                         81 × 81                  = 6561
(((3^2)^2)^2)^2 :                     6561 × 6561              = 43046721
((((3^2)^2)^2)^2)^2 :                 43046721 × 43046721      = 1853020188851841
((((3^2)^2)^2)^2)^2 × (3^2)^2 :       1853020188851841 × 81    = 150094635296999121

This compares well with the naïve method, which requires 35 multiplications.
The idea indicated here can be used as the basis of an algorithm for exponentiation.
Such an algorithm can compute 𝑎^𝑚 with at most 2⌊log₂ 𝑚⌋ multiplications, which can
be shown to be at most twice the number of bits in the binary representation of 𝑚.
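One way to implement this idea is the classic square-and-multiply method, which processes the binary representation of the exponent bit by bit. The sketch below (our own, not code from the text) also counts multiplications, so the accounting above can be checked:

```python
def fast_power(a, m):
    """Compute a**m by repeated squaring, counting the multiplications used."""
    result = 1
    square = a          # holds a, a^2, a^4, a^8, ... in turn
    mults = 0
    while m > 0:
        if m & 1:                   # the current bit of m is set
            if result == 1:
                result = square     # first factor: no multiplication needed
            else:
                result *= square
                mults += 1
        m >>= 1
        if m > 0:
            square *= square        # square once per remaining bit
            mults += 1
    return result, mults

value, mults = fast_power(3, 36)
print(value, mults)   # 150094635296999121 6 — six multiplications, as in the table
```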

7.14 M O D U L A R E X P O N E N T i AT i O N

Modular exponentiation just means doing exponentiation in ℤ𝑛 .


The fast exponentiation method of § 7.13 can be used for modular exponentiation too,
and again it significantly reduces the number of multiplications required. For example,
to compute 3^36 mod 25, we can do the six multiplications described in § 7.13
and obtain:
3^36 mod 25 = 150094635296999121 mod 25 = 21.
But when doing exponentiation in ℤ𝑛 , we only ever need to multiply numbers that are
< 𝑛, so we can save effort by reducing the factors mod 𝑛 as we go. In the example of
3^36 mod 25, the six multiplications we need to do become much simpler:

quantity to work out                     calculation              result

3^2 mod 25 :                             3 × 3 = 9 ;              9 mod 25 = 9
(3^2)^2 mod 25 :                         9 × 9 = 81 ;             81 mod 25 = 6
((3^2)^2)^2 mod 25 :                     6 × 6 = 36 ;             36 mod 25 = 11
(((3^2)^2)^2)^2 mod 25 :                 11 × 11 = 121 ;          121 mod 25 = 21
((((3^2)^2)^2)^2)^2 mod 25 :             21 × 21 = 441 ;          441 mod 25 = 16
((((3^2)^2)^2)^2)^2 × (3^2)^2 mod 25 :   16 × 6 = 96 ;            96 mod 25 = 21
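The reduce-as-you-go strategy is a one-line change to square-and-multiply: take the result mod 𝑛 after every multiplication. A Python sketch (our own), which behaves like Python's built-in three-argument `pow`:

```python
def mod_power(a, m, n):
    """Square-and-multiply, reducing every intermediate result mod n,
    so no intermediate value ever exceeds (n-1)**2."""
    result = 1
    square = a % n
    while m > 0:
        if m & 1:
            result = (result * square) % n
        square = (square * square) % n
        m >>= 1
    return result

print(mod_power(3, 36, 25))   # 21
print(pow(3, 36, 25))         # 21 — the built-in does the same job
```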
For modular exponentiation, an even more dramatic simplification is possible. This
is a consequence of the following result, which is of fundamental importance.
Theorem 38. (Euler) If 𝑥 is coprime to 𝑛, then

𝑥^{𝜑(𝑛)} ≡ 1 (mod 𝑛). (7.17)

Proof. Let 𝑥 be coprime to 𝑛. Let the 𝜑(𝑛) members of ℤ∗𝑛 be

𝑦1 , 𝑦2 , … , 𝑦𝜑(𝑛) . (7.18)

If we multiply each of these by 𝑥, then they become

𝑥𝑦1 , 𝑥𝑦2 , … , 𝑥𝑦𝜑(𝑛) . (7.19)

Now, these must all be distinct in ℤ𝑛 too. If that were not the case, then we would
have 𝑥𝑦𝑖 ≡ 𝑥𝑦𝑗 (mod 𝑛) with 𝑖 ≠ 𝑗. Then we can multiply each side by 𝑥^{−1} (which
exists since 𝑥 is coprime to 𝑛) to obtain 𝑦𝑖 ≡ 𝑦𝑗 (mod 𝑛), which means that 𝑦𝑖 = 𝑦𝑗
(since 𝑦𝑖 and 𝑦𝑗 are both in ℤ𝑛 ). So our distinct members of ℤ∗𝑛 would not all be
distinct after all, a contradiction.
Therefore our two lists, (7.18) and (7.19), each give all the 𝜑(𝑛) elements of ℤ∗𝑛 , with the
only difference between the lists being the order of the elements. In effect, multiplication
of all elements by 𝑥 just reorders the list.

Now let’s form the products, in ℤ∗𝑛 , of the elements in each list, and also rearrange
the second product by collecting all the 𝑥 factors together:

product in ℤ∗𝑛 of elements in (7.18): 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛)

product in ℤ∗𝑛 of elements in (7.19): (𝑥𝑦1 )(𝑥𝑦2 ) ⋯ (𝑥𝑦𝜑(𝑛) ) = 𝑥^{𝜑(𝑛)} 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛)

Since the two products come from the same elements being multiplied, they must be
equal. Therefore
𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) ≡ 𝑥^{𝜑(𝑛)} 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) (mod 𝑛). (7.20)
Now, since all the elements 𝑦1 , 𝑦2 , … , 𝑦𝜑(𝑛) are elements of ℤ∗𝑛 , their product 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛)
must be in ℤ∗𝑛 too. Therefore 𝑦1 𝑦2 ⋯ 𝑦𝜑(𝑛) has an inverse. Multiplying each side of (7.20)
by this inverse, we obtain
𝑥^{𝜑(𝑛)} ≡ 1 (mod 𝑛),
which completes the proof.6 □

An important special case of Theorem 38 is when 𝑛 is a prime 𝑝. Then (7.13) and


(7.17) give
𝑥^{𝑝−1} ≡ 1 (mod 𝑝) for all 𝑥 ∈ {1, … , 𝑝 − 1}. (7.21)
This is known as Fermat’s Little Theorem. The generalisation above (Theorem 38) is
due to Euler.
Theorem 38 is used at the heart of the RSA cryptosystem and Equation 7.21 is at
the core of the Diffie-Hellman key exchange scheme (§ 7.18).
One of the main computational uses of Theorem 38 is to simplify modular exponentiation.
The Theorem enables us to work mod 𝜑(𝑛) in the exponent. So if we want to
simplify 𝑎^𝑥 mod 𝑛, and we know that 𝑥 ≡ 𝑦 (mod 𝜑(𝑛)), then we can write 𝑎^𝑥 ≡ 𝑎^𝑦 (mod 𝑛).
For example, suppose we want to simplify 2^9 mod 7. We compute 𝜑(7) = 7 − 1 = 6
(by (7.13)). So the exponent 9 can be simplified mod 6, giving 9 mod 6 = 3. Hence
2^9 mod 7 = 2^3 mod 7 = 8 mod 7 = 1.
Here’s how this works for 3^36 mod 25. First, we find the Euler totient function of
the modulus, 25:

𝜑(25) = 𝜑(5^2) = 5^{2−1} ⋅ (5 − 1) = 20    (by (7.14)).

Then we reduce the exponent mod 𝜑(𝑛), in this case mod 20:

3^36 mod 25 = 3^{36 mod 𝜑(25)} mod 25 = 3^{36 mod 20} mod 25 = 3^16 mod 25.

6 This theorem can be understood more deeply using group theory, which is beyond the scope of these
Course Notes, but we give an outline of this connection here. It can be shown that ℤ∗𝑛 is a group under
multiplication. This means that it is closed (if two numbers 𝑎, 𝑏 are coprime to 𝑛, then their product 𝑎𝑏 will
be coprime to 𝑛 as well), the multiplication operation is associative, there is a multiplicative identity (in
this case, 1), and every element has a multiplicative inverse (Theorem 36). It is a basic theorem of group
theory that, if 𝑔 is a member of the group and 𝑘 is the size of the group, then 𝑔^𝑘 equals the multiplicative
identity of the group. Theorem 38 is simply the application of these results to the group ℤ∗𝑛 .

We now have a much simpler problem, with an exponent of 16 instead of 36. A simpler
exponent will typically mean fewer multiplications are needed. In this case, we have

3^36 mod 25 = 3^16 mod 25 = 3^{2^4} mod 25 = (((3^2)^2)^2)^2 mod 25.

This requires just four multiplications:

quantity to work out          calculation              result

3^2 mod 25 :                  3 × 3 = 9 ;              9 mod 25 = 9
(3^2)^2 mod 25 :              9 × 9 = 81 ;             81 mod 25 = 6
((3^2)^2)^2 mod 25 :          6 × 6 = 36 ;             36 mod 25 = 11
(((3^2)^2)^2)^2 mod 25 :      11 × 11 = 121 ;          121 mod 25 = 21

This can be compared with the longer calculation earlier in § 7.14. The saving
becomes much greater for higher exponents.
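Here is the same exponent-reduction trick as a small Python sketch; the helper name `reduced_exponent_power` is ours, and the trick is only valid when gcd(𝑎, 𝑛) = 1:

```python
from math import gcd

def reduced_exponent_power(a, x, n, phi_n):
    """Compute a**x mod n by first reducing the exponent x mod φ(n).
    Valid only when gcd(a, n) == 1, by Euler's theorem (Theorem 38)."""
    assert gcd(a, n) == 1, "exponent reduction requires a coprime to n"
    return pow(a, x % phi_n, n)

print(reduced_exponent_power(3, 36, 25, 20))  # 21, via the smaller exponent 16
print(reduced_exponent_power(2, 9, 7, 6))     # 1, since 9 mod 6 = 3 and 2^3 ≡ 1 (mod 7)
```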

7.15 PRiMiTiVE ROOTS

Suppose gcd(𝑥, 𝑛) = 1. Then Theorem 38 tells us that

𝑥^{𝜑(𝑛)} ≡ 1 (mod 𝑛).

If, in addition,
𝑥^𝑘 ≢ 1 (mod 𝑛) for all 𝑘 < 𝜑(𝑛),
then we say that 𝑥 is a primitive root of 𝑛. Such an 𝑥 has the property that its powers
𝑥, 𝑥^2 , 𝑥^3 , … give all the elements of ℤ∗𝑛 , in some order, reaching 1 at 𝑥^{𝜑(𝑛)} . In this sense,
𝑥 generates ℤ∗𝑛 . If 𝑥 is not a primitive root, then its powers will go through some
proper subset of ℤ∗𝑛 .
We emphasise that, among the members of ℤ∗𝑛 , it is the primitive roots that have
the greatest possible range of values of their powers. This will be important later on.

Example:
Suppose 𝑛 = 7. Then 𝜑(7) = 6, by (7.13), since 7 is prime, and ℤ∗7 = {1, 2, 3, 4, 5, 6}.
Consider the successive powers of 3 in ℤ∗7 :

𝑘:           1 2 3 4 5 6
3^𝑘 mod 7:   3 2 6 4 5 1
We see from this that 3^{𝜑(7)} = 3^6 ≡ 1 (mod 7), and that 3^𝑘 ≢ 1 (mod 7) for all 𝑘 < 6.
So the values of 3^𝑘 mod 7, for 𝑘 = 1, … , 6, are all the elements of ℤ∗7 . So 3 is a primitive
root of 7.
On the other hand, 2 is not a primitive root of 7. Consider its powers:

𝑘:           1 2 3
2^𝑘 mod 7:   2 4 1
So 2^3 ≡ 1 (mod 7), with the exponent 3 < 𝜑(7), which means the definition of primitive
root is not satisfied. Taking further powers, with 𝑘 = 4, 5, 6, … , will just give the
same numbers 2, 4, 1, …; we will not get anything new.

Not all positive integers have primitive roots. For example, there are no primi-
tive roots of 8. Let us consider all the candidates. Firstly, observe that 𝜑(8) = 4 and
ℤ∗8 = {1, 3, 5, 7}. The following table shows that no member of ℤ∗8 can be a primitive root.

𝑘:           1 2
1^𝑘 mod 8:   1
3^𝑘 mod 8:   3 1
5^𝑘 mod 8:   5 1
7^𝑘 mod 8:   7 1
Note that it does not help to look at powers of some number 𝑥 which is in ℤ8 but
not in ℤ∗8 . For example, try 𝑥 = 2. Its powers in ℤ8 are 2, 4, 0, 0, 0, …; it never reaches
1, and in fact once it reaches 0 it is stuck there. This is typical of what happens when
taking successive powers of a member of ℤ𝑛 ∖ ℤ∗𝑛 . So a primitive root of 𝑛, if one exists,
must in fact be a member of ℤ∗𝑛 .
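The definition of a primitive root translates directly into a (slow, small-𝑛) Python check; the function name is ours:

```python
from math import gcd

def is_primitive_root(x, n, phi_n):
    """x is a primitive root of n iff x is in Z*_n and no power x^k with
    k < φ(n) equals 1 mod n (so x's order is exactly φ(n))."""
    if gcd(x, n) != 1:
        return False          # powers of such x never reach 1 mod n
    return all(pow(x, k, n) != 1 for k in range(1, phi_n))

# Primitive roots of 7 (φ(7) = 6):
print([x for x in range(1, 7) if is_primitive_root(x, 7, 6)])   # [3, 5]

# No member of Z*_8 is a primitive root (φ(8) = 4):
print([x for x in range(1, 8) if is_primitive_root(x, 8, 4)])   # []
```

The count of 2 primitive roots for 𝑛 = 7 agrees with 𝜑(𝜑(7)) = 𝜑(6) = 2 (Theorem 40, below).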
Numbers that have primitive roots have been characterised.
Theorem 39. The numbers which have primitive roots are 1, 2, 4, and those of the
form 𝑝^𝑘 or 2𝑝^𝑘 , where 𝑝 is an odd prime and 𝑘 ∈ ℕ. □

The number 8 is not covered by this list of possibilities. (It is a power of the even
prime, 2, rather than an odd one.) So it cannot have a primitive root.
Theorem 40. If 𝑛 has a primitive root, then it has 𝜑(𝜑(𝑛)) of them. □
This result can be proved by elementary methods, and insight into why it is true
can be gained by playing with small examples.
Consider our earlier example of 𝑝 = 7. We saw that 3 is a primitive root of 7, and
found all its powers 3^𝑖 for 𝑖 ≤ 6, with 3^6 ≡ 1 (mod 7). We then saw that 2 is not a
primitive root of 7, because its powers reach 1 too soon: 2^3 ≡ 1 (mod 7).
Now let’s use the representation of 2 as a power of the primitive root 3, and look
more closely at why 2 fails to be a primitive root.

We have 2 ≡ 3^2 (mod 7), and the exponent here, also 2, is a divisor of 6, with
2 × 3 = 6, so
2^3 ≡ (3^2)^3 = 3^{2×3} = 3^6 ≡ 1 (mod 7).
This illustrates that we can use the exponents to work out all the primitive roots. The
exponent 2 is a factor of 6, and as we have just seen, this stops 3^2 mod 7 = 2 from being
another primitive root. The exponent 3 is also a factor of 6, so 3^3 mod 7 = 6 won’t be a
primitive root either:

6^2 ≡ (3^3)^2 = 3^{3×2} = 3^6 ≡ 1 (mod 7).

But it’s not just about being a factor of 6. Consider exponent 4, which gives 3^4 mod 7 =
4. This is not a factor of 6, but it does have a factor of 2 in common with 6. Because of
this, it won’t give us a primitive root either:

4^3 ≡ (3^4)^3 = 3^{4×3} = 3^12 = 3^{6×2} = (3^6)^2 ≡ 1^2 = 1 (mod 7).

So, if the exponent is not coprime to 6, then it cannot yield a primitive root. This
prevents 3^2 , 3^3 and 3^4 from being primitive roots of 7. So none of 2, 6, 4 are primitive
roots of 7.
On the other hand, if the exponent is coprime to 6, it will yield a primitive root
when our first primitive root 3 is raised to that exponent. This means that exponents 1
and 5 yield primitive roots. So the two primitive roots of 7 are 3^1 = 3 and 3^5 mod 7 = 5.
So the number of primitive roots of 7 is indeed

𝜑(𝜑(7)) = 𝜑(7 − 1) = 𝜑(6) = 𝜑(2 ⋅ 3) = 𝜑(2)𝜑(3) = (2 − 1)(3 − 1) = 1 ⋅ 2 = 2.

This illustrates Theorem 40, and also that, once we have one primitive root 𝑎 of 𝑛, the
others have the form 𝑎^𝑘 where 𝑘 is coprime to 𝜑(𝑛).

There is, at present, no fast algorithm for finding a primitive root of a number.

7.16 O N E - WAY F U N C T i O N S

Cryptography is about transforming information efficiently in such a way that reversing


the transformation is difficult for someone who is not authorised to do so. So it is natural
to consider functions that are easy to compute but hard to invert.
Informally, a function is one-way if it is:

1. “easy” to compute;

2. “hard” to invert, “usually”.

This can be made precise, but doing so is beyond the scope of this unit.

One-way functions are believed to exist, and there are a number of functions that
are regarded as one-way functions. We will see some examples shortly. However, no
function has been rigorously proved to be one-way in the precise formal sense of that
term.

Application of one-way functions: password checking


Let 𝑓 be a one-way function. Suppose we have users 𝑈𝑖 , 1 ≤ 𝑖 ≤ 𝑁 , and that user
𝑈𝑖 has password 𝑃𝑖 , 1 ≤ 𝑖 ≤ 𝑁 . In order to authenticate a user logging into a computer
system, their password must be checked to ensure it is the correct one for that user.
The obvious way to do this is to have a place in the system where users’ passwords are
stored, so that a password entered by a user logging in can be checked against the stored
password to see if it is the same. However, storing passwords themselves is too much of
a risk. Someone who gains access to the place where passwords are stored will then be
able to find the passwords associated with usernames, which gives them the ability to
pretend to be one of those users when logging into the system.
It is better for the system to store, with each username, the image of the password
under the one-way function 𝑓. So the information stored is a list of pairs (𝑈𝑖 , 𝑓(𝑃𝑖 )),
1 ≤ 𝑖 ≤ 𝑁 . When a user 𝑈𝑖 logs in, the password 𝑃 entered is first fed into the one-way
function 𝑓, which computes 𝑓(𝑃). The system looks up the list of users, finds the pair
(𝑈𝑖 , 𝑓(𝑃𝑖 )), and tests whether 𝑓(𝑃) = 𝑓(𝑃𝑖 ). If so, the entered password is accepted and
the user is admitted to the system. If not, the login attempt is rejected.
Under this system, if someone gains access to a pair (𝑈𝑖 , 𝑓(𝑃𝑖 )), they still do not
know the password. To find the password, they would have to work out 𝑃𝑖 just from
knowledge of 𝑓(𝑃𝑖 ). But this is precisely the problem of inverting 𝑓. If 𝑓 is a one-way
function, then this should almost always be difficult. So the value 𝑓(𝑃𝑖 ) is of no help in
logging in.
Note also that, since 𝑓 is a one-way function, it is easy to compute, so checking
passwords can be done quickly, even though finding a password, just from its image
under 𝑓, is difficult.
Sometimes, this method of password verification is described as using “encryption”,
and the transformed password 𝑓(𝑃𝑖 ) is called an “encrypted password”. Strictly speaking,
this is inaccurate, since no key is involved: the password is always transformed in the
same way. Furthermore, there is no-one to play the role played by the receiver of a
secret message in an ordinary cryptosystem: there is no-one who is intended to be able
to recover 𝑃𝑖 from 𝑓(𝑃𝑖 ).
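A minimal sketch of this scheme in Python, using SHA-256 from the standard library as a stand-in for the one-way function 𝑓. Real systems use dedicated, salted password-hashing functions (such as scrypt or argon2), so treat this purely as an illustration of the idea:

```python
import hashlib

def f(password):
    """A stand-in for the one-way function: easy to compute, hard to invert."""
    return hashlib.sha256(password.encode()).hexdigest()

# The system stores pairs (U_i, f(P_i)) — never the passwords themselves.
stored = {"alice": f("correct horse"), "bob": f("hunter2")}

def check_login(user, attempt):
    """Accept the login iff f(attempt) matches the stored image f(P_i)."""
    return user in stored and f(attempt) == stored[user]

print(check_login("alice", "correct horse"))   # True
print(check_login("alice", "hunter2"))         # False
```

An attacker who reads `stored` sees only images under 𝑓; recovering a password from its image is exactly the problem of inverting 𝑓.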

7.17 M O D U L A R E X P O N E N T i AT i O N W i T H F i X E D B A S E

We now consider an important candidate one-way function.


Suppose we fix a modulus 𝑛 and a base 𝑎 < 𝑛.

The function modular exponentiation with fixed base performs the mapping

𝑥 ↦ 𝑎^𝑥 mod 𝑛.

We have seen that this can be computed efficiently, using the techniques of § 7.14. Here,
we will focus on using exponents 𝑥 < 𝜑(𝑛), since we saw in § 7.14 that every other
possible exponent can be reduced to such an 𝑥.
Although this function is easy to compute, it seems to be much harder to invert.
For the inverse, we are given 𝑦 < 𝑛 and must find 𝑥 such that 𝑎^𝑥 ≡ 𝑦 (mod 𝑛). If
we want to make the inverse as hard as possible, we should ensure that there are
as many potential values of 𝑥 as possible, so that exhaustively searching through all
possible 𝑥, to find the one which satisfies 𝑎^𝑥 ≡ 𝑦 (mod 𝑛), takes as long as possible. To
this end, it is desirable to choose 𝑎 to be a primitive root of 𝑛, because such numbers
give the greatest range of possible values of their powers (as we observed early in § 7.15).
The inverse problem is then the following.

Discrete Logarithm:
Fix: 𝑛 ∈ ℕ, primitive root 𝑎 of 𝑛.
Input: 𝑦 ∈ ℤ∗𝑛
Output: 𝑥 ≤ 𝜑(𝑛) such that 𝑎^𝑥 ≡ 𝑦 (mod 𝑛).

When 𝑥 and 𝑦 are related in this way, we may write

𝑥 = log𝑎 𝑦.

The current belief is that Discrete Logarithm has no fast algorithm. In fact, it seems
to be of about the same difficulty as factorising integers. The best known algorithms for
each take, very roughly, similar amounts of computation time.
Furthermore, Discrete Log is believed to be almost always hard.
Modular exponentiation with a fixed primitive root as the base is believed to be a
one-way function, although this has not been proved, and proving it would solve a major
open problem in computer science.
Recall that 𝜑(𝑛) is maximised when 𝑛 is prime. So, for the best possible candidate
one-way function, we should choose a large prime 𝑝 and then choose 𝑎 to be a primitive
root of 𝑝. This is exactly what we do when using this candidate one-way function to
help distribute cryptographic keys securely. This is described in the next section.

7.18 DiFFiE-HELLMAN KEY AGREEMENT SCHEME

Suppose we have a large number of users who wish to be able to communicate with
each other. Suppose that any pair of them who communicate want secrecy, so that
no-one else (including others in our large set of users) can read their messages. With
traditional cryptosystems (including the type discussed in § 2.10), they will need to

agree in advance on a shared secret key. This requires each pair of users to have their
own secure communications channel. Not only is this expensive, but it requires time
to arrange. This severely limits the ability of the users to communicate spontaneously,
without having planned ahead of time to do so. It is clear that the demands of modern
electronic communications and commerce are quite at odds with these limitations.
A remarkable solution to this problem was proposed by Diffie and Hellman in 1976,
in a paper that marked the beginning of a revolution in cryptography.7 Their method
uses the one-way function we have just met: modular exponentiation with fixed base,
where the base is a primitive root of a large prime. It works as follows.
Firstly, we fix a large prime 𝑝 and a primitive root 𝑎 of 𝑝. These numbers are public,
in that they are disseminated, without any encryption, to all users of the system, and
system-wide, in that all users use the same values of 𝑝 and 𝑎. Each user generates their
own private random number 𝑥 ∈ {1, … , 𝑝 −1}, which is regarded as a member of ℤ∗𝑝 , and
from it generates 𝑦 = 𝑎^𝑥 mod 𝑝, which is made public. (Note that we are refraining from
calling these numbers keys since, strictly speaking, they are not used directly to encrypt
or decrypt messages.)
Observe that anyone who wants to determine some user’s private number 𝑥 is faced
with the problem of inverting a candidate one-way function. They see 𝑝, 𝑎 and 𝑦 =
𝑎^𝑥 mod 𝑝, and from this must determine 𝑥. This is exactly the Discrete Log problem,
i.e., the inverse of modular exponentiation with fixed base 𝑎. Since this appears to be
a difficult problem, provided 𝑝 is large, we will assume that the private number of each
user is secure.
Suppose now that two users, Alice and Bob, want to communicate. Suppose that
Alice’s private and public numbers are 𝑥𝐴 and 𝑦𝐴 , respectively, while Bob’s are 𝑥𝐵 and
𝑦𝐵 . Each of them knows the system-wide constants 𝑝 and 𝑎, and each can read the
other’s public number. The exact means by which this is done is an implementation
detail that does not concern us here. Perhaps they send the public numbers to each
other (unencrypted), or perhaps the public numbers of all users are collected together
in some central, publicly available file.
Now, when Alice reads Bob’s public number 𝑦𝐵 , she calculates her key 𝑘𝐴𝐵 by raising
Bob’s public number to the power of her own private number:
𝑘𝐴𝐵 ∶= 𝑦𝐵^{𝑥𝐴} mod 𝑝.

Note that only she can do this calculation, since only she knows 𝑥𝐴 .
Similarly, Bob reads Alice’s public number 𝑦𝐴 , and then raises it to the power of his
own private number 𝑥𝐵 , obtaining his key 𝑘𝐵𝐴 :
𝑘𝐵𝐴 ∶= 𝑦𝐴^{𝑥𝐵} mod 𝑝.

7 W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory
IT-22 (1976) 644–654.

Note that only he can do this, because no-one else knows 𝑥𝐵 . Note also that Alice and
Bob can do these computations independently of each other (provided each has made
their public number available).
Although Alice and Bob have each done a computation that only they could do,
the keys they each compute turn out, remarkably, to be exactly the same:
𝑘𝐴𝐵 = 𝑦𝐵^{𝑥𝐴} = (𝑎^{𝑥𝐵})^{𝑥𝐴} = 𝑎^{𝑥𝐵 𝑥𝐴} = 𝑎^{𝑥𝐴 𝑥𝐵} = (𝑎^{𝑥𝐴})^{𝑥𝐵} = 𝑦𝐴^{𝑥𝐵} = 𝑘𝐵𝐴 , all in ℤ∗𝑝 ,

with the exponentiations being done mod 𝑝 using the methods of § 7.14.
The outcome of this process is that Alice and Bob have arrived at the same key,
without that key itself having been sent anywhere. No secure channel is needed for key
transmission. Neither party alone could determine the key they agree on, as it depends
on the private numbers of both of them.
The problem facing a cryptanalyst is the following:

Diffie-Hellman problem:
Given: 𝑝, 𝑎, 𝑎^{𝑥𝐴} , 𝑎^{𝑥𝐵} (both mod 𝑝);
Find: 𝑎^{𝑥𝐴 𝑥𝐵} mod 𝑝.

This is believed to be about as difficult as Discrete Log, although we only know that
it cannot be significantly harder than Discrete Log:

Theorem 41. 41 Any algorithm for Discrete Log can be used to construct an algorithm
for the Diffie-Hellman problem. A fast algorithm for Discrete Log yields a fast algorithm
for the Diffie-Hellman problem.

Proof. Suppose we have an algorithm for Discrete Log. The inputs to the Diffie-
Hellman problem are 𝑝, 𝑎, 𝑎^{𝑥𝐴} and 𝑎^{𝑥𝐵} , the last three being members of ℤ∗𝑝 . If we give
𝑝, 𝑎 and (say) 𝑎^{𝑥𝐴} to the Discrete Log algorithm, then it will return 𝑥𝐴 . We can then
compute 𝑘𝐴𝐵 = 𝑎^{𝑥𝐴 𝑥𝐵} mod 𝑝, just as Alice did above:

𝑘𝐴𝐵 = (𝑎^{𝑥𝐵})^{𝑥𝐴} = 𝑎^{𝑥𝐴 𝑥𝐵} ,

with exponentiation mod 𝑝. This gives the desired output for the Diffie-Hellman prob-
lem.
If the algorithm for Discrete Log is fast, then this procedure for the Diffie-Hellman
problem, which uses it, will be fast too, since there is not much extra work involved,
and fast exponentiation techniques are used. □

Open problem:
Does the algorithmic relationship between Discrete Log and the Diffie-Hellman problem
go the other way too? In other words, is there an efficient way to
transform algorithms for the Diffie-Hellman problem into algorithms for Discrete Log?

It is widely believed that this is the case, but it has not yet been proved.

The purpose of the Diffie-Hellman system is to enable two parties to agree on a


common key, over an insecure channel. Once this key is agreed, they can send and
receive messages securely using a good secret-key cryptosystem.
The Diffie-Hellman system is not, itself, a cryptosystem. If used in isolation (i.e.,
without subsequent use of a secret-key cryptosystem), neither party can use it to send
a message of their own choosing, in a secure form, to the other party. The only infor-
mation either party receives from the other is the other’s public number, which can in
principle be read by anyone, and in any case is random since it is a function of that user’s
random private number. The only information that either party chooses for themselves
is their private number, but the other party never sees this so it cannot be used as a
message, and in any case it should be randomly chosen to make it as unpredictable as
possible.

Example:
We use small numbers in this example, so the calculations can be done manually
to help understand how it works, but of course you need very large numbers in real
applications.
For our public global parameters, we use 𝑝 = 11 and 𝑎 = 2. In Exercise 17 you will
show that 2 is a primitive root of 11. We will take that as given, for now.
Suppose Alice chooses private number 𝑥𝐴 = 3 and Bob chooses private number 𝑥𝐵 = 6.
Then they compute their public numbers 𝑦𝐴 and 𝑦𝐵 as follows.

Alice Bob

𝑦𝐴 = 𝑎^{𝑥𝐴} mod 𝑝          𝑦𝐵 = 𝑎^{𝑥𝐵} mod 𝑝
   = 2^3 mod 11                = 2^6 mod 11
   = 8.                        = 9.

So Alice sends Bob her public number 𝑦𝐴 = 8 and Bob sends Alice his public number
𝑦𝐵 = 9. Then Alice raises Bob’s public number 𝑦𝐵 to the power of her own private number
𝑥𝐴 :
𝑘𝐴𝐵 = 𝑦𝐵^{𝑥𝐴} = 9^3 ≡ 3 (mod 11).
Meanwhile, Bob raises Alice’s public number 𝑦𝐴 to the power of his own private number
𝑥𝐵 :
𝑘𝐵𝐴 = 𝑦𝐴^{𝑥𝐵} = 8^6 ≡ 3 (mod 11).
We see that 𝑦𝐵^{𝑥𝐴} ≡ 𝑦𝐴^{𝑥𝐵} (mod 11), which is what we expect, since both are equal to

𝑎^{𝑥𝐴 𝑥𝐵} = 2^{3⋅6} = 2^18 ≡ 3 (mod 11).



This last calculation, using both 𝑥𝐴 and 𝑥𝐵 , uses the private numbers of both Alice
and Bob. So this particular calculation cannot be done by either of them individually,
provided they are each able to keep their own private number secret. But we have seen
that they can still each work out the final number 𝑘𝐴𝐵 = 3, even though they do not
know the other’s private number.
The final number, 𝑘𝐴𝐵 = 3 in this case, is then ready for use as a key in a proper
cryptosystem.
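The whole exchange in this example fits in a few lines of Python, with each side's computation done by fast modular exponentiation via the built-in three-argument `pow`:

```python
# Toy Diffie-Hellman exchange: p = 11 with primitive root a = 2,
# as in the example above. Real systems use very large primes.
p, a = 11, 2

x_A, x_B = 3, 6              # Alice's and Bob's private numbers

y_A = pow(a, x_A, p)         # Alice's public number: 2^3 mod 11 = 8
y_B = pow(a, x_B, p)         # Bob's public number:   2^6 mod 11 = 9

k_AB = pow(y_B, x_A, p)      # Alice computes 9^3 mod 11
k_BA = pow(y_A, x_B, p)      # Bob computes   8^6 mod 11

print(y_A, y_B, k_AB, k_BA)  # 8 9 3 3 — both sides arrive at the same key
```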

7.19 EXERCiSES

1.
(a) Write the statement that integer 𝑑 is a divisor of integer 𝑛 as a predicate logic
expression, using the integer multiplication function and without using the divisibility
predicate ∣ .
(b) Write the statement that integer 𝑛 is prime as a predicate logic expression. You
may use the divisibility relation ∣ and the inequality relation ≤.
(c) Write the statement that integer 𝑑 is the greatest common divisor of integers 𝑚
and 𝑛 as a predicate logic expression, using ∣ and ≤.

2. A positive integer 𝑛 is perfect if it equals the sum of its own proper divisors.

(a) Show by direct calculation that the first two perfect numbers are 6 and 28.

(b) The third perfect number is 496. Verify that 496 is perfect.

Over 50 perfect numbers are known, but it is not yet known if there are infinitely
many. All known perfect numbers are even. It is a longstanding open problem to
determine if any odd perfect numbers exist.

3. If you know the day of the week on which a given date falls in a given year, you
can work out the day of the week of the same date in the next year.
Suppose the days of the week are represented by members of ℤ7 , with Sunday rep-
resented by 0, Monday by 1, and so on. Let 𝑑 ∈ ℤ7 be the day of the week of a given
date this year.

(a) Give an expression in terms of 𝑑, using the mod operation, for the day of the week
for the same date next year.

(b) How would you modify the expression if you knew there was a leap day (29 Febru-
ary) between that date in one year and the same date in the next year?

(c) Now let 𝑑 be the day of the week of 29 February in some leap year. Assuming the
next leap year is four years later, give an expression in terms of 𝑑 for the weekday of
the next 29 February, again using the mod operation.

4. In (2.1) on p. 54, we introduced a simple (and quite poor) random number


generator based on a function that computes

LCG(𝑥) = the last 31 bits of 1103515245𝑥 + 12345.

Restate this definition using the mod operation.

5. The online game Primel (https://converged.yt/primel/) is about guessing an unknown five-digit prime number and making the best use of information you are
given after each guess. It was created by Hannah Park in 2022, and is based on Wordle
(https://www.nytimes.com/games/wordle/).
Start by reading the rules at the Primel website and playing the game. Keep in
mind that

• You are not penalised for guessing numbers that are not prime (although you don’t
get any information from non-prime guesses about the unknown prime number).

• You have a maximum of six prime guesses. So you must choose your guesses care-
fully, taking into account what you learn from previous guesses.

Think about what would make a good first guess in Primel, i.e., one which is likely to
give as much information as possible about the unknown prime number.

(a) Are there any prohibitions on certain digits in certain positions, in prime numbers
in general? If so, what does this mean for the digits you might want in the five-digit
prime number you use for your first guess?

(b) The density of prime numbers — i.e., the proportion of numbers in a given interval
that are prime — decreases as numbers get higher, although the pattern is a bit
irregular and unpredictable. This effect is present even among five digit primes. So,
for example, there are more with first digit 1 than with first digit 5.
With this in mind, what five digits would be best to use in your first guess? Which
five-digit prime number do you recommend, as the first guess?

(c) When playing the game, try asking ChatGPT for help with choosing your next guess.
For example, if you know that the unknown prime does not have 1, 2 or 3, then
ask ChatGPT to suggest a five-digit prime number that does not contain 1, 2 or 3.
When it responds, check if the number it gives is prime or not, using an authoritative
program or a table that includes all five-digit primes (e.g., https://t5k.org/lists/small/10000.txt, https://prime-numbers.de/blox29prime/blox_0.txt). How many attempts are necessary before ChatGPT gives an actual prime satisfying the conditions?

6. The rule for determining if a year is a leap year is as follows.


A positive integer is a leap year if and only if it is a multiple of 4, unless
it is also a multiple of 100, in which case it is not a leap year unless it is a
multiple of 400.
Define the unary predicate Leap with domain ℕ to be True if and only if its argument
is a leap year.
Using Leap and the divisibility predicate ∣, write the above rule for leap years in
predicate logic.
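The rule is also easy to implement directly from its wording. The following sketch (in Python, which is of course not the predicate-logic answer the exercise asks for) may help you check your reading of the rule:

```python
def leap(year):
    """True iff `year` is a leap year, per the rule stated above."""
    if year % 100 == 0:
        return year % 400 == 0   # century years: leap only if divisible by 400
    return year % 4 == 0         # otherwise: leap iff divisible by 4

# 2000 and 2024 are leap years; 1900 and 2023 are not.
assert leap(2000) and leap(2024)
assert not leap(1900) and not leap(2023)
```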

7. Prove that the square of an odd number is an odd number and the square of an
even number is an even number.

8. Restate the definitions of the Caesar slide encryption and decryption functions
(Exercise 15) using the mod operation.

9. Prove each of the following statements.


(a) 4 ∣ 𝑛 if and only if the last two digits of the decimal representation of 𝑛 give a
multiple of 4.
(b) 8 ∣ 𝑛 if and only if the last three digits of the decimal representation of 𝑛 give a
multiple of 8.
(c) 25 ∣ 𝑛 if and only if the last two digits of the decimal representation of 𝑛 give a
multiple of 25.
(d) Extend (a) and (b) to higher powers of 2.
(e) (Challenge) These divisibility tests, and those at the end of § 7.1𝛼 , all have the
property that the presence of divisor 𝑑 can be determined just from some constant
number of digits at the end of the number. In other words, the number of digits
of 𝑛 that you need to look at is independent of 𝑛. For example, to test divisibility
by 4, you only look at the last two digits, regardless of how large 𝑛 is.
Can you characterise when this happens? For which 𝑑 is it the case that, to test
divisibility of 𝑛 by 𝑑, you only need to look at some constant number of digits at
the very end of 𝑛?

10. The digital sum of a positive integer 𝑛, written in standard decimal notation,
is the sum of its digits. Let’s denote it by ds(𝑛).

For example,
ds(1984) = 1 + 9 + 8 + 4 = 22.

(a) Prove that, for all 𝑛 ∈ ℕ,

𝑛 mod 3 = ds(𝑛) mod 3.

This can be used repeatedly to compute 𝑛 mod 3: just keep computing the digital
sum of the digital sum of the digital sum … until you get just a single digit, and
then you can determine the remainder manually.
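The repeated-digital-sum procedure just described can be sketched as follows (the helper names are ours; the sketch takes the result of part (a) for granted rather than proving it):

```python
def ds(n):
    """Digital sum of n, written in decimal."""
    return sum(int(d) for d in str(n))

def mod3_by_digits(n):
    """Compute n mod 3 by repeatedly taking digital sums, then reducing
    the final single digit (relies on the identity in part (a))."""
    while n >= 10:
        n = ds(n)
    return n % 3

assert ds(1984) == 22
assert mod3_by_digits(1984) == 1984 % 3
```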

(b) A similar method works for remainders modulo 9. Explain why, briefly.

(c) The alternating digital sum of 𝑛 is obtained by alternately adding and subtract-
ing its digits, starting with addition at the right-hand end and moving to the left.
We’ll denote it by ads(𝑛).
For example,
ads(1984) = −1 + 9 − 8 + 4 = 4.

Prove that, for all 𝑛 ∈ ℕ,

𝑛 mod 11 = ads(𝑛) mod 11.

(d) Devise a technique of similar type for working out 𝑛 mod 3 from the bits of the
binary representation of 𝑛.

11. For each of the following equations, apply the method from Exercise 10(b) to
work out, by hand, the remainder mod 9 of each side. In each case, comment on what
comparing these two remainders tells you.

(a) 92 × 31 = 2847

(b) 92 × 31 = 2852

(c) 92 × 31 = 2861

(d) 2718281828 × 3141592653 = 8539734219628209634

12. Use the Euclidean algorithm to find the greatest common divisor of the following
pair of consecutive Fibonacci numbers: 34 and 55.
What do you notice about the other numbers found at each step of the algorithm?
Prove, by induction on 𝑛, that for all 𝑛 ≥ 3 the Euclidean algorithm uses 𝑛 − 2
subtractions to compute the gcd of the consecutive Fibonacci numbers 𝑓𝑛 and 𝑓𝑛+1 ,
and that every intermediate number found during the computation is also a Fibonacci
number.
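To observe the pattern before proving it, you can run the subtraction form of the Euclidean algorithm and record every intermediate value. A sketch (helper name ours; the claims in the exercise are still yours to prove by induction):

```python
def gcd_by_subtraction(a, b):
    """Subtraction form of the Euclidean algorithm; also records every
    value produced by a subtraction step."""
    seen = []
    while a != b:
        a, b = min(a, b), max(a, b)
        b -= a              # one subtraction step
        seen.append(b)
    return a, seen

g, seen = gcd_by_subtraction(34, 55)
assert g == 1  # consecutive Fibonacci numbers are coprime
# Observation to prove: every intermediate value is again a Fibonacci number.
assert all(v in {1, 2, 3, 5, 8, 13, 21, 34, 55} for v in seen)
```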

13. Use the Extended Euclidean Algorithm to show that 86 and 99 are coprime, and
to express 1 as an integer linear combination of them, and to find the inverse of 86 in ℤ99 .
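A generic Extended Euclidean Algorithm, which you could apply to this pair (the recursive formulation and variable names below are our own; the exercise expects you to show the working by hand):

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and g = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a mod b); back-substitute the coefficients.
    return g, t, s - (a // b) * t

g, s, t = extended_gcd(86, 99)
assert g == 1                       # 86 and 99 are coprime
assert s * 86 + t * 99 == 1         # 1 as an integer linear combination
assert (s % 99) * 86 % 99 == 1      # s mod 99 is the inverse of 86 in Z_99
```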

14.
(a) Find 𝜑(𝑛) for all 𝑛 up to 20.

(b) Find primitive roots for a selection of values of 𝑛.

15. The modular multiplication cryptosystem works as follows. The message and
cypher spaces are each the set of all strings over the 26-letter English alphabet. Each
letter is treated as a member of ℤ26 , with a,b,…,z being 0,1,…,25, respectively. The key
is 𝑘 ∈ ℤ26 . A message 𝑚 is encrypted using the key 𝑘 to produce cyphertext 𝑐 as follows,
where 𝑚𝑖 is the 𝑖-th letter of the message and 𝑐𝑖 is the 𝑖-th letter of the cyphertext:

𝑐𝑖 = 𝑘𝑚𝑖 mod 26.

(a) This definition of keyspace and encryption is not quite correct as it stands. Not
all keys work properly. What restriction must be placed on members of ℤ26 so that
they can work properly as keys for modular multiplication? Specify the keyspace using
appropriate mathematical notation.

(b) Define the decryption function for modular multiplication.

(c) How would modular multiplication work in general, for an arbitrary alphabet size 𝑛?
Define its keyspace, encryption function and decryption function.
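A sketch of the encryption map, with our own helper names. Note that it blindly accepts any key in ℤ26; deciding exactly which keys actually permit unique decryption is what part (a) asks you to work out. The key 5 below is used purely for illustration.

```python
from string import ascii_lowercase as ALPHABET  # 'abcdefghijklmnopqrstuvwxyz'

def encrypt(message, k):
    """Encrypt a lowercase string letter by letter: c_i = k * m_i mod 26."""
    return ''.join(ALPHABET[(k * ALPHABET.index(ch)) % 26] for ch in message)

# 'a' -> 0, so it always encrypts to itself; 'b' -> 1 encrypts to letter k.
assert encrypt('abc', 5) == 'afk'
```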

16. Compute

1752^31415926535897932384626433832795028841971693993751058209749445923078164 mod 125

by hand, showing your working.8
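Once you have a hand-computed answer, you can check it with Python's built-in three-argument pow, which performs modular exponentiation efficiently even for exponents this large (the base and exponent below are taken from the statement, read as base^exponent per the footnote):

```python
# Machine check for the hand computation in Exercise 16.
# pow(base, exponent, modulus) computes base**exponent mod modulus efficiently.
exponent = 31415926535897932384626433832795028841971693993751058209749445923078164
result = pow(1752, exponent, 125)   # compare with your hand-computed answer
```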

17. In this exercise, we investigate the primitive roots of 11.

(a) If a positive integer 𝑎 is a primitive root of 11, what is the least 𝑘 such that 𝑎^𝑘 ≡ 1 (mod 11)?

8 The base here is the year in which the Gregorian calendar was introduced in Britain. The exponent is
⌊10^70 𝜋⌋. But you do not need this information to do the computation.

(b) Show that 2 is a primitive root of 11 by working out its powers 2^𝑖 for as far as
necessary. You can do this by repeatedly multiplying by 2, taking remainders mod 11
as you go.
(c) Study the exponents 𝑖 for each 2^𝑖 in your list of powers from (b). Which of these
exponents are coprime to 𝜑(11)?
(d) Using (c), list all the primitive roots of 11.
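The power-listing procedure in part (b) amounts to computing the multiplicative order of an element. A sketch (helper name ours), which you can reuse for other candidates in part (d):

```python
def order_mod(a, p):
    """Least k >= 1 with a**k ≡ 1 (mod p), assuming gcd(a, p) = 1."""
    value, k = a % p, 1
    while value != 1:
        value = (value * a) % p   # one more multiplication by a, mod p
        k += 1
    return k

# 2 is a primitive root of 11 exactly when its order is phi(11) = 10.
assert order_mod(2, 11) == 10
```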

18. Alice and Bob are using the Diffie-Hellman scheme to agree on a key. Their public
global parameters are prime 𝑝 = 11 and primitive root 𝑎 = 7. Their public numbers are

𝑦𝐴 = 2, 𝑦𝐵 = 8.

Play the role of the cryptanalyst: find their private numbers and the shared key they
each compute.
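At this toy size, a cryptanalyst can simply brute-force the discrete logarithm. A sketch of that attack (function name ours), which recovers the private numbers and confirms that both parties derive the same key:

```python
def discrete_log(base, target, p):
    """Brute force: least x >= 1 with base**x ≡ target (mod p), else None."""
    for x in range(1, p):
        if pow(base, x, p) == target:
            return x
    return None

p, a = 11, 7
x_A = discrete_log(a, 2, p)     # Alice's private number, from y_A = 2
x_B = discrete_log(a, 8, p)     # Bob's private number, from y_B = 8
key = pow(8, x_A, p)            # Alice computes y_B ** x_A mod p
assert key == pow(2, x_B, p)    # Bob's computation gives the same key
```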

19. You have been happily using the Diffie-Hellman scheme, with modulus 𝑝 = 17
and base 𝑎 = 7, and these parameters have met your (rather limited) security needs.
However your General Manager has decided that “bigger is better” and that, from now
on, you and your co-workers will be using 𝑝 = 18 (with appropriate choice of 𝑎).

(a) What value of 𝑎 would you choose?

(b) What are the security implications of the change? Try to be reasonably precise,
e.g. by roughly estimating the percentage increase/decrease in the time a cryptanalyst
would have to spend on some sort of exhaustive attack.

20. Explain how the security of the Diffie-Hellman scheme will be affected if user A
selects private key 𝑥𝐴 = 𝑝 − 1 (where 𝑝 is the prime modulus used).

21. Devise a variant of the Diffie-Hellman scheme to enable three people A, B and C
to arrive at a common secret key, subject to the following conditions:
• they initially have no secret information in common;
• the common key itself is not sent anywhere by anyone;
• it is hard for an eavesdropper, or anyone only knowing public information, to
determine the common key;
• it is impossible for any one or two of A, B, C to find this common key without the
cooperation of the other(s) (i.e., if the other tells them nothing and has no public
information).
8
C O U N T I N G & C O M B I N AT O R I C S

Counting is a fundamental human activity, dating back tens of thousands of years. It pervades all fields of human activity, especially those that rely on numbers and measurement. It especially pervades computer science, because computation works with
symbols and proceeds in discrete steps, so that the things we want to measure are very
often described by whole numbers.
If we have a program together with some input for it, we would like to know how
long it will take to run, or at least to be able to estimate this time, or to provide a
lower bound or upper bound. We may also want to know how much of the computer’s
memory will be used.
Consider also the structures that computation works with. These include numbers,
strings, files1 , and sequences, all of which we have met in previous chapters. Structures
we will meet later include trees and graphs. Each of these structures is represented as a
string of symbols constructed according to certain rules. When we are doing computation
with these structures, we want to answer questions like: How large is the structure? How
many structures of a given size can be properly processed by our program?
Answering all these questions requires counting.
Counting is such a natural and ubiquitous activity that we have already done it
many times, in previous chapters. This chapter will review some of the techniques we
have already used, describe them more formally and with greater generality, and put
some of them together into a broader framework.

8.1𝛼 COUNTiNG BY ADDiTiON

Suppose we have a list of 𝑟 numbers and a list of 𝑐 numbers. In total, the two lists
contain 𝑟 + 𝑐 numbers. If each number requires 8 bytes, then the two lists together
require 8(𝑟 + 𝑐) bytes.
Suppose now we want to add up all the numbers in both lists. We can do this using
two loops, one after the other (so the first loop finishes before the second one starts).

sum := 0
for each 𝑖 from 1 to 𝑟
    sum := sum + (𝑖-th entry in the first list)
for each 𝑗 from 1 to 𝑐
    sum := sum + (𝑗-th entry in the second list)

1 In terms of its contents, a file is just a string of symbols. The term “file” refers to the way it is stored and accessed within the computer; the term itself says nothing about its internal structure.

This computation requires one addition for each 𝑖 and one addition for each 𝑗. Therefore
we have 𝑟 + 𝑐 additions altogether.
In general, if you are to choose one item from two disjoint sets of options, then the
number of choices available to you is the number of options in the first set plus the
number of options in the second set.
We met this principle in § 1.12, when we saw that the size of a disjoint union of sets
is the sum of the sizes of the sets, see (1.14).

8.2𝛼 C O U N T i N G B Y M U LT i P L i C AT i O N

Suppose you want to store a table of numbers in memory. If the table has 𝑟 rows and
𝑐 columns, then you must store 𝑟 × 𝑐 numbers. If each number requires, say, 8 bytes of
storage, then the table needs 𝑟 × 𝑐 × 8 bytes, or 8𝑟𝑐 bytes for short.
Suppose now that you want to add up all the numbers in the table. We again use
two loops, but this time they are nested rather than separate.

sum := 0
for each 𝑖 from 1 to 𝑟
    for each 𝑗 from 1 to 𝑐
        sum := sum + (entry in the 𝑖-th row and 𝑗-th column in the table)

This computation requires one addition for each pair (𝑖, 𝑗), and therefore 𝑟𝑐 additions
altogether.
In each case here — whether we are determining storage requirements or counting
additions — we are interested in the number of pairs (𝑖, 𝑗) where 𝑖 ∈ {1, 2, … , 𝑟} and
𝑗 ∈ {1, 2, … , 𝑐}. There are 𝑟 choices for the first member of a pair, and 𝑐 choices for
the second member of a pair. Crucially, these choices are independent, meaning that
the particular choice of the first member of the pair has no effect on the number of
options there are for the second member of the pair, and vice versa. In such cases,
the independence of the choices means that the numbers of the separate choices are
multiplied.
We have seen this principle before, in § 1.14: see (1.19). The size of a Cartesian
product is just the product of the sizes of the sets being combined. In the above examples,
the pairs we are counting are precisely the members of the Cartesian product

{1, 2, … , 𝑟} × {1, 2, … , 𝑐},

where the first set has 𝑟 members and the second set has 𝑐 members, so the number of
them is 𝑟𝑐.

8.3 iNCLUSiON-EXCLUSiON

In § 8.1𝛼 , we considered the size of a disjoint union of sets. Determining the size of a
union of sets requires more care when the sets are not disjoint.
If 𝐴 ∩ 𝐵 ≠ ∅ then |𝐴 ∪ 𝐵| is no longer given by |𝐴| + |𝐵|, because |𝐴| + |𝐵| double-
counts everything in the intersection 𝐴 ∩𝐵. So we have to subtract |𝐴 ∩𝐵| once, so that
its members are counted just once instead of twice:

|𝐴 ∪ 𝐵| = |𝐴| + |𝐵| − |𝐴 ∩ 𝐵|. (8.1)

See § 1.12, especially (1.13) and Exercise 9.

Example
How many entries in the Monash library catalogue (https://www.monash.edu/library)
contain at least one of “Babbage” and “Lovelace”?
Let 𝐵 be the set of entries containing “Babbage”, and let 𝐿 be the set of entries
containing “Lovelace”. By doing a Basic Search just for “Babbage” (no need for quotes;
we only use them here to identify the exact search term used), you may find that

|𝐵| = 41,504.

A similar Basic Search just for “Lovelace” may show that

|𝐿| = 124,045.

If you enter both terms “Babbage” and “Lovelace” in the search field (separated by a
space, and without quotes), you get the number of items containing both terms:

|𝐵 ∩ 𝐿| = 3,263.

You can now find the number of items containing at least one of these terms.

|𝐵 ∪ 𝐿| = |𝐵| + |𝐿| − |𝐵 ∩ 𝐿|   (by (8.1))
        = 41,504 + 124,045 − 3,263
        = 162,286.

In the Monash library catalogue, you can actually check this answer using an Advanced
Search, which enables you to combine searches using any of AND, OR and NOT.2 But
there are many other search tools for which union is either unavailable or significantly
harder than intersection. This makes sense, since searching for two alternative terms
involves two searches of the entire database (once for each term), whereas searching for
joint occurrences of two terms (i.e., where both appear in the same item) can be done by doing just one search of the entire database followed by a second search restricted to the items found in the first search (with this second search usually being much more efficient than the first search, since by then there is much less data to sift through).

2 But hang on, NOT is a unary operation, not a binary operation! Try using the NOT operation to combine two catalogue searches, which could be using the two searches we have used here or others you are interested in, and determine which set operation they mean by NOT.
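The two-set rule (8.1) is easy to confirm mechanically on small finite sets. A minimal sketch, with sets invented here purely for illustration (not library data):

```python
# Checking |A ∪ B| = |A| + |B| − |A ∩ B| on small concrete sets.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

assert len(A | B) == len(A) + len(B) - len(A & B)   # 7 == 5 + 4 - 2
```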

Now consider the size |𝐴 ∪ 𝐵 ∪ 𝐶| of the union of three sets, 𝐴, 𝐵 and 𝐶. Again, just
adding the sizes of these three sets, obtaining |𝐴| + |𝐵| + |𝐶|, overcounts elements that
belong to more than one set.

• An element that belongs to 𝐴 and 𝐵 but not 𝐶 — so it’s in (𝐴 ∩ 𝐵) ∖ 𝐶 — gets counted twice: once by |𝐴| and once by |𝐵|.

• An element that belongs to all three sets — so it’s in 𝐴 ∩ 𝐵 ∩ 𝐶 — gets counted three times: once by each of |𝐴|, |𝐵| and |𝐶|.

We can try to compensate by subtracting |𝐴 ∩ 𝐵|, |𝐴 ∩ 𝐶| and |𝐵 ∩ 𝐶|, i.e., the sizes of
the pairwise intersections. This corrects the overcounting of the elements that belong
to exactly two of the three sets. But it over-corrects the overcounting of the elements
that belong to all three sets! This is because any element that belongs to all three of 𝐴,
𝐵 and 𝐶 also belongs to all three of the pairwise intersections 𝐴 ∩ 𝐵, 𝐴 ∩ 𝐶 and 𝐵 ∩ 𝐶.
The net result of this is that everything is now correctly counted except the elements of
𝐴 ∩ 𝐵 ∩ 𝐶, which are not counted at all! So we adjust by adding the size of that triple
intersection, |𝐴 ∩ 𝐵 ∩ 𝐶|, and then everything is counted exactly once as required. The
upshot of this is

|𝐴 ∪ 𝐵 ∪ 𝐶| = |𝐴| + |𝐵| + |𝐶| − |𝐴 ∩ 𝐵| − |𝐴 ∩ 𝐶| − |𝐵 ∩ 𝐶| + |𝐴 ∩ 𝐵 ∩ 𝐶|. (8.2)

See also Exercise 11 in Chapter 1.

Example
Now let’s determine the number of Monash library catalogue entries containing at
least one of “Babbage”, “Lovelace” and “Turing”. Let the sets 𝐵 and 𝐿 be as before, and
let 𝑇 be the set of entries containing “Turing”. We found |𝐵|, |𝐿| and |𝐵 ∩𝐿| earlier, and
we will use them again shortly. Further queries with Basic Search tell us that

|𝑇| = 484,719
|𝐵 ∩ 𝑇| = 2,886
|𝐿 ∩ 𝑇| = 2,518
|𝐵 ∩ 𝐿 ∩ 𝑇| = 971

Therefore, using (8.2), the number of entries containing at least one of these three terms
is given by

|𝐵 ∪ 𝐿 ∪ 𝑇| = |𝐵| + |𝐿| + |𝑇| − |𝐵 ∩ 𝐿| − |𝐵 ∩ 𝑇| − |𝐿 ∩ 𝑇| + |𝐵 ∩ 𝐿 ∩ 𝑇|
            = 41,504 + 124,045 + 484,719 − 3,263 − 2,886 − 2,518 + 971
            = 650,268 − 8,667 + 971
            = 642,572.

Notice, in (8.2), how the signs alternate according to the number of intersecting
sets: we add the sizes of single sets, subtract the sizes of pairwise intersections, and add
the sizes of the triple intersections.
This alternation persists in expressions for the sizes of the unions of arbitrary num-
bers of sets in terms of the sizes of all possible intersections of them. Suppose the sets
are 𝐴1 , 𝐴2 , … , 𝐴𝑛 . Then the general expression has the form

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | = sum of sizes of single sets
                     − sum of sizes of all intersections of two sets
                     + sum of sizes of all intersections of three sets
                     − sum of sizes of all intersections of four sets
                     + ⋯
                     + (−1)^{𝑘+1} ⋅ sum of sizes of all intersections of 𝑘 sets
                     + ⋯
                     + (−1)^{𝑛+1} ⋅ the size of the intersection of all 𝑛 sets.   (8.3)

This can be written more succinctly, and we do so now in stating it as a theorem.

Theorem 42. For all 𝑛,

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 sets)   (8.4)

Proof. We prove, by induction on 𝑛, that (8.4) holds for all 𝑛 ∈ ℕ.

Base case:
When 𝑛 = 1, there is only one set, 𝐴1 , and the left and right sides of (8.4) are
both |𝐴1 |. So the equation holds in this case.

Inductive step:
Let 𝑛 ≥ 1. Assume that (8.4) holds. (This is our Inductive Hypothesis.)
Now consider the size of the union of 𝑛 + 1 sets, i.e., |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |.
286 C O U N T i N G & C O M B i N AT O R i C S

We first need to relate this somehow to the size of the union just of 𝑛 sets. To this
end, we can start by relating the union of 𝑛 + 1 sets to the union of the first 𝑛 sets:

𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 = 𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 ∪ 𝐴𝑛+1

We can view this as a union of two sets, namely 𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 and 𝐴𝑛+1 . And we
already know how to find the size of the union of two sets.

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |
= |(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 ) ∪ 𝐴𝑛+1 |
= |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | + |𝐴𝑛+1 | − |(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 ) ∩ 𝐴𝑛+1 |
(by (8.1))
= |𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | + |𝐴𝑛+1 | − |(𝐴1 ∩ 𝐴𝑛+1 ) ∪ (𝐴2 ∩ 𝐴𝑛+1 ) ∪ ⋯ ∪ (𝐴𝑛 ∩ 𝐴𝑛+1 )|
(by the distributive law). (8.5)

This is progress: instead of the size of a union of 𝑛 + 1 sets, we have two occurrences of
a size of a union of 𝑛 sets, and we can use the Inductive Hypothesis on each of these.
Note also that the first of these occurrences is added, while the second is subtracted.
Continuing from (8.5), we have

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |
= ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 )
+ |𝐴𝑛+1 |
− ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 ∩ 𝐴𝑛+1 , 𝐴2 ∩ 𝐴𝑛+1 , … , 𝐴𝑛 ∩ 𝐴𝑛+1 )   (8.6)
(by applying the Inductive Hypothesis to the first and third terms in (8.5)).

We now consider the first and third terms in (8.6). It turns out that they are closely
related, indeed complementary in a sense. To see this, we consider them each in turn.
The first term in (8.6) uses, inside the summation over 𝑘, the sum of the sizes of
all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 . It will be convenient to describe this as
the sum of the sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 excluding 𝐴𝑛+1 . This is a bit more
long-winded, but it does reflect the context that we are now considering 𝑛 + 1 sets, not
just 𝑛 sets. So the first term in (8.6) may be written
∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 excluding 𝐴𝑛+1 ).

Now consider the third term in (8.6), the sum that is subtracted. What can we say about the intersection of 𝑘
of the sets 𝐴1 ∩ 𝐴𝑛+1 , 𝐴2 ∩ 𝐴𝑛+1 , … , 𝐴𝑛 ∩ 𝐴𝑛+1 ? The intersection of any two of these sets
is really an intersection of three sets (including 𝐴𝑛+1 ):

(𝐴𝑖 ∩ 𝐴𝑛+1 ) ∩ (𝐴𝑗 ∩ 𝐴𝑛+1 ) = 𝐴𝑖 ∩ 𝐴𝑗 ∩ 𝐴𝑛+1 .

Similarly, an intersection of 𝑘 of the sets 𝐴1 ∩ 𝐴𝑛+1 , 𝐴2 ∩ 𝐴𝑛+1 , … , 𝐴𝑛 ∩ 𝐴𝑛+1 is really an


intersection of 𝑘+1 sets, namely 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 and the set 𝐴𝑛+1 . So the sum
of the sizes of all intersections of 𝑘 of the sets 𝐴1 ∩ 𝐴𝑛+1 , 𝐴2 ∩ 𝐴𝑛+1 , … , 𝐴𝑛 ∩ 𝐴𝑛+1 equals
the sum of the sizes of all intersections of 𝑘 + 1 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 provided that
𝐴𝑛+1 is one of the sets included when taking the intersection. So the last of the three
main terms in (8.6), where the sum is being subtracted, is
− ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 + 1 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 including 𝐴𝑛+1 ).

We can take the negation inside the sum, so that the whole sum is now added (instead of subtracted) but the coefficient of the sum of sizes is −(−1)^{𝑘+1} , which may be rewritten as (−1)^{𝑘+2} . Then it equals

∑_{𝑘=1}^{𝑛} (−1)^{𝑘+2} ⋅ (sum of sizes of all intersections of 𝑘 + 1 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 including 𝐴𝑛+1 ).

Here, the number of sets being intersected is 𝑘 + 1, and this is also the exponent of −1
in the coefficient. The range of values that 𝑘 +1 can take, in this sum, is from 2 to 𝑛 +1
(since 𝑘 ranges from 1 to 𝑛). Writing our sum in terms of 𝑘 + 1 rather than 𝑘 gives

∑_{𝑘+1=2}^{𝑛+1} (−1)^{(𝑘+1)+1} ⋅ (sum of sizes of all intersections of 𝑘 + 1 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 including 𝐴𝑛+1 ).

This sum covers all possible numbers of sets, 𝑘 + 1, except for the case of just a single
set (𝑘 + 1 = 1).
To enable easier comparison with the first term in (8.6), it will help if we consistently
use 𝑘 for the number of sets being intersected. So we will replace 𝑘 + 1 by 𝑘 throughout
the last sum in the previous paragraph. Again, the sum goes from 2 to 𝑛 + 1 instead of
from 1 to 𝑛. So the expression becomes
∑_{𝑘=2}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 including 𝐴𝑛+1 ).

Let us now plug the expressions we have derived back into (8.6). We have

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |
= ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 excluding 𝐴𝑛+1 )
+ |𝐴𝑛+1 |
+ ∑_{𝑘=2}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 including 𝐴𝑛+1 )   (8.7)

At first glance, it may look like the first and third terms together cover all possible
intersections of any number of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 , since the first term deals with
those intersections that don’t use 𝐴𝑛+1 and the third term deals with those that do use
𝐴𝑛+1 . But there are some differences.

• The first term covers 𝑘 = 1, whereas the third term does not. Because 𝐴𝑛+1 is
excluded in the first term, it only gives |𝐴1 | + |𝐴2 | + ⋯ + |𝐴𝑛 | and does not include
the size of the last set. This is ok, though, because the size of the last set, |𝐴𝑛+1 |,
has its own term, namely the second term (which we have hardly mentioned, but
now it plays its small part). So the 𝑘 = 1 contribution from the first term, plus
the second term, account for the sum of the sizes of all single sets, and we don’t
need any contribution from the third term for these, which is just as well.

• The third term covers 𝑘 = 𝑛 + 1, whereas the first term does not. But the only
way we can take 𝑛 + 1 sets from the list 𝐴1 , 𝐴2 , … , 𝐴𝑛+1 is to take all of them, and
this means that we must necessarily include 𝐴𝑛+1 . So this is entirely taken care of
by the 𝑘 = 𝑛 + 1 contribution from the third term; the first term does not include
𝑘 = 𝑛 + 1 so it does not interfere.

In summary, we can now write

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛+1 |
= ∑_{𝑘=1}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 , regardless of whether they include or exclude 𝐴𝑛+1 )
= ∑_{𝑘=1}^{𝑛+1} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 , 𝐴𝑛+1 ).

This is what we were aiming to prove, so the inductive step is complete.

Conclusion:
Therefore, by Mathematical Induction, (8.4) holds for all 𝑛 ∈ ℕ.
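Theorem 42 can also be sanity-checked mechanically on small concrete examples. The sketch below (sets invented purely for illustration) evaluates the right-hand side of (8.4) by enumerating all 𝑘-wise intersections and compares it with the size of the union computed directly:

```python
from functools import reduce
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    """Evaluate the right-hand side of (8.4) for a list of finite sets."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        for group in combinations(sets, k):
            intersection = reduce(lambda x, y: x & y, group)
            total += (-1) ** (k + 1) * len(intersection)
    return total

sets = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {1, 5}]
assert union_size_by_inclusion_exclusion(sets) == len(set().union(*sets))
```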

We have just expressed the size of a union of sets as a sum of appropriately-signed sizes of all possible intersections that can be formed from the sets.
We can also use Inclusion-Exclusion to express the size of an intersection of sets in
terms of a sum of signed sizes of all possible unions. This can be proved by induction,
just as we did above, but interchanging the roles of ∪ and ∩ throughout. It is a good
exercise to use the above proof as a starting point and go through it, making all the
changes needed to turn it into a proof of this new type of Inclusion-Exclusion. It can also
be proved using the above theorem together with De Morgan’s Law for sets (Theorem 1
and Corollary 2); doing this is a more challenging exercise, though it gives a shorter
proof.

Theorem 43. For all 𝑛,

|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛 | = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all unions of 𝑘 sets)   (8.8)

8.4 iNCLUSiON-EXCLUSiON: DERANGEMENTS

A function 𝑓 ∶ 𝐴 → 𝐴 is said to fix an element 𝑥 ∈ 𝐴 if 𝑓(𝑥) = 𝑥. The element 𝑥 is then


said to be a fixed point of 𝑓. Other elements of 𝐴 may or may not also be fixed by 𝑓.
Now let 𝑋 ⊆ 𝐴. The function 𝑓 fixes 𝑋 if it fixes all elements of 𝑋 . So, for all 𝑥 ∈ 𝑋 ,
we have 𝑓(𝑥) = 𝑥. There may or may not be other fixed points of 𝑓 that are not in 𝑋 .
Given a finite set 𝐴 of 𝑛 elements, it is often of interest to know how many functions
𝑓 ∶ 𝐴 → 𝐴 are fixed-point-free, meaning that they have no fixed points.
This is reasonably straightforward if there is no restriction on the type of function
we consider, i.e., no requirement for it to be an injection or a surjection. For each 𝑥 ∈ 𝐴,
its image 𝑓(𝑥) under 𝑓 can be anything except 𝑥, so the number of options is |𝐴 ∖ {𝑥}|
which is 𝑛 − 1. We have the same number of choices for each 𝑥 ∈ 𝐴, and the choices
do not interfere with each other. So, for each 𝑥 ∈ 𝐴, we can choose any of the 𝑛 − 1
options for 𝑓(𝑥), with these choices being independent of each other. This is another
application of the multiplicative counting principle discussed in § 8.2𝛼 . So the total
number of fixed-point-free functions is

(𝑛 − 1)^𝑛 .
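This count is small enough to verify by brute force for a small 𝑛, enumerating every function as a tuple of images:

```python
from itertools import product

n = 4
A = range(n)
# Each function f : A -> A is represented as the tuple (f(0), ..., f(n-1)).
fixed_point_free = [f for f in product(A, repeat=n)
                    if all(f[x] != x for x in A)]
assert len(fixed_point_free) == (n - 1) ** n   # 3**4 = 81
```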

But what if we are only interested in counting fixed-point-free bijections?3 This is


the problem of counting derangements, where a derangement is just a fixed-point-free
bijection from a finite set to itself.
This kind of problem crops up in a wide variety of contexts.
• After a written class test, the teacher in the classroom might tell each student to
give their test answers to someone else for marking while she reads out the answers,
with the requirement that each student marks exactly one other student’s work,
and no-one marks their own work.

• In the standard Enigma cypher machine used by the Nazi regime’s army during the
Second World War, at each position in the message the function sending plaintext
letters to cyphertext letters (in the same alphabet) was a fixed-point-free bijection
on the alphabet.
In this situation, we cannot just count the options available for each 𝑥 ∈ 𝐴 and then
multiply them as if they are independent. The problem with that is that, if 𝑓(𝑥) = 𝑦
and 𝑤 ≠ 𝑥 then we must have 𝑓(𝑤) ≠ 𝑦, else 𝑓 is not a bijection. So the choices we make
for each 𝑥 ∈ 𝐴 interfere with each other. So we need another approach. This is where
inclusion-exclusion will prove useful.
We start by taking a complementary view of the problem. This is a general problem-
solving strategy. It won’t work in every situation, but it is worth keeping in mind.
We have

# bijections with no fixed point
    = total # bijections − # bijections with at least one fixed point.

We already know the total number of bijections from 𝐴 to 𝐴: this is just 𝑛!, as we saw
in § 2.12. Therefore

# bijections with no fixed point = 𝑛! − # bijections with at least one fixed point.
(8.9)
So, to count fixed-point-free bijections, we first count bijections with at least one fixed
point.
For convenience, denote the elements of 𝐴 by 𝑎1 , 𝑎2 , … , 𝑎𝑛 , so that

𝐴 = {𝑎1 , 𝑎2 , … , 𝑎𝑛 }.

For each 𝑖 ∈ {1, 2, … , 𝑛}, let 𝐴𝑖 be the set of all bijections on 𝐴 that fix 𝑎𝑖 . So the set of
all bijections that fix at least one element of 𝐴 is

𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 .
3 Since 𝐴 is finite, any injection 𝑓 ∶ 𝐴 → 𝐴 is also a surjection, and vice versa. (See the end of § 2.7, on p. 50.)
So, if we count fixed-point-free injections (or surjections) from a finite set to itself, then we will really be
counting fixed-point-free bijections anyway.

So, to count these bijections, we need to determine

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 |.

This is a job for the Inclusion-Exclusion principle, in the form of Theorem 42. To apply
that theorem, we will need to work out, for each 𝑘, the sum of the sizes of all intersections
of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 .
Consider, then, what an intersection of 𝑘 of the sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 looks like.
Let’s start with 𝑘 = 1. Then we have just a single set, say 𝐴𝑖 . This is the set of all
bijections on 𝐴 that fix 𝑎𝑖 . How many such bijections are there? If 𝑓 is one of these
bijections, then its value on 𝑎𝑖 is determined by the fact that it fixes 𝑎𝑖 , so for 𝑓(𝑎𝑖 ) we
have no choice: it must equal 𝑎𝑖 . Then none of the other elements of 𝐴 can be mapped
to 𝑎𝑖 , because then 𝑓 would not be a bijection. So 𝑓 maps the elements of 𝐴 ∖ {𝑎𝑖 } to
𝐴 ∖ {𝑎𝑖 }, and in fact it must be a bijection on that set, otherwise it can’t be a bijection
on 𝐴. So the number of bijections on 𝐴 that fix 𝑎𝑖 is just the number of bijections on
𝐴 ∖ {𝑎𝑖 }, and there are (𝑛 − 1)! of these, since |𝐴 ∖ {𝑎𝑖 }| = 𝑛 − 1. Hence

|𝐴𝑖 | = (𝑛 − 1)!. (8.10)

There are 𝑛 of these sets, and all have the same size, so the sum of the sizes of all the
sets 𝐴𝑖 is given by

|𝐴1 | + |𝐴2 | + ⋯ + |𝐴𝑛 | = 𝑛 ⋅ (𝑛 − 1)! = 𝑛!.

Now consider what happens in general, for arbitrary 𝑘. Consider the intersection of
the first 𝑘 sets,
𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑘 .
This contains those bijections on 𝐴 that fix 𝑎1 and also fix 𝑎2 and also 𝑎3 and so on up
to 𝑎𝑘 . So, we want to count bijections 𝑓 ∶ 𝐴 → 𝐴 that fix every 𝑎𝑖 with 1 ≤ 𝑖 ≤ 𝑘. The
values of such a bijection 𝑓 on 𝑎1 , 𝑎2 , … , 𝑎𝑘 are completely determined by the requirement
that it fixes those elements. So it remains to consider what 𝑓 does on the other elements
of 𝐴, namely 𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 . Now 𝑓 cannot map any of these elements to the fixed
points 𝑎1 , 𝑎2 , … , 𝑎𝑘 , else it would not be a bijection, because each of those elements is
already mapped to by itself. So 𝑓 has to map the set {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 } into itself.
Furthermore, it has to map this set onto itself too, else it isn’t a surjection. So, in fact,
the restriction of 𝑓 to {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 } must be a bijection on that set, and it can be
any bijection on that set at all. So, counting bijections that fix 𝑎1 , 𝑎2 , … , 𝑎𝑘 is the same
as just counting bijections on {𝑎𝑘+1 , 𝑎𝑘+2 , … , 𝑎𝑛 }, which is a set of size 𝑛 − 𝑘, so there
are (𝑛 − 𝑘)! bijections on this set. So we have

|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑘 | = (𝑛 − 𝑘)!.

This is just one of the many possible intersections of 𝑘 of these sets. Since no element of
𝐴 has any special role, and since their names do not matter, the size of the intersection
of 𝑘 of them is always the same. We could pick any 𝑘 distinct elements of 𝐴, say

𝑎𝑖1 , 𝑎𝑖2 , … , 𝑎𝑖𝑘 ,

where 𝑖1 , 𝑖2 , … , 𝑖𝑘 are any 𝑘 distinct indices from {1, 2, … , 𝑛}. Regardless of our choice, we still have

|𝐴𝑖1 ∩ 𝐴𝑖2 ∩ ⋯ ∩ 𝐴𝑖𝑘 | = (𝑛 − 𝑘)!.

Note that the special case 𝑘 = 1 agrees with the expression for that case we derived
above, (8.10).
The number of ways of choosing 𝑘 of the 𝑛 sets 𝐴1 , 𝐴2 , … , 𝐴𝑛 is ⒧𝑛𝑘⒭. So

sum of the sizes of all intersections of 𝑘 of the sets = ⒧𝑛𝑘⒭ (𝑛 − 𝑘)!.

We can now apply Theorem 42 to obtain


|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⒧𝑛𝑘⒭ (𝑛 − 𝑘)!. (8.11)

Since the left-hand side here is the number of bijections on 𝐴 that fix at least one member
of 𝐴, using (8.9) gives
# fixed-point-free bijections = 𝑛! − ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⒧𝑛𝑘⒭ (𝑛 − 𝑘)!.

There are some instructive simplifications we can do here.


First, observe that the main sum starts at 𝑘 = 1, so it does not include 𝑘 = 0, but if
we put 𝑘 = 0 in its summand then (including the negation in front of the sum) it would
be −(−1)^{0+1} ⒧𝑛0⒭(𝑛 − 0)!, which is just 𝑛!. But this equals the solitary extra term 𝑛! at
the start. So, if we start the sum at 𝑘 = 0 instead of 𝑘 = 1, then we don’t need that
extra term 𝑛! because it’s then part of the sum. So
# fixed-point-free bijections = ∑_{𝑘=0}^{𝑛} (−1)^𝑘 ⒧𝑛𝑘⒭ (𝑛 − 𝑘)!.

Next, observe that

⒧𝑛𝑘⒭ (𝑛 − 𝑘)! = 𝑛! / (𝑘! (𝑛 − 𝑘)!) ⋅ (𝑛 − 𝑘)! (see § 1.10, especially (1.6))
= 𝑛!/𝑘!.

So

# fixed-point-free bijections = ∑_{𝑘=0}^{𝑛} (−1)^𝑘 ⋅ 𝑛!/𝑘!.
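This formula is easy to check computationally. The short Python sketch below (the function names are our own) compares it with a brute-force count over all permutations:

```python
from itertools import permutations
from math import factorial

def derangements_formula(n):
    # Number of fixed-point-free bijections on an n-element set:
    # sum over k from 0 to n of (-1)^k * n!/k!
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

def derangements_brute_force(n):
    # Count permutations of {0, ..., n-1} with no fixed point directly.
    return sum(
        all(p[i] != i for i in range(n))
        for p in permutations(range(n))
    )

for n in range(1, 8):
    assert derangements_formula(n) == derangements_brute_force(n)
```

For instance, of the 4! = 24 bijections on a 4-element set, exactly 9 are fixed-point-free.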
We have been focusing on the number of fixed-point-free bijections. But, often, we
don’t just want to know the number of structures of interest; we may also want to know
what proportion they are, out of all possible structures. In this case, we may ask, what
proportion of all bijections on 𝐴 are fixed-point-free? Since there are 𝑛! bijections on 𝐴
altogether, the proportion that are fixed-point-free is

# fixed-point-free bijections / total # bijections = (1/𝑛!) ∑_{𝑘=0}^{𝑛} (−1)^𝑘 ⋅ 𝑛!/𝑘!
= ∑_{𝑘=0}^{𝑛} (−1)^𝑘 ⋅ 1/𝑘!.

It is interesting that 𝑛 no longer appears inside the sum; its only role is to determine how
many terms (𝑛 + 1, in fact) must be added up. Note also that these terms rapidly get
smaller and smaller, and furthermore they alternate in sign. You can see the structure
by writing the sum out:
1 − 1/1! + 1/2! − 1/3! + 1/4! − ⋯ + (−1)^𝑛 ⋅ 1/𝑛!.
This is actually the first 𝑛 + 1 terms of the standard infinite series for 𝑒^{−1}, where as
usual 𝑒 = 2.71828 … is the base of natural logarithms. So, as 𝑛 → ∞, the proportion of
bijections on a set of size 𝑛 that are fixed-point-free converges to

𝑒^{−1} = 0.367879 … .

In other words, if you choose a bijection at random from all bijections on 𝑛 elements,
with all bijections equally likely, then the chance that it has no fixed points converges to
about 36.8% as 𝑛 → ∞. The convergence is rapid, so this proportion gives a very useful
approximation even for moderate-sized 𝑛.
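A few lines of Python illustrate how rapid this convergence is (an illustrative sketch; the function name is our own):

```python
from math import e, factorial

def derangement_proportion(n):
    # Proportion of bijections on n elements that are fixed-point-free:
    # the partial sum 1 - 1/1! + 1/2! - ... + (-1)^n / n!
    return sum((-1) ** k / factorial(k) for k in range(n + 1))

for n in (2, 4, 6, 8, 10):
    print(n, derangement_proportion(n), abs(derangement_proportion(n) - 1 / e))
```

Already at 𝑛 = 10 the partial sum agrees with 𝑒^{−1} to better than 10^{−7}, since the omitted terms are bounded by 1/11!.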
At this point, it is worth revisiting Exercises 2.9 and 2.10.

8.5 SELECTiON

There are many situations where we want to choose 𝑟 objects from a set of 𝑛 objects.
For example:

(a) making a string of 𝑟 letters, each chosen from an alphabet of 𝑛 letters;

(b) forming a queue of 𝑟 people, from a crowd of 𝑛 people;

(c) requesting 𝑟 meals for an event, from a catering menu containing 𝑛 meal options;

(d) choosing 𝑟 party guests, from among 𝑛 of your friends.

Counting the number of ways of making these selections depends on the specific nature
of the task.

• Does the order in which items are chosen matter?


– It does, for (a) and (b) above; but it doesn’t, for (c) and (d).

• Once an item is chosen, can we choose it again?


– We can, in (a) and (c) above; but we cannot, in (b) and (d).

So we have two distinctions to make:

• ordered selection (as in (a) and (b)) versus unordered selection (as in (c) and
(d));

• selection with replacement (as in (a) and (c)) versus selection without replacement
(as in (b) and (d)).

We consider each of these.

8.6 ORDERED SELECTiON WiTH REPLACEMENT

Given a set 𝐴 of size 𝑛, suppose we make a sequence of 𝑟 choices from 𝐴, where each
choice can be any member of 𝐴. So:

• Our selection is ordered.

• Choosing a member of 𝐴 does not stop it being chosen again later. In other words,
if a choice takes an element from 𝐴, we can think of that element as being replaced,
in 𝐴, by a copy of itself, so that the element is still available for future choices.
This is why we say that our selection is done with replacement.

In how many ways can we make such a sequence of choices?


We answered this question back in Chapter 1.
The number of sequences of 𝑟 elements of 𝐴 is just the size of the Cartesian product

𝐴 × 𝐴 × ⋯ × 𝐴 (𝑟 copies of 𝐴).

This size is just 𝑛^𝑟, as we saw on p. 26 in § 1.14.


When 𝐴 is an alphabet of symbols, then we are essentially just counting strings of
a given length over 𝐴. We did this on p. 7 in § 1.5.
We have seen other examples of this too. In § 2.12, we applied this technique to
counting functions with a finite domain and codomain.
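In Python, ordered selection with replacement is exactly what itertools.product enumerates, so the count 𝑛^𝑟 can be checked directly (an illustrative sketch):

```python
from itertools import product

A = ['a', 'b', 'c']   # an alphabet of n = 3 symbols
r = 4                 # length of each string

# All ordered selections of r symbols from A, with replacement:
strings = list(product(A, repeat=r))
assert len(strings) == len(A) ** r   # n^r = 3^4 = 81
```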

8.7 ORDERED SELECTiON WiTHOUT REPLACEMENT

Suppose that we are again making a sequence of 𝑟 choices from 𝐴, but that now we
cannot repeat earlier choices. So:
• Our selection is ordered, as in § 8.6,

• Choosing an element of 𝐴 means that it is no longer available for subsequent


choices. In other words, if an element is chosen, it is not replaced. So we say that
our selection is done without replacement.
In how many ways can we do this?
We have done this before too, on p. 13, leading to (1.3) and (1.4). Let’s recap.
For our first choice, we can choose any member of 𝐴 at all. So we have 𝑛 options for
this. For our second choice, the first choice is no longer available to us, but every other
member of 𝐴 is still available. The number of options has gone down by 1, so it is 𝑛 − 1
instead of 𝑛. So the number of ways of making the first two choices is 𝑛 × (𝑛 − 1). For
the third choice, we can choose any member of 𝐴 except for the two we have already
chosen. So we have 𝑛 − 2 options.
The general pattern is:
• For each choice we make, all previous choices are now forbidden, but every other
member of 𝐴 is still available.

• When we come to the 𝑘-th choice, we have previously made 𝑘 − 1 choices, all of
which are now forbidden.

• For every choice we make, the number of available choices remaining is reduced
by 1.
Therefore, the number of ways of choosing 𝑟 objects, in order and without replacement,
from a set of 𝑛 objects is
𝑛 ⋅ (𝑛 − 1) ⋅ ⋯ ⋅ (𝑛 − 𝑟 + 1) (𝑟 factors).

This is often called the falling factorial and denoted by (𝑛)𝑟 . It is also sometimes
written ⁿ𝑃𝑟 or occasionally ₙ𝑃𝑟 .
In § 2.12, we applied this counting method to counting injections and bijections,
where domain and codomain are finite.
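Here itertools.permutations (with a length argument) enumerates exactly these ordered selections without replacement, and math.perm computes the falling factorial directly (an illustrative sketch):

```python
from itertools import permutations
from math import perm

A = ['a', 'b', 'c', 'd', 'e']   # n = 5
r = 3

# All ordered selections of r distinct elements of A:
seqs = list(permutations(A, r))
assert len(seqs) == perm(5, 3)   # (n)_r = 5 * 4 * 3 = 60
```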

8.8 UNORDERED SELECTiON WiTHOUT REPLACEMENT

When doing ordered selection in § 8.6–§ 8.7, it was convenient to treat selection with
replacement first, because it was easier and because it “paved the way” for selection
without replacement. But, for unordered selection, it turns out that selection without
replacement is easier, so we discuss it first, in this section (albeit briefly, because we

have done it before). Then, in the next section, we will use the ideas from this section
(and § 1.10) to help us count unordered selections with replacement.
In fact, we already know how to count unordered selections without replacement,
because we did it in § 1.10. We derived expressions involving factorials and binomial
coefficients in (1.5) and (1.6). The number of unordered selections — i.e., subsets — of
𝑟 elements that can be chosen, without replacement, from a set of 𝑛 elements is given
by the binomial coefficient ⒧𝑛𝑟⒭ which can be written in a few different ways:

⒧𝑛𝑟⒭ = 𝑛! / ((𝑛 − 𝑟)! 𝑟!) = (𝑛)𝑟 / 𝑟!.

Alternative notation for the binomial coefficient includes ⁿ𝐶𝑟 or occasionally ₙ𝐶𝑟 .


Subsets chosen, from a set of size 𝑛, by unordered selection without replacement are
sometimes called combinations, but this term is also used more broadly, for other
types of selections and also in less precise ways.
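Again this is easy to verify in Python: itertools.combinations enumerates the subsets, and math.comb computes ⒧𝑛𝑟⒭ directly (an illustrative sketch):

```python
from itertools import combinations
from math import comb, factorial, perm

A = ['a', 'b', 'c', 'd', 'e']   # n = 5
r = 3

# All unordered selections of r elements of A, without replacement:
subsets = list(combinations(A, r))
assert len(subsets) == comb(5, 3) == 10

# The different expressions for the binomial coefficient agree:
assert comb(5, 3) == factorial(5) // (factorial(5 - 3) * factorial(3))
assert comb(5, 3) == perm(5, 3) // factorial(3)   # (n)_r / r!
```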

8.9 UNORDERED SELECTiON WiTH REPLACEMENT

When choosing without replacement (as in the previous section), each member of our
set 𝐴 of size 𝑛 can be chosen at most once. For convenience, suppose the objects in 𝐴
are numbered from 1 to 𝑛. Let 𝑥𝑖 be the number of times the 𝑖-th object in 𝐴 is chosen.
Then each 𝑥𝑖 ∈ {0, 1}, and we require that the total number of objects chosen is 𝑟:

𝑥1 + 𝑥2 + ⋯ + 𝑥𝑛 = 𝑟.

We saw in the previous section that the total number of choices is ⒧𝑛𝑟⒭.
Now we consider choosing with replacement. There is no longer any limit on how
many times we can choose a particular object in 𝐴, except for the overall requirement
that we make exactly 𝑟 choices. So each 𝑥𝑖 ≥ 0. The 𝑥𝑖 can be any nonnegative integer
subject to the same constraint we had earlier,

𝑥1 + 𝑥2 + ⋯ + 𝑥𝑛 = 𝑟. (8.12)

One way to view this is that we have 𝑟 ones added together,

1 + 1 + ⋯ + 1 = 𝑟 (𝑟 ones),

and we must group the ones into sums of sizes 𝑥1 , 𝑥2 , … , 𝑥𝑛 :

(1 + ⋯ + 1) + (1 + ⋯ + 1) + ⋯ + (1 + ⋯ + 1) (𝑟 ones altogether),

where the groups sum to 𝑥1 , 𝑥2 , … , 𝑥𝑛 respectively.

We can specify the groups by placing barriers (shown as vertical lines) between them:

1 + ⋯ + 1 ∣ 1 + ⋯ + 1 ∣ ⋯ ∣ 1 + ⋯ + 1 (𝑟 ones altogether),

where the groups sum to 𝑥1 , 𝑥2 , … , 𝑥𝑛 respectively.

The understanding here is that 𝑥1 is the sum of the 1s before the first barrier, 𝑥2 is the
sum of the 1s between the first and second barriers, and so on, with 𝑥𝑖 being the sum
of the 1s between the (𝑖 − 1)-th and 𝑖-th barriers (1 ≤ 𝑖 ≤ 𝑛 − 1) and 𝑥𝑛 being the sum
of the 1s after the (𝑛 − 1)-th barrier. Note that we have no 0-th barrier and no 𝑛-th
barrier; they are not needed.4 So the number of barriers is 𝑛 − 1, i.e., one less than the
number of groups.
For example, if 𝑛 = 5 and 𝑟 = 3, then we have three ones,

1 + 1 + 1 = 3,

to be divided into five groups of sizes 𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 . So each 𝑥𝑖 ≥ 0 and 𝑥1 + 𝑥2 + 𝑥3 +


𝑥4 + 𝑥5 = 3. One possibility is 𝑥1 = 2, 𝑥2 = 0, 𝑥3 = 0, 𝑥4 = 1, 𝑥5 = 0. We can specify this
using four barriers:
1 + 1 ∣ ∣ ∣ 1 ∣
In principle, we could count unordered selections with replacement by counting the
number of ways of distributing 𝑛 −1 barriers in between the 𝑟 ones, noting that barriers
are allowed to be before the first 1 (which corresponds to 𝑥1 = 0) or after the last one
(for 𝑥𝑛 = 0), and we are allowed to have more than one barrier in between ones (which
happens when some 𝑥𝑖 = 0). But it is not yet clear that this different viewpoint gives
us a way forward. An issue we have to deal with is the possibility that multiple barriers
may be placed between some pairs of consecutive 1s.
We can deal with this by the following trick. Instead of wanting 𝑛 integers 𝑥1 , … , 𝑥𝑛
which are all ≥ 0 and add up to 𝑟, let us instead add 1 to all these integers and require
them to add up to 𝑟 + 𝑛. For each 𝑖, define 𝑦𝑖 = 𝑥𝑖 + 1. We have

𝑥1 + 𝑥2 + ⋯ + 𝑥𝑛 = 𝑟 ⟺ (𝑥1 + 1) + (𝑥2 + 1) + ⋯ + (𝑥𝑛 + 1) = 𝑟 + 𝑛
⟺ 𝑦1 + 𝑦2 + ⋯ + 𝑦𝑛 = 𝑟 + 𝑛.

So we still have a simple equation specifying the required value of a sum of 𝑛 integers, but
now the integers are all required to be positive (instead of merely nonnegative) and their
sum is now 𝑟 + 𝑛 (instead of just 𝑟). Although the equation looks different, it is really
encoding the same information. There is a bijection between sequences (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) of

4 If we used them, they would be before the start and after the end, respectively. There would be no flexibility
in where they are placed.

nonnegative integers satisfying (8.12) and sequences (𝑦1 , 𝑦2 , … , 𝑦𝑛 ) of positive integers


satisfying
𝑦1 + 𝑦2 + ⋯ + 𝑦𝑛 = 𝑟 + 𝑛. (8.13)
Keep in mind the interpretation of the numbers 𝑥𝑖 , as the number of times the 𝑖-th object
is chosen. We will interpret the numbers 𝑦𝑖 similarly, but they have the constraint that
the 𝑖-th object is chosen at least once.
So, the number of unordered selections, with replacement, of 𝑟 objects from a set of
𝑛 objects, is the same as the number of unordered selections, with replacement, of 𝑟 + 𝑛
objects from a set of 𝑛 objects with the extra constraint that each object is now chosen
at least once.
We again depict our choice using barriers.

1 + ⋯ + 1 ∣ 1 + ⋯ + 1 ∣ ⋯ ∣ 1 + ⋯ + 1 (𝑟 + 𝑛 ones altogether),

where the groups sum to 𝑦1 , 𝑦2 , … , 𝑦𝑛 respectively.

For our earlier example (𝑛 = 5, 𝑟 = 3, (𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 ) = (2, 0, 0, 1, 0)), this
transformation now gives (𝑦1 , 𝑦2 , 𝑦3 , 𝑦4 , 𝑦5 ) = (3, 1, 1, 2, 1), depicted as

1 + 1 + 1 ∣ 1 ∣ 1 ∣ 1 + 1 ∣ 1,

with 𝑟 + 𝑛 = 3 + 5 = 8 ones altogether, one new 1 having been added to each group.
A key observation here — which is actually the point of this trick — is that, between
any two adjacent ones in our line of 𝑟 + 𝑛 ones, there is at most one barrier. Furthermore,
there is no barrier before the first one, nor after the last one. These observations
follow from the requirement that each 𝑦𝑖 ≥ 1.
Choosing positions for the barriers (which amounts to choosing the values for 𝑦1 , 𝑦2 , … , 𝑦𝑛 )
must be done to satisfy the following conditions.

• We have 𝑟 + 𝑛 − 1 positions to choose from.

• Each of these positions can be used at most once. So, if a position is chosen, it
cannot be chosen again. In other words, we are now choosing without replacement!

• We must choose exactly 𝑛 − 1 of these positions.

So we have transformed a problem of unordered selection with replacement to one of


unordered selection without replacement. This is a classic computer science problem-
solving method: transforming a new problem to one we already know how to solve, in
this case by constructing a suitable bijection.
So, all we have to do now is find the number of ways of choosing 𝑛 − 1 objects (in
this case, barrier positions) from a set of 𝑟 + 𝑛 − 1 available objects (in this case, all
possible positions between consecutive ones). By § 8.8, this is just

⒧𝑟+𝑛−1 𝑛−1⒭.

By the symmetry of binomial coefficients (see (1.2)), this is the same as

⒧𝑟+𝑛−1 𝑟⒭.
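itertools.combinations_with_replacement enumerates exactly these unordered selections with replacement, so the barriers-and-ones count can be checked directly (an illustrative sketch):

```python
from itertools import combinations_with_replacement
from math import comb

n, r = 5, 3   # choose r = 3 objects from n = 5, unordered, with replacement

# Each multiset of size r from an n-element set appears exactly once:
multisets = list(combinations_with_replacement(range(n), r))
assert len(multisets) == comb(r + n - 1, n - 1)   # barriers and ones
assert comb(r + n - 1, n - 1) == comb(r + n - 1, r) == 35
```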

8.10 EXERCiSES

1. Consider the following algorithm.

for 𝑖 from 1 to 𝑎 do
for 𝑗 from 1 to 𝑏 do
for 𝑘 from 1 to 𝑐 do
for 𝑙 from 1 to 𝑑 do
beep!

for 𝑚 from 1 to 𝑒 do
for 𝑛 from 1 to 𝑓 do
beep!
for 𝑜 from 1 to 𝑔 do
beep!
for 𝑝 from 1 to ℎ do
beep!

(a) Give an expression in the positive integer variables 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔, ℎ for the number
of times this algorithm beeps.

(b) If each of these eight variables is 𝑂(𝑛), give a big-O expression for the number of
times the algorithm beeps.

2.
(a) In an Australian Federal Election, a ballot paper for a House of Representatives
seat has 𝑛 boxes, one for each candidate. A voter must enter the numbers 1 to 𝑛, in the
order of their preference, in those boxes, with exactly one number in each box. In how
many ways can this be done?
(b) In the Senate, the ballot paper again has one box for each of 𝑁 candidates, but this
time, voters are only required to enter numbers 1 to 12, for their twelve most preferred
candidates. In how many ways can this be done?

3. Two fair dice are thrown, and each shows a number from {1, 2, 3, 4, 5, 6}. The
outcome is the ordered pair of numbers shown.

(a) How many possible outcomes are there?



(b) How many outcomes are there in which the number on the first die is less than the
number on the second die?

(c) How many outcomes are there in which the sum of the two numbers is 7?

(d) How many outcomes are there in which both numbers are ≤ 3?

(e) How many outcomes are there in which neither number is 5?

4. The Melbourne city centre grid is defined by fourteen main streets:


• five “horizontal” streets: Flinders St., Collins St., Bourke St., Lonsdale St., and La
Trobe St;

• nine “vertical” streets: Spencer St., King St., William St., Queen St., Elizabeth
St., Swanston St., Russell St., Exhibition St., and Spring St.
These divide the city centre into 4 × 8 = 32 square blocks.
Suppose you are at the corner of Flinders and Swanston Streets, having just emerged
from Flinders Street Station. You can’t make up your mind whether to visit Federation
Square, St Paul’s Cathedral or Young & Jackson’s Pub. So you decide to go for a walk,
always staying on one of these main streets, never turning back, and always going further
away from your starting point.

(a) How many routes are there, following these rules, from your starting point to the
corner of William and La Trobe Streets?5
Express your answer using one of the expressions we have used for counting, then
work out its numerical value.

(b) How many routes of this kind are there from your starting point which do not meet
Russell Street and which take you four block-lengths away? Give the total as well as the
number of routes to each intersection that lies at this distance from the start. Comment
on the relationship among these numbers and how they relate to some counting you did
earlier in semester.

(c) How many ways are there of walking from the south-west corner (Flinders and
Spencer Streets) to the north-east corner (Spring and La Trobe Streets) using a shortest
possible route?

5. Using only Basic Searches (with no quotation marks or special commands) and
the inclusion-exclusion principle, determine how many Monash Library catalogue entries
contain at least one of the three terms CSIRAC, SILLIAC and WREDAC.
5 While you are there, you can see Russell’s Old Corner Shop (now closed), which is described as Melbourne’s
oldest residential building, and then take a short walk in Flagstaff Gardens to see the site of Flagstaff
Observatory (1858).

6.
(a) Use the inclusion-exclusion principle to determine how many positive integers ≤ 100
are not multiples of 3, 5 or 7.
(b) Is there another simple way of working this out, that you can use to check your
answer?

7. Use the inclusion-exclusion principle to determine the number of surjections from


{𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓} to {1, 2, 3, 4}.

8. Determine the numbers of 𝑛-digit positive integers that have their digits

(a) in ascending order;

(b) in descending order;

(c) in nondecreasing order;

(d) in nonincreasing order.

9. The card game poker uses a standard deck of 52 cards, divided into four suits
♠, ♣, ♢, ♡ of 13 cards each, with the cards within each suit having a rank

2, 3, 4, 5, 6, 7, 8, 9, 10, J, Q, K, A,

where J,Q,K,A stand for Jack, Queen, King, Ace, respectively. These designations may
all be regarded as numbers, so J,Q,K,A represent 11,12,13,14 respectively.
A unique property of the Ace is that it can also stand for 1. So it can be considered
to be the predecessor of 2 as well as the successor of K. But it can’t play these low and
high roles simultaneously, in the same hand!
A poker hand consists of a set of five cards from the deck. In an actual game, there
are multiple players, each with a hand dealt from the same deck. But in this exercise
we consider just a single hand, in isolation.
Five cards in a poker hand are consecutive if their ranks are in numerical sequence.
Here, the Ace can either be the first of five consecutive lowest cards, A,2,3,4,5, or the
last of the five consecutive highest cards, 10,J,Q,K,A. But there is no wrap-around,6 so,
for example, the five cards Q,K,A,2,3 are not consecutive. This is because, as mentioned
above, the Ace cannot play its low and high roles simultaneously.
How many poker hands are there of each of the following types:

(a) no restriction at all;

(b) a straight flush: five consecutive cards in the same suit;

6 or as some card-players say, you can’t go “round the corner”.



(c) four of a kind: four of the five cards have the same rank (so they must all be in
different suits);

(d) full house: three cards of one rank, two of another;

(e) flush: all five cards have the same suit, but their ranks are not all consecutive
(although some of them might be);

(f) straight: five consecutive cards, not all in the same suit (although some of them
will be);

(g) three of a kind: three cards of the same rank, with the other two cards being of
different ranks to those three and to each other;

(h) pair of pairs: two separate pairs of cards, with the cards in each pair having the
same rank but the two pairs having different ranks, and a fifth card that is of a
different rank to the others;

(i) one pair: a single pair of cards of the same rank, with the other three cards having
different ranks to the pair and to each other;

(j) nothing: a hand of none of the special types (b)–(i) listed above, so its five cards
are all of different ranks, their ranks are not all consecutive, and the cards are not
all of the same suit.

10. How many strings of five digits are there in which

(a) there is no restriction whatsoever;

(b) leading 0s are not allowed;

(c) all digits are identical;

(d) all digits are different;

(e) every digit is different to its predecessor, but other repetitions are allowed;

(f) all digits are in strictly ascending order (so each digit is numerically < its successor);

(g) all digits are in nondecreasing order (so each digit is numerically ≤ its successor);

(h) the ordering of digits is monotonic, meaning that it’s either increasing or decreasing
(although not necessarily strictly so); in other words, no internal digit is greater than
both its neighbours, and no internal digit is less than both its neighbours;

(i) the digits are in arithmetic progression;

(j) the digits are in geometric progression;



(k) the digits add up to a multiple of 9;

(l) no digit appears in its own position (i.e., the first digit cannot be 1, the second digit
cannot be 2, and so on);

(m) in addition to the previous restriction (l), all the digits are different.

11. Specify formally the bijection given in § 8.9 …

• from the set of 𝑥-sequences of nonnegative integers satisfying the constraints given
there (including (8.12))

• to the set of 𝑦-sequences of positive integers satisfying the constraints given there
(including (8.13)).
9
DISCRETE PROBABILITY I

Uncertainty is a fact of life. We face it whenever we look beyond what we know. It


surrounds our attempts to understand the past and anticipate the future. It is inherent
in some natural processes. Complex systems — in machines, organisations or societies
— seem to generate it.
So, when we construct models based on the real world, we need to be able to model
uncertainty and to reason about it. This provides compelling motivation for the formal
study of uncertainty.
But there are other reasons, relating to computation itself.
When you first learn to program, computers are presented as deterministic, meaning
that everything about them at any given time is completely determined by knowing the
entire state of the computer (i.e., every piece of information in every input device and
every piece of memory) at the immediately previous instant of time. When you write a
program, each instruction specifies precisely how the information stored in the computer
is to change. But it turns out that access to random information — produced by some
unpredictable chance process — can be a powerful computational tool, and can help
some algorithms do some tasks more efficiently than seems possible otherwise. So the
study of uncertainty and randomness is important in algorithm design, as well as for
modelling uncertainty in the real world.
Another reason for studying uncertainty relates to algorithm analysis. Once we
have designed an algorithm, we want to know how it behaves on the range of inputs
it might have to deal with. For example, we have already mentioned upper and lower
bounds on the time an algorithm takes (p. 281, at the start of Chapter 8). An upper
bound gives a guarantee that your algorithm always runs at least that quickly, which is
important when making positive statements about what your algorithm can do. A lower
bound prevents you from being unreasonably optimistic about the time it needs, and
helps you direct your efforts away from fruitless quests for unattainable improvement.
But lower and upper bounds are, by definition, extremes; they don’t tell you what sort
of running time is typical, or how long it may take on average. If you want to analyse
the time your algorithm takes across a range of possible inputs, then you need to know
how likely the various inputs are, so that you can measure (or estimate, or bound) the
contribution the various inputs make to an average running time.


In discussing the importance of studying uncertainty and randomness for algorithm


design and analysis, we have focused on the time algorithms take. But the same issues
arise when studying other aspects of an algorithm’s behaviour, such as the amount of
memory it uses, or the quality of the solution it produces. And similar points apply to
designing and analysing the data structures that algorithms work with, as well as the
algorithms themselves.
The theory of probability provides the foundation for measuring uncertainty and
reasoning about it. In this chapter, we introduce probability theory and learn how to
use it. In the next chapter, we study random variables and probability distributions.

9.1𝛼 THE NATURE OF RANDOMNESS

Suppose someone tosses a coin. You do not know, in advance, whether it will turn up
Heads or Tails, i.e., which face will be uppermost once it has landed.1 If the coin is fair
— meaning that it has no bias towards either outcome, so each is equally likely — then
the probability that it comes up Heads is 1/2, and the probability that it comes up Tails
is 1/2 too. We can say that the event that the coin comes up Heads has probability 1/2,
which we write mathematically as

Pr(coin comes up Heads) = 1/2,

and similarly,

Pr(coin comes up Tails) = 1/2.
These events take place in a setting where randomness is at play, which is why we
describe them using probabilities rather than just by propositions which can only be
true or false.
The nature and source of randomness is a surprisingly deep topic and a lot has
been written about it. Tossing a coin is often used as an introductory example of a
simple random experiment, but where does the randomness come from? During the
tossing process, it is subject to physical forces from the tossing hand, the movement of
the air through which it passes, and gravity. If we do sufficiently careful and precise
measurements of the coin, its environment and the forces acting upon it, it may be
possible in principle to determine, accurately enough, its movement during the period
of the toss until it comes to rest, and therefore to determine the outcome of the toss. Of
course, this is usually impractical. So, instead of treating it as a deterministic process
(which it seems to be, at least to the level of detail required to determine the outcome

1 Traditionally, Heads is the head of the monarch whose face would historically adorn one side of a coin;
Tails, being the opposite of Heads, indicates the other side of the coin. Not many coins would have actual
“tails” depicted on the other side. One exception is the Australian penny from 1938 to the introduction of
decimal currency in 1966, which had a kangaroo, including its long tail, on the other side. So Tails could
be interpreted more literally in those days.

of the toss with high confidence), we say that it is too complex to treat deterministically
and model it instead as a simple random process.
This illustrates one common source of randomness: processes that are actually de-
terministic, but for which deterministic models are infeasible. You might say that ran-
domness is a cloak for our computational shortcomings!
That is not to say that all physical processes that we treat as random are really
just overly complex deterministic processes. In some physical processes, we have no
deterministic concept (however complex) to explain them, so we treat them as random.
Again, randomness might be a cloak, but for our ignorance rather than our computa-
tional shortcomings. Or, for some physical processes, we might think that they really
are random in some deep fundamental sense. Quantum mechanics is often interpreted
as treating the measurement of physical systems as inherently random.
In other settings, the role of randomness in expressing our ignorance is very natural.
For example, a coin might be lying on the ground, some distance away, and you cannot
see whether it shows Heads or Tails. There is no random experiment here; either the
coin shows Heads, or it shows Tails, and someone close to it may know which one it is,
but you don’t know yet. So you may model your knowledge of the state of the coin by
saying that its probability of showing Heads is 1/2.
We will focus on randomness as an aspect of processes that produce outcomes we
are interested in. For such processes, we will often describe the outcomes they produce
as random, too.

9.2𝛼 PROBABiLiTY

Informally, the probability of something is a real number in the unit interval [0,1] that
measures how likely it is, where something that’s impossible has probability 0 and some-
thing that’s certain has probability 1.
Probability is usually considered in the context of an experiment. Here, an experiment
is a process that is at least partly random and can give rise to any outcome from some
set of all possible outcomes. The set of all possible outcomes is called the sample space.
We can view this as just a universal set.
An event is just a subset of the sample space.
The simplest situation in which we can define probability is when the sample space
𝑈 is finite and all its elements are equally likely. In this scenario, the probability Pr(𝐴)
of an event 𝐴 ⊆ 𝑈 is given by
Pr(𝐴) = |𝐴| / |𝑈|. (9.1)
So the probability of 𝐴 is just the proportion of members of 𝑈 that belong to 𝐴. If
you choose a member of 𝑈 at random, with an equal chance of choosing each one, then
Pr(𝐴) measures how likely your chosen element is to belong to 𝐴.

How do you work this out? You just need to work out the sizes of the sets. To do
this, you can use all the techniques of counting that we have discussed so far, and many
others. The study of probability is intimately related to the study of counting.
We have two extreme special cases of the definition (9.1):

Pr(∅) = 0, Pr(𝑈) = 1.

To get a handle on the probability of an event, it is natural to try an experimental
approach. If you repeatedly choose members of 𝑈, with each choice having no influence
at all on subsequent choices, then you can count the number of times your chosen element
is in 𝐴. Intuitively, you would anticipate that

Pr(𝐴) ≈ (number of choices that belong to 𝐴) / (total number of choices).   (9.2)
This seems plausible, but it is not guaranteed that the approximation is close. So, ratios
of empirical counts like (9.2) do not define probability; they merely estimate it, and
these estimates may or may not be good ones, depending on the number of choices you
make and how chance plays out as you make your choices. As the number of choices
increases, you would intuitively expect the approximation in (9.2) to get better and
better. This belief can be made precise and then proved. In fact,

Pr(𝐴) = lim_{total # choices → ∞} ( (# choices that belong to 𝐴) / (total # choices) ).   (9.3)

We will discuss the estimation of probabilities later, but now return to defining and
exactly calculating them.
We illustrate the definitions with some familiar examples.
Suppose our random experiment is the toss of a fair coin. The two possible outcomes
are Heads and Tails, so our sample space is

𝑈 = {Heads, Tails}.

The possible events here are the various subsets of 𝑈:

∅, {Heads}, {Tails}, {Heads, Tails}.



Their probabilities are:

Pr(∅) = 0,
Pr({Heads}) = |{Heads}| / |𝑈| = 1/2,
Pr({Tails}) = 1/2   (similarly),
Pr({Heads, Tails}) = 2/2 = 1.
The throw of a fair die gives six equally likely outcomes, with sample space

𝑈 = {1, 2, 3, 4, 5, 6}.

We can calculate that, for example,


Pr({5}) = 1/6,
Pr(outcome is odd) = Pr({1, 3, 5}) = 3/6 = 1/2,
Pr(outcome is a square) = Pr({1, 4}) = 2/6 = 1/3,
Pr(outcome is not 1) = Pr({2, 3, 4, 5, 6}) = 5/6.
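Definition (9.1) reduces probability to counting, which is easy to mechanise. A minimal sketch in Python (the helper `pr` is our own name, not from the notes), using exact fractions to avoid rounding:

```python
from fractions import Fraction

def pr(event, sample_space):
    """Probability of an event under a finite, uniform sample space, as in (9.1)."""
    return Fraction(len(event & sample_space), len(sample_space))

U = {1, 2, 3, 4, 5, 6}            # one throw of a fair die
print(pr({5}, U))                 # 1/6
print(pr({1, 3, 5}, U))           # outcome is odd: 1/2
print(pr({1, 4}, U))              # outcome is a square: 1/3
print(pr({2, 3, 4, 5, 6}, U))     # outcome is not 1: 5/6
```

Using `Fraction` keeps the answers in the same exact form as the hand calculations above.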
So far, we have considered probability in a context where the sample space is finite
and every element of it is equally likely. Using our definition (9.1), an event consisting
of a single element 𝑥 ∈ 𝑈 has probability

Pr({𝑥}) = |{𝑥}| / |𝑈| = 1/|𝑈|.

So we can assign to each element 𝑥 a probability of 1/|𝑈|,

Pr(𝑥) = 1/|𝑈|,

and calculate the probability of any event 𝐴 as the sum of the probabilities of its
elements:
Pr(𝐴) = ∑_{𝑥∈𝐴} Pr(𝑥).   (9.4)

In our current scenario, where every element is equally likely and has probability 1/|𝑈|,
this equation (9.4) agrees with our earlier definition (9.1). But our new equation (9.4) is
more general. We can now deal with situations where the elements of the sample space
need not all have the same probability.

We suppose, then, that the elements of the sample space each have a probability,
which must be a number in the unit interval [0,1], and that these probabilities add to 1.
The probability of an element 𝑥 ∈ 𝑈 is denoted by Pr(𝑥). We require that, for all 𝑥 ∈ 𝑈,

0 ≤ Pr(𝑥) ≤ 1,

and our constraint on the sum of these probabilities may be written

∑_{𝑥∈𝑈} Pr(𝑥) = 1.   (9.5)

We emphasise that this sum is over every element in the entire sample space, however
large or small that sample space may be.
We now define the probability Pr(𝐴) of any event 𝐴 ⊆ 𝑈 to be

Pr(𝐴) = ∑_{𝑥∈𝐴} Pr(𝑥).   (9.6)

This is more general than our previous definition of probability, which only applies to
finite sample spaces with all elements having the same probability. We have, again, the
extreme cases
Pr(∅) = 0, Pr(𝑈) = 1.
For example, in the board game Scrabble2 , there are 100 tiles each with an English
letter, except that two tiles are blank, and these tiles are all in a bag so that players
can choose them at random. Suppose, at the start of the game, you choose a letter
by drawing a random tile from the bag. The numbers of tiles of each type gives the
following probabilities for the letters (or blank, denoted □).

letter prob. letter prob. letter prob.


A 0.09 J 0.01 S 0.04
B 0.02 K 0.01 T 0.06
C 0.02 L 0.04 U 0.04
D 0.04 M 0.02 V 0.02
E 0.12 N 0.06 W 0.02
F 0.02 O 0.08 X 0.01
G 0.03 P 0.02 Y 0.02
H 0.02 Q 0.01 Z 0.01
I 0.09 R 0.06 □ 0.02

2 Scrabble is one of the most popular word games in the world, and is good for improving spelling and
vocabulary. The letter frequencies in the game are based on an empirical count from text in some particular
issues of newspapers when the game was designed, although the sample used may not have been huge.

The sample space can be taken to be the set of all English letters together with the
blank, so has size 27, but this time the probabilities of its members are not all the same.
Applying our definition (9.6) gives, for example,

Pr(vowel) = Pr({A,E,I,O,U})
= Pr(A) + Pr(E) + Pr(I) + Pr(O) + Pr(U)
= 0.09 + 0.12 + 0.09 + 0.08 + 0.04
= 0.42.
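Definition (9.6) is just as direct to compute with. A sketch using the letter probabilities from the table above (the dictionary layout and the `_` symbol for the blank are our own choices):

```python
# Letter probabilities from the Scrabble tile counts; '_' denotes the blank.
tile_prob = {
    'A': 0.09, 'B': 0.02, 'C': 0.02, 'D': 0.04, 'E': 0.12, 'F': 0.02,
    'G': 0.03, 'H': 0.02, 'I': 0.09, 'J': 0.01, 'K': 0.01, 'L': 0.04,
    'M': 0.02, 'N': 0.06, 'O': 0.08, 'P': 0.02, 'Q': 0.01, 'R': 0.06,
    'S': 0.04, 'T': 0.06, 'U': 0.04, 'V': 0.02, 'W': 0.02, 'X': 0.01,
    'Y': 0.02, 'Z': 0.01, '_': 0.02,
}

def pr(event):
    """Probability of an event as the sum of its elements' probabilities, (9.6)."""
    return sum(tile_prob[x] for x in event)

# Sanity check of (9.5): the probabilities over the whole sample space sum to 1.
assert abs(sum(tile_prob.values()) - 1.0) < 1e-9

print(round(pr({'A', 'E', 'I', 'O', 'U'}), 2))   # 0.42
```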

In fact, not only does our new definition (9.6) deal with finite sample spaces where
the probabilities are not all the same, but it can also deal with some infinite sample
spaces. Suppose our sample space is ℕ and the probability of an element 𝑛 ∈ ℕ is given
by
Pr(𝑛) = 1/2^𝑛.
For this to be valid, we need the probabilities of all the elements in the sample space to
add up to 1, so that (9.5) is satisfied. In this case, we have

∑_{𝑥∈𝑈} Pr(𝑥) = ∑_{𝑛∈ℕ} Pr(𝑛) = ∑_{𝑛∈ℕ} 1/2^𝑛 = 1/2 + 1/2^2 + 1/2^3 + 1/2^4 + ⋯ = 1/2 + 1/4 + 1/8 + 1/16 + ⋯

But this is just the sum of an infinite geometric series, with first term 𝑎 = 1/2 and common
ratio 𝑟 = 1/2. So, by (6.45), the sum is

𝑎 / (1 − 𝑟) = (1/2) / (1 − 1/2) = (1/2) / (1/2) = 1.

So all the probabilities do indeed add up to 1, and (9.5) is satisfied.



We can calculate the probabilities of events using (9.6). For example,

Pr(a randomly chosen number is odd)
  = Pr({odd numbers})
  = ∑_{𝑘=1}^{∞} 1/2^(2𝑘−1)   (using the fact that the odd numbers are {2𝑘 − 1 ∶ 𝑘 ∈ ℕ})
  = ∑_{𝑘=1}^{∞} 2 / 2^(2𝑘)
  = ∑_{𝑘=1}^{∞} 2 / 4^𝑘   (which is now the sum of a geometric series with 𝑎 = 1/2 and 𝑟 = 1/4)
  = (1/2) / (1 − 1/4)   (by (6.45))
  = (1/2) / (3/4)
  = 2/3.
This should not be interpreted as saying that two-thirds of positive integers are odd!
It is just saying that, under the particular probabilities we have assigned to positive
integers, the probability of a randomly chosen positive integer being odd is 2/3. Assigning
different probabilities to the numbers could give a very different probability of oddness.
Here is another example.

Pr(a randomly chosen number is ≤ 10) = Pr({1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
  = ∑_{𝑛∈{1,2,…,10}} 1/2^𝑛
  = ∑_{𝑛=1}^{10} 1/2^𝑛
  = 1/2 + 1/2^2 + ⋯ + 1/2^10.
This is the sum of a finite geometric series, with ten terms and the same 𝑎 and 𝑟 as
before. So, using (6.44), the sum is

𝑎 (1 − 𝑟^10) / (1 − 𝑟) = (1/2) · (1 − (1/2)^10) / (1 − 1/2) = 1 − (1/2)^10 = 1 − 1/1024 = 1023/1024 ≈ 0.999.
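These series calculations can be checked numerically by truncating the infinite sums; a sketch (the truncation point N = 60 is our own choice: the discarded tail is smaller than 2^(−60), so it is negligible):

```python
# Pr(n) = 1/2**n for n = 1, 2, 3, ...; truncate the infinite sums at N = 60.
N = 60
total = sum(2.0**-n for n in range(1, N + 1))               # should be ~1, checking (9.5)
odd = sum(2.0**-n for n in range(1, N + 1) if n % 2 == 1)   # should be ~2/3
le10 = sum(2.0**-n for n in range(1, 11))                   # exactly 1023/1024

print(round(total, 9), round(odd, 9), round(le10, 9))
```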

The probabilities we have assigned to the positive integers in this example are far
from uniform. Can you imagine giving all positive integers the same probability? If so,

what would that probability be? Would the probabilities sum to 1, as required? If not,
we cannot really call them probabilities.
If it’s too tricky to assign the same probability to all the positive integers, how
close can we get? Suppose you want to give a decreasing sequence of probabilities to
the positive integers such that they all sum to 1. What is the most slowly-decreasing
sequence you can come up with that does this, while still giving every positive integer
a positive probability?

9.3𝛼 C H O i C E O F S A M P L E S PA C E

We have seen in the previous section that, to define probabilities of events, we need to
have a sample space. Each element of the sample space must have a defined probability,
with those probabilities summing to 1. Events correspond to subsets of the sample space,
and the probability of an event is just the sum of the probabilities of its elements.
Choice of sample space is therefore fundamental. You need it to be an accurate
model of the situation you are studying, so that the probabilities of events tell you
about their likelihood in that situation.
Suppose you are playing Monopoly, where two dice are thrown and the numbers
they show are added to give the number of steps you take on that move. So the length
of your move is an integer in {2,3,…,12}. It is tempting to make this set the sample
space. But what probabilities should we assign to its elements?
With some thought or experimentation, it soon becomes clear that the elements of
{2,3,…,12} are not all equally likely, so we should not just give them each a probability
of 1/11. (Doing so would define a sample space that is valid in itself, in that it satisfies
the definition of a sample space. But the probabilities calculated from it do not align
with the actual probabilities of the various totals obtained when throwing two dice. So
these uniform probabilities are incorrect, as a model of this situation.)
To determine what the correct probabilities should be, we need to go deeper. Al-
though the data we are interested in is just the total of the numbers shown on the two
dice, the random process of throwing two dice gives a larger set of outcomes, namely a
number from {1, 2, 3, 4, 5, 6} on each die. So the full set of outcomes, from throwing two
dice, is the Cartesian product

{1, 2, 3, 4, 5, 6} × {1, 2, 3, 4, 5, 6},

which is the set of all pairs of results, one from the first die and another from the second
die. It helps to visualise these outcomes in a 6 × 6 table:

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)

(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)

(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)

(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)

(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)

(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

We assume throughout that both dice are fair, i.e., each of the six outcomes of each
die is equally likely and has probability 1/6. We also assume throughout that the two
dice are not linked in any way; the outcome of one does not influence the outcome of
the other. It follows that all the pairs of outcomes are equally likely too, and since there
are 6 × 6 = 36 pairs of outcomes, they must have probability 1/36 each.
This gives us a more fine-grained sample space for the throw of two dice, with 36
elements (instead of just 11 for the possible totals), and it now has uniform probabilities.
(Uniformity is not an essential feature for probabilities of elements of sample spaces, but
we like it when it happens, as it makes life easier, provided it yields an accurate model.)
Furthermore, we can now calculate probabilities for the total of two dice. For exam-
ple, what is the probability that the total is 8? This corresponds to the following subset
of our new sample space, consisting of all pairs whose sum is 8:

{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}.

Since this subset has five elements, and since each element has probability 1/36, we have

Pr(total is 8) = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 5/36.
We can do a similar calculation for every possible outcome in the set {2,3,…,12}.
It is helpful to envisage this event on our diagram of the sample space. The elements
of this event — the diagonal of pairs (2, 6), (3, 5), (4, 4), (5, 3), (6, 2) — are circled in green in the next diagram:

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)

(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)

(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)

(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)

(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)

(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

We see that this event, where the total of the two dice is 8, is a diagonal of five pairs.
In fact, all the other possible totals correspond to diagonals parallel to this one. We
can see that a total of 2 has only one element, so its probability is 1/36, and that as
the totals increase, their probabilities increase too, with the probability increasing by
1/36 for each increment of the total until the total is 7, which corresponds to the longest
diagonal and maximises the probability:
Pr(total is 7) = 6/36 = 1/6.
Then, as the total keeps increasing, the probability decreases at the same rate until we
reach the highest possible total, 12, with a probability of 1/36.

total of two dice:   2     3     4     5     6     7     8     9     10    11    12
probability:        1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
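This table can be generated directly from the 36-pair sample space; a sketch in Python (the helper names are our own):

```python
from fractions import Fraction
from itertools import product

# The fine-grained sample space: all 36 equally likely ordered pairs.
space = set(product(range(1, 7), repeat=2))

def pr(event):
    """Probability of an event (a subset of the 36 pairs), via (9.1)."""
    return Fraction(len(event), len(space))

# Probability of each possible total 2..12, as in the table above.
totals = {t: pr({p for p in space if sum(p) == t}) for t in range(2, 13)}
print(totals[7], totals[8])   # 1/6 5/36
```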

Once we have determined these probabilities of the totals 2,3,…,12, we can then use
them along with the smaller sample space {2, 3, … , 12} to compute probabilities of events
pertaining solely to values of the total of the two dice (e.g., whether the total is 8, or
even, or prime, or ≤ 5, etc.). For example,

Pr(total is ≤ 5) = 1/36 + 2/36 + 3/36 + 4/36 = 10/36 = 5/18.   (9.7)
But keep in mind that:

• To work out the probabilities of the totals in the first place, we needed the larger
sample space, consisting of the 36 pairs with uniform probabilities.

• If we want to determine the probabilities of other events relating to the throw of


two dice, we will need the larger sample space we have introduced. For example,
in Monopoly there are specific consequences (including possible imprisonment!)
when the two dice show the same number. This is not catered for by a sample
space based only on the sum of the two numbers. But, using our larger sample
space, we see that

Pr(the two dice show the same number) = Pr({(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)})
  = 6/36
  = 1/6.

This example illustrates a more general point. While a sample space with nonuniform
probabilities is ok in principle, and indeed natural in many situations, there is often
a larger sample space with uniform probabilities that underlies it in some way. Uniform
probabilities can be simpler to deal with than nonuniform ones, so a sample space with
uniform probabilities may make some probability calculations easier even if the sample
space itself is larger.
Having said that, our first priority is to use a sample space and associated probabil-
ities that model the situation at hand as accurately as possible. So we will not go for
uniform probabilities if they remove us from the reality we are trying to model.
Our earlier Scrabble example is another case where a larger sample space, with
uniform probabilities, could be used. In that example, the sample space we used was
the set of all English letters together with the blank, making 27 elements in all, and
we used nonuniform probabilities based on the numbers of tiles of each type. However,
the individual tiles themselves can be treated as elements of a sample space of size 100,
each with probability 0.01. To compute the probability of a vowel, we count up all tiles
bearing a vowel, of which there are 42 (9 × A, 12 × E, 9 × I, 8 × O, 4 × U), and divide this
by the total number of tiles, 100, to obtain

Pr(vowel) = |{tiles with a vowel}| / |{all tiles}| = 42/100 = 0.42,

using (9.1). This particular calculation may or may not seem easier than the one we did
earlier. But the key point here is that the larger sample space underpins the one we used
earlier. The probabilities in our table, such as 0.09 for A and so on, were derived from
the observation that we had 100 equally likely tiles, of which nine were A. So, in effect,
we used the larger sample space (of individual tiles rather than letters) to work out,
and justify, the probabilities we assigned to the letters in the smaller sample space. So,
again, it is the larger sample space (with its uniform probabilities) which is “what’s really
going on”; the smaller sample space summarises those aspects of the larger space that

are relevant to the problem at hand, and its probabilities are obtained by calculations
from the larger sample space.

9.4𝛼 M U T U A L LY E X C L U S i V E E V E N T S

Two events are mutually exclusive if they are disjoint.


Since events are just subsets of the sample space, this definition just says that two
events are mutually exclusive if their sets are disjoint. So, two events 𝐴 and 𝐵 are mutually
exclusive if and only if 𝐴 ∩ 𝐵 = ∅.
This means that, if one of the events occurs, then the other cannot possibly occur.
Maybe neither of them occurs. But they can’t both occur.
If 𝐴 and 𝐵 are mutually exclusive, then the probability of their union is just the
sum of their separate probabilities:

𝐴 ∩𝐵 = ∅ ⟹ Pr(𝐴 ∪ 𝐵) = Pr(𝐴) + Pr(𝐵), (9.8)

or, in other words,


Pr(𝐴 ⊔ 𝐵) = Pr(𝐴) + Pr(𝐵), (9.9)
since 𝐴 ⊔ 𝐵 is only defined if the events are disjoint. This follows from:

Pr(𝐴 ⊔ 𝐵) = ∑_{𝑥∈𝐴⊔𝐵} Pr(𝑥)   (by definition of probability, (9.4))
  = (∑_{𝑥∈𝐴} Pr(𝑥)) + (∑_{𝑥∈𝐵} Pr(𝑥))   (since each 𝑥 ∈ 𝐴 ⊔ 𝐵 belongs to exactly one of 𝐴 or 𝐵)
  = Pr(𝐴) + Pr(𝐵)   (again using the definition of probability, twice).

For example, suppose we throw two dice and ask about the probability that their sum is
≤ 5 or ≥ 10 (perhaps because we are playing Monopoly and want to avoid landing on a
dangerous block of four consecutive squares). We already know that Pr(total is ≤ 5) =
10/36 (see (9.7)) and can similarly work out that Pr(total ≥ 10) = 6/36. Then

Pr(total is ≤ 5 or ≥ 10) = Pr((total is ≤ 5) ∪ (total is ≥ 10))
  = Pr(total is ≤ 5) + Pr(total is ≥ 10)   (by (9.8), since the events are disjoint)
  = 10/36 + 6/36
  = 16/36
  = 4/9.
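The same 36-pair sample space lets us check (9.8) on this example; a sketch:

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))   # the 36 equally likely pairs

def pr(event):
    return Fraction(len(event), len(space))

low = {p for p in space if sum(p) <= 5}       # event: total is <= 5
high = {p for p in space if sum(p) >= 10}     # event: total is >= 10

assert low & high == set()                    # the events are mutually exclusive
print(pr(low | high))                         # 4/9
print(pr(low) + pr(high))                     # 4/9, agreeing with (9.8)
```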
Our expression (9.8) for the probability of the disjoint union of two sets extends to
disjoint unions of any number of sets.
Sets 𝐴1, 𝐴2, …, 𝐴𝑛 are mutually disjoint, or pairwise disjoint, if every pair of
them is disjoint, i.e., for all distinct 𝑖, 𝑗 ∈ {1, 2, …, 𝑛} we have 𝐴𝑖 ∩ 𝐴𝑗 = ∅.
Let 𝑛 ∈ ℕ and let 𝐴1 , 𝐴2 , … , 𝐴𝑛 be mutually disjoint sets. Then

Pr(𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛 ) = Pr(𝐴1 ) + Pr(𝐴2 ) + ⋯ + Pr(𝐴𝑛 ). (9.10)

This important general principle may be captured in words by saying that the probability
of any disjoint union is the sum of the probabilities of the events. More succinctly,
probability is additive over disjoint unions.
Decomposing events into disjoint unions of simpler events is a very powerful tool in
probability. In fact, we have come very close to using it already. When calculating the
probability that a random Scrabble tile is a vowel using the sample space of all 100 tiles
each with probability 0.01 (p. 316 in § 9.3𝛼 ), we used our knowledge of the numbers
of tiles bearing each vowel. Converting these to probabilities, and expressing the event
“the tile is a vowel” as a disjoint union of simpler events, we have

Pr(tile is vowel) = Pr({vowel tiles})


= Pr({A-tiles} ⊔ {E-tiles} ⊔ {I-tiles} ⊔ {O-tiles} ⊔ {U-tiles})
= Pr({A-tiles}) + Pr({E-tiles}) + Pr({I-tiles}) + Pr({O-tiles}) + Pr({U-tiles})
(by (9.10))
= 0.09 + 0.12 + 0.09 + 0.08 + 0.04
= 0.42.

The very definition of probability may be viewed in terms of the additivity of prob-
ability over disjoint unions. An event is just the disjoint union of all the singleton sets

(i.e., sets of one element) obtained from the elements of the event, and its probability is
just the sum of the probabilities of those singleton sets.3
Later we will see more applications of the fact that probability is additive over
disjoint unions.

9.5 O P E R AT i O N S O N E V E N T S

Since events are just subsets of the sample space, we can apply set operations to them
to form other events. We have already seen the disjoint union in the previous section.
The complement 𝐴̄ of an event 𝐴 occurs precisely when 𝐴 does not occur. So 𝐴 and
its complement 𝐴̄ are mutually exclusive, and their union is the entire sample space. So

Pr(𝐴 ⊔ 𝐴̄) = ∑_{𝑥∈𝐴⊔𝐴̄} Pr(𝑥)   (by definition of probability)
  = ∑_{𝑥∈𝑈} Pr(𝑥)   (by definition of set complementation)
  = 1   (by (9.5)),

in keeping with the certainty that either 𝐴 or 𝐴̄ occurs.
We can also apply our previous expression for the probability of mutually exclusive
events, (9.8), to 𝐴 and 𝐴̄. This gives

Pr(𝐴 ⊔ 𝐴̄) = Pr(𝐴) + Pr(𝐴̄).

Equating these two expressions for Pr(𝐴 ⊔ 𝐴̄), we have

Pr(𝐴) + Pr(𝐴̄) = 1.

Therefore
Pr(𝐴̄) = 1 − Pr(𝐴).   (9.11)
Examples:

• Some children play a pencil-and-paper “dice cricket” game where each throw of a
fair die gives the number of runs scored against one delivery, except that 5 is out.
So the probability of not being out, from one specific delivery (i.e., one throw of
the die), is
Pr(not out) = 1 − Pr(out) = 1 − 1/6 = 5/6.

3 Strictly speaking, the probability of a set is the sum of the probabilities of its individual elements; it is the
probabilities of the individual elements that are the starting point, and it is from them that the probabilities
of events are defined in (9.6). But, of course, the probability of a singleton set equals the probability of
its sole element. So it is true that the probability of an event is just the sum of the probabilities of all the
singleton events (i.e., singleton sets) inside it.

• In drawing one of the 100 letter tiles in Scrabble, the probability of drawing a
blank tile is 2/100, so

Pr(not drawing a blank) = 1 − Pr(drawing a blank) = 1 − 2/100 = 98/100 = 0.98.

More generally, suppose 𝐴 ⊆ 𝐵 and consider the set difference 𝐵 ∖ 𝐴. In this case,
𝐵 = 𝐴 ⊔ (𝐵 ∖ 𝐴). Therefore

Pr(𝐵) = Pr(𝐴 ⊔ (𝐵 ∖ 𝐴)) = Pr(𝐴) + Pr(𝐵 ∖ 𝐴). (9.12)

So we have
𝐴⊆𝐵 ⟹ Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴). (9.13)
It also follows from (9.12) that

𝐴⊆𝐵 ⟹ Pr(𝐴) ≤ Pr(𝐵). (9.14)

This could be paraphrased as saying that you cannot make something less likely by
creating more ways for it to happen. Not surprising, but good to know!

Example:
In drawing our first Scrabble tile, what is the probability that we get a letter tile
(not a blank) that has a consonant? We already know that the probability of a vowel
is 0.42 and the probability of a non-blank tile is 0.98. Define events 𝐴 and 𝐵 for these
two occurrences:
𝐴: the tile we draw is a vowel
𝐵: the tile we draw is a letter (not a blank).
Then 𝐴 ⊆ 𝐵, and 𝐵 ∖ 𝐴 is the event whose probability we seek:
𝐵 ∖ 𝐴: the tile we draw is a consonant.
Since 𝐴 ⊆ 𝐵, we have

Pr(consonant) = Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴) = 0.98 − 0.42 = 0.56,

where the second equality here is by (9.13).
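Both the complement rule (9.11) and the difference rule (9.13) can be checked against the tile counts; a sketch (variable names are our own; the counts for the vowels and blanks are those given earlier):

```python
# Tile counts (out of 100) for the vowels and the blanks; every other
# tile bears a consonant.
vowel_counts = {'A': 9, 'E': 12, 'I': 9, 'O': 8, 'U': 4}
blank_count = 2

p_vowel = sum(vowel_counts.values()) / 100   # Pr(A) = 0.42
p_letter = 1 - blank_count / 100             # Pr(B) = 0.98, by the complement rule (9.11)
p_consonant = p_letter - p_vowel             # Pr(B \ A) = 0.56, by the difference rule (9.13)
print(round(p_consonant, 2))                 # 0.56
```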


When 𝐴 ⊈ 𝐵, the expression (9.13) no longer applies. But we still have 𝐴 ∩ 𝐵 ⊆ 𝐵,
so we can still say that

Pr(𝐵 ∖ 𝐴) = Pr(𝐵) − Pr(𝐴 ∩ 𝐵). (9.15)

Our earlier expression (9.13) for the situation 𝐴 ⊆ 𝐵 is a special case of this, since 𝐴 ⊆ 𝐵
implies 𝐴 = 𝐴 ∩ 𝐵.
This raises the general question of how to determine Pr(𝐴 ∩ 𝐵). This is intimately
related to determining Pr(𝐴 ∪ 𝐵). We consider these probabilities now, in the general
situation where the events 𝐴 and 𝐵 are not necessarily mutually exclusive, so their sets
are not necessarily disjoint, and neither is necessarily a subset of the other.
Consider Pr(𝐴 ∪ 𝐵). The union 𝐴 ∪ 𝐵 is not necessarily a disjoint union of 𝐴 and
𝐵; the sets may intersect, and their intersection 𝐴 ∩ 𝐵 may be small or large, and its
probability matters here. Nonetheless, we can still find a way to express 𝐴 ∪ 𝐵 as a
disjoint union of certain of its subsets. See if you can work out how to do this, and how
to use this to work out Pr(𝐴 ∪ 𝐵). Before showing how it’s done, we pause to reflect on
the methods we have been using so far in this section.
In working out probabilities, it often helps to decompose events into a disjoint union
of simpler, mutually exclusive events, as we discussed on p. 318 in § 9.4𝛼 . In terms of
sets, we are just expressing a set as a partition of simpler sets. We have seen further
instances of this already in this section:
• We expressed the sample space as the disjoint union of 𝐴 and 𝐴 in order to work
out the relationship between the probabilities of 𝐴 and 𝐴 in (9.11).
• When 𝐴 ⊆ 𝐵, we expressed 𝐵 as the disjoint union of 𝐴 and 𝐵 ∖ 𝐴 in (9.12), in
order to derive an expression for Pr(𝐵 ∖ 𝐴) in (9.13).
So, let us continue in this vein and consider Pr(𝐴 ∪ 𝐵). Since

𝐴 ∪ 𝐵 = (𝐴 ∖ 𝐵) ⊔ (𝐴 ∩ 𝐵) ⊔ (𝐵 ∖ 𝐴),

we have
Pr(𝐴 ∪ 𝐵) = Pr(𝐴 ∖ 𝐵) + Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴). (9.16)
Now,
Pr(𝐴) = Pr(𝐴 ∖ 𝐵) + Pr(𝐴 ∩ 𝐵)
Pr(𝐵) = Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴).
Adding these two equations, we obtain

Pr(𝐴) + Pr(𝐵) = Pr(𝐴 ∖ 𝐵) + 2 Pr(𝐴 ∩ 𝐵) + Pr(𝐵 ∖ 𝐴).

Combining this with (9.16) gives

Pr(𝐴 ∪ 𝐵) = Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵). (9.17)

Rearranging, we also have

Pr(𝐴 ∩ 𝐵) = Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∪ 𝐵). (9.18)

So, if we already know the probabilities of events 𝐴 and 𝐵, then we need to know the
probability of one of the union and intersection, in order to be able to work out the
probability of the other.
These equations (9.17) and (9.18) are reminiscent of the relationship between the
sizes of two sets and their union and intersection: see (1.13) and Exercises 1.9 and 1.10.

In fact, if we take our expressions from those exercises and divide each side by the size
of the universal set (i.e., in this context, the sample space), then we have

|𝐴 ∪ 𝐵| / |𝑈| = |𝐴| / |𝑈| + |𝐵| / |𝑈| − |𝐴 ∩ 𝐵| / |𝑈|,   (9.19)
|𝐴 ∩ 𝐵| / |𝑈| = |𝐴| / |𝑈| + |𝐵| / |𝑈| − |𝐴 ∪ 𝐵| / |𝑈|.   (9.20)

Each quotient here is just the probability of the set shown in the numerator in the spe-
cial case when all elements of the sample space are equally likely. So, really, (9.19) and
(9.20) are just special cases of (9.17) and (9.18), respectively.
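Equation (9.17) can be verified mechanically on the two-dice sample space, for example with the overlapping events "total is 8" and "the two dice match"; a sketch:

```python
from fractions import Fraction
from itertools import product

space = set(product(range(1, 7), repeat=2))     # the 36 equally likely pairs

def pr(event):
    return Fraction(len(event), len(space))

A = {p for p in space if sum(p) == 8}           # total is 8 (five pairs)
B = {p for p in space if p[0] == p[1]}          # doubles (six pairs)

print(pr(A & B))                                # only (4, 4) lies in both: 1/36
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)   # (9.17) holds
```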

When working out Pr(𝐴 ∩ 𝐵), it can sometimes help to partition one of the events,
say 𝐵, into a disjoint union of other events, say 𝐵1 , 𝐵2 , … , 𝐵𝑛 :

𝐵 = 𝐵1 ⊔ 𝐵2 ⊔ … ⊔ 𝐵𝑛 .

Observe that
𝐴 ∩ (𝐵1 ⊔ 𝐵2 ) = (𝐴 ∩ 𝐵1 ) ⊔ (𝐴 ∩ 𝐵2 ).
This is just an application of the Distributive Law together with the observation that
𝐴 ∩ 𝐵1 and 𝐴 ∩ 𝐵2 are disjoint (since they are subsets of the disjoint sets 𝐵1 and 𝐵2 ,
respectively).
For three events, we have

𝐴 ∩ (𝐵1 ⊔ 𝐵2 ⊔ 𝐵3 ) = (𝐴 ∩ 𝐵1 ) ⊔ (𝐴 ∩ 𝐵2 ) ⊔ (𝐴 ∩ 𝐵3 ).

See Figure 9.1.


This observation generalises to 𝑛 mutually exclusive events:

𝐴 ∩ (𝐵1 ⊔ 𝐵2 ⊔ ⋯ ⊔ 𝐵𝑛 ) = (𝐴 ∩ 𝐵1 ) ⊔ (𝐴 ∩ 𝐵2 ) ⊔ ⋯ ⊔ (𝐴 ∩ 𝐵𝑛 ).

Therefore,

Pr(𝐴 ∩ 𝐵) = Pr(𝐴 ∩ (𝐵1 ⊔ 𝐵2 ⊔ ⋯ ⊔ 𝐵𝑛 ))


= Pr((𝐴 ∩ 𝐵1 ) ⊔ (𝐴 ∩ 𝐵2 ) ⊔ ⋯ ⊔ (𝐴 ∩ 𝐵𝑛 ))
= Pr(𝐴 ∩ 𝐵1 ) + Pr(𝐴 ∩ 𝐵2 ) + ⋯ + Pr(𝐴 ∩ 𝐵𝑛 ), (9.21)

by (9.10). So, one way to work out Pr(𝐴 ∩ 𝐵) is to work out each Pr(𝐴 ∩ 𝐵𝑖 ), where 1 ≤
𝑖 ≤ 𝑛, and then just add these probabilities. That requires working out 𝑛 probabilities
of intersections of events, but in some situations it is possible to choose the partition
𝐵 = 𝐵1 ⊔𝐵2 ⊔⋯⊔𝐵𝑛 in such a way that working out the probabilities Pr(𝐴 ∩𝐵𝑖 ) is much
easier than working out Pr(𝐴 ∩ 𝐵).

𝐵1
𝐴
𝐴 ∩ 𝐵1

𝐴 ∩ 𝐵2 𝐵2

𝐴 ∩ 𝐵3
𝐵3

Figure 9.1: Events 𝐴 and 𝐵, with 𝐵 partitioned into 𝐵1 , 𝐵2 , 𝐵3 . This means that 𝐴 ∩ 𝐵 is
partitioned into 𝐴 ∩ 𝐵1 , 𝐴 ∩ 𝐵2 , 𝐴 ∩ 𝐵3 .

For example, what is the probability that a three-letter English word starts with ‘c’
and has a vowel as its second letter? This kind of question arises in word games and
also, in more elaborate forms, in language modelling. Suppose the three-letter word is
chosen uniformly at random from the set of all three-letter words in a standard word
list. Then

Pr(1st letter is ‘c’ and 2nd letter is a vowel) = Pr((1st letter is ‘c’) ∩ (2nd letter is a vowel)).

Let the events 𝐴 and 𝐵 be

𝐴 = 1st letter is ‘c’


𝐵 = 2nd letter is a vowel.

Then we can partition 𝐵 into five events:

𝐵1 = 2nd letter is ‘a’,


𝐵2 = 2nd letter is ‘e’,
𝐵3 = 2nd letter is ‘i’,
𝐵4 = 2nd letter is ‘o’,
𝐵5 = 2nd letter is ‘u’,

and
𝐵 = 𝐵1 ⊔ 𝐵2 ⊔ 𝐵3 ⊔ 𝐵4 ⊔ 𝐵5 .

We can then use (9.21):

Pr(𝐴 ∩ 𝐵) = Pr((1st letter is ‘c’) ∩ (2nd letter is a vowel))


= Pr((1st letter is ‘c’) ∩ (2nd letter is ‘a’)) +
Pr((1st letter is ‘c’) ∩ (2nd letter is ‘e’)) +
Pr((1st letter is ‘c’) ∩ (2nd letter is ‘i’)) +
Pr((1st letter is ‘c’) ∩ (2nd letter is ‘o’)) +
Pr((1st letter is ‘c’) ∩ (2nd letter is ‘u’))
= Pr(first two letters are ‘ca’) +
Pr(first two letters are ‘ce’) +
Pr(first two letters are ‘ci’) +
Pr(first two letters are ‘co’) +
Pr(first two letters are ‘cu’)
= 10/1443 + 0/1443 + 3/1443 + 14/1443 + 7/1443
= 34/1443
≈ 0.024.

Here we have used the word list at /usr/share/dict/words (i.e., filename words, direc-
tory path /usr/share/dict/) in the virtual Linux system in your Ed Workspace. This
has 1443 three-letter words. It is not hard to use computational tools such as grep and
wc (see Module 0 Applied Session) to calculate the answer directly in this case. But
it is also worth thinking about how you might solve this problem manually, using a
physical dictionary. Looking up the first two letters of a word enables you to narrow
down the options a lot, and makes manual calculation feasible, showing the advantage
of using (9.21). Even when using computational tools, smart partitioning of events can
help calculate some probabilities more efficiently.
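A sketch of the same computation in Python, using a tiny made-up word list in place of /usr/share/dict/words (so the numbers differ from those in the text; only the method of partitioning over the five vowels is illustrated):

```python
# A tiny illustrative word list — NOT the real dictionary file.
words = ['cat', 'cot', 'cut', 'cry', 'car', 'cup', 'dog', 'ant', 'ice', 'icy']
three = [w for w in words if len(w) == 3]

def pr(predicate):
    """Probability under a uniform choice from the three-letter words."""
    return sum(1 for w in three if predicate(w)) / len(three)

# Partition "2nd letter is a vowel" over the five vowels, as in (9.21).
p = sum(pr(lambda w, v=v: w[0] == 'c' and w[1] == v) for v in 'aeiou')
assert abs(p - pr(lambda w: w[0] == 'c' and w[1] in 'aeiou')) < 1e-9
print(p)
```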
At this point, it is worth studying this probability of 𝐴∩𝐵 alongside the probabilities
of the events 𝐴 and 𝐵 themselves, which are:
Pr(𝐴) = Pr(1st letter is ‘c’) = 44/1443 ≈ 0.030,
Pr(𝐵) = Pr(2nd letter is a vowel) = 760/1443 ≈ 0.527.
Something to consider, to prepare for studying conditional probability later: based
only on these probabilities of 𝐴, 𝐵 and 𝐴 ∩ 𝐵, do you think that having first letter ‘c’
makes it more or less likely that the second letter is a vowel? Why?

An important special case of (9.21) arises when 𝐵 = 𝑈, i.e., 𝐵 is the entire sample
space, so Pr(𝐵) = 1 and 𝐴 ∩ 𝐵 = 𝐴 ∩ 𝑈 = 𝐴. Then we have

𝑈 = 𝐵1 ⊔ 𝐵2 ⊔ ⋯ ⊔ 𝐵𝑛
  ⟹ Pr(𝐴) = Pr(𝐴 ∩ 𝐵1) + Pr(𝐴 ∩ 𝐵2) + ⋯ + Pr(𝐴 ∩ 𝐵𝑛),   (9.22)

by (9.21). This is sometimes called the Law of Total Probability.



9.6 iNCLUSiON-EXCLUSiON FOR PROBABiLiTiES

The relationship between probabilities of events, their unions and intersections ((9.17)
and (9.18)) can be extended to three events.

Pr(𝐴 ∪ 𝐵 ∪ 𝐶)
= Pr(𝐴 ∪ 𝐵) + Pr(𝐶) − Pr((𝐴 ∪ 𝐵) ∩ 𝐶)
(applying (9.17) to the sets 𝐴 ∪ 𝐵 and 𝐶, instead of 𝐴 and 𝐵)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr((𝐴 ∪ 𝐵) ∩ 𝐶)
(applying (9.17) to 𝐴 and 𝐵, in Pr(𝐴 ∪ 𝐵))
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr((𝐴 ∩ 𝐶) ∪ (𝐵 ∩ 𝐶))
(applying the Distributive Law to (𝐴 ∪ 𝐵) ∩ 𝐶, in the last term)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − (Pr(𝐴 ∩ 𝐶) + Pr(𝐵 ∩ 𝐶) − Pr((𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶)))
(applying (9.17) to 𝐴 ∩ 𝐶 and 𝐵 ∩ 𝐶)
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr((𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶))
= Pr(𝐴) + Pr(𝐵) − Pr(𝐴 ∩ 𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr(𝐴 ∩ 𝐵 ∩ 𝐶)
(since (𝐴 ∩ 𝐶) ∩ (𝐵 ∩ 𝐶) = 𝐴 ∩ 𝐵 ∩ 𝐶)
= Pr(𝐴) + Pr(𝐵) + Pr(𝐶) − Pr(𝐴 ∩ 𝐵) − Pr(𝐴 ∩ 𝐶) − Pr(𝐵 ∩ 𝐶) + Pr(𝐴 ∩ 𝐵 ∩ 𝐶) (9.23)
(rearranging slightly, for neatness).

Compare this expression with the expression for |𝐴 ∪ 𝐵 ∪ 𝐶| given in (8.2), and which
you derived in Exercise 1.11.
The structure of the expression (9.23) is

Pr(𝐴 ∪ 𝐵 ∪ 𝐶) = (sum of the probabilities of 𝐴, 𝐵, 𝐶)
  − (sum of the probabilities of all intersections of two of the sets 𝐴, 𝐵, 𝐶)
  + (the probability of the intersection of all three of the sets 𝐴, 𝐵, 𝐶).

Note in particular the alternation in signs.



So we can write our expression for Pr(𝐴 ∪ 𝐵 ∪ 𝐶) as

Pr(𝐴 ∪ 𝐵 ∪ 𝐶) = ∑_{𝑘=1}^{3} (−1)^(𝑘+1) · (sum of probabilities of all intersections of 𝑘 of the sets 𝐴, 𝐵, 𝐶).

This generalises to arbitrary numbers of sets.

Theorem 44 (Inclusion-Exclusion for probabilities, version 1).


1) For all 𝑛,

Pr(𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} (−1)^(𝑘+1) · (sum of probabilities of all intersections of 𝑘 sets).   (9.24)

This expresses the probability of a union of events in terms of probabilities of all


possible intersections. It can be proved by induction (at greater length than many of
our proofs by induction).
We can also express the probability of an intersection in terms of probabilities of all
possible unions.

Theorem 45 (Inclusion-Exclusion for probabilities, version 2).


2) For all 𝑛,

Pr(𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} (−1)^(𝑘+1) · (sum of probabilities of all unions of 𝑘 sets).   (9.25)

Theorem 44 and Theorem 45 are really just generalisations of the Inclusion-Exclusion


principle for counting, which we studied in § 8.3. There, we used it to find the size of a
union of sets from the sizes of all possible intersections formed by them (Theorem 42),
and also to find the size of an intersection of sets from the sizes of all possible unions of
them (Theorem 43).
In fact, the proof by induction of the Inclusion-Exclusion principle for probability
follows exactly the same pattern as the proof by induction we gave (Theorem 42) for the
Inclusion-Exclusion principle for counting. Although the notation looks a bit different, in
essence the only change is that each element contributes some specific constant amount
(its probability) to the probability of any set containing it, rather than each element
contributing 1 to the size of any set containing it; apart from this, the structure of
the algebra and the reasoning is the same in the proofs of both Inclusion-Exclusion
principles.

If all elements in our sample space are equally likely, then the probability of an event
is just its size (as a set) divided by the size of the sample space (provided the sample
space is finite). In this case, Theorem 44 and Theorem 45 just become
|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛| / |𝑈| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all intersections of 𝑘 sets) / |𝑈|,

|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛| / |𝑈| = ∑_{𝑘=1}^{𝑛} (−1)^{𝑘+1} ⋅ (sum of sizes of all unions of 𝑘 sets) / |𝑈|.

Removing the common denominator |𝑈| throughout, we just obtain Theorem 42 and
Theorem 43. This shows that the Inclusion-Exclusion principle for counting is a special
case of the Inclusion-Exclusion principle for probability.4
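Both versions of the principle can be checked by brute force on a small sample space. The following Python sketch (not part of the notes; the three events are chosen arbitrarily for illustration) evaluates both sides of (9.24) for three events over the 36 equally likely outcomes of two dice:

```python
from itertools import combinations

# Uniform sample space: the 36 outcomes of throwing two dice.
U = [(x, y) for x in range(1, 7) for y in range(1, 7)]

def pr(event):
    return len(event) / len(U)  # equally likely outcomes

A = {(x, y) for (x, y) in U if x <= 2}      # first die shows 1 or 2
B = {(x, y) for (x, y) in U if y >= 5}      # second die shows 5 or 6
C = {(x, y) for (x, y) in U if x + y == 7}  # total is 7
events = [A, B, C]

# Right-hand side of (9.24): alternating sum over all k-wise intersections.
rhs = 0.0
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        rhs += (-1) ** (k + 1) * pr(set.intersection(*combo))

lhs = pr(A | B | C)
assert abs(lhs - rhs) < 1e-12  # the two sides agree, as Theorem 44 asserts
```

Replacing intersections by unions in the inner loop, and the union by an intersection in the last line, gives the corresponding check for Theorem 45.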

9.7 iNDEPENDENT EVENTS

Intuitively, it is natural to consider two events to be “independent” of each other if the


occurrence, or not, of one has no effect at all on the occurrence, or not, of the other.
For example, a toss of a coin at the start of a cricket match in Chennai seems
independent of a coin toss at the start of a football match in Melbourne. Shaking two
dice in the same cup and then throwing them — as done in Monopoly and other board
games — is considered to give independent outcomes from the two dice; the physical
interaction between them, during the shaking and throwing process, does not seem to
influence their probabilistic behaviour.
Tying the notion of “independence” to the specific mechanism used is fraught with
difficulty, because of the huge variety of processes that can be modelled using probability.
Moreover, sometimes events that seem closely related can turn out, perhaps surprisingly,
to behave independently as far as probability is concerned (and we will see examples of
this).
Fortunately, there is a simple definition of independence which depends solely on
the probabilities of the events involved, without referring at all to the mechanisms by
which the events occur.
Two events 𝐴 and 𝐵 are independent if the probability that they both occur is the
product of their separate probabilities:

Pr(𝐴 ∩ 𝐵) = Pr(𝐴) ⋅ Pr(𝐵). (9.26)

4 Conversely, if all elements of a finite sample space are equally likely, then probabilities are just sizes of sets
divided by the size of the sample space; in other words, they are just counts that have been scaled so that
the sample space itself is scaled to 1. So, in this scenario, the Inclusion-Exclusion principle for counting
implies the Inclusion-Exclusion principle for probability. So, in fact, the two Inclusion-Exclusion principles
are equivalent in a precise sense.

For example, when throwing a pair of dice, suppose we are interested in the first die
showing 3 and the second die showing 5. Then

𝐴 = {pairs (𝑥, 𝑦) with 𝑥 = 3}


= {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)},
𝐵 = {pairs (𝑥, 𝑦) with 𝑦 = 5}
= {(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5)},
𝐴 ∩ 𝐵 = {(3, 5)}.

So
Pr(𝐴) = Pr({(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}) = 6/36 = 1/6,
Pr(𝐵) = Pr({(1, 5), (2, 5), (3, 5), (4, 5), (5, 5), (6, 5)}) = 6/36 = 1/6,
Pr(𝐴) ⋅ Pr(𝐵) = (1/6) ⋅ (1/6) = 1/36,
Pr(𝐴 ∩ 𝐵) = Pr({(3, 5)}) = 1/36.
Since Pr(𝐴 ∩ 𝐵) = Pr(𝐴) ⋅ Pr(𝐵), the events 𝐴 and 𝐵 are independent.
The independence of these two events is not too surprising, intuitively, if we accept
that the two dice behave separately when thrown. In this case, the mechanism itself
suggests independence. But independence can be more subtle than this.
Now suppose that the two events we are interested in are (i) the dice totalling seven,
and (ii) the second die showing the “capped successor” of the first, meaning it’s one
greater than the first up to a maximum of 6, so that the capped successor of 𝑥 is 𝑥 + 1
unless 𝑥 = 6, in which case it’s 6. We have

𝐴 = {pairs (𝑥, 𝑦) with 𝑥 + 𝑦 = 7}


= {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)},
𝐵 = {pairs (𝑥, 𝑦) with 𝑦 = capped successor of 𝑥}
= {pairs (𝑥, 𝑦) with 𝑦 = min(𝑥 + 1, 6)}
= {(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6)}.

These events may not “look” so independent. They each depend on both dice; we cannot
attribute them to two separate dice and appeal to the apparently separate nature of the
two dice throws. But consider their intersection

𝐴 ∩ 𝐵 = {(3, 4)}

and the probabilities


Pr(𝐴) = Pr({(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}) = 6/36 = 1/6,
Pr(𝐵) = Pr({(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 6)}) = 6/36 = 1/6,
Pr(𝐴) ⋅ Pr(𝐵) = (1/6) ⋅ (1/6) = 1/36,
Pr(𝐴 ∩ 𝐵) = Pr({(3, 4)}) = 1/36.
Again we have Pr(𝐴∩𝐵) = Pr(𝐴)⋅Pr(𝐵), so the events 𝐴 and 𝐵 are actually independent.
This example illustrates that, to determine independence, we must reason from the
probabilities of the events 𝐴, 𝐵 and 𝐴 ∩ 𝐵, and not rely just on intuition about the
mechanisms that cause the events to occur.
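The definition (9.26) is easy to test by exhaustive enumeration. The following Python sketch (illustrative, not from the notes) checks both of the dice examples above, using exact rational arithmetic to avoid any rounding:

```python
from fractions import Fraction

# All 36 equally likely outcomes of throwing two dice.
U = [(x, y) for x in range(1, 7) for y in range(1, 7)]

def pr(event):
    return Fraction(len(event), len(U))

def independent(A, B):
    # Definition (9.26): Pr(A ∩ B) = Pr(A) · Pr(B).
    return pr(A & B) == pr(A) * pr(B)

# First example: first die shows 3, second die shows 5.
A1 = {(x, y) for (x, y) in U if x == 3}
B1 = {(x, y) for (x, y) in U if y == 5}

# Second example: total is 7; second die is the capped successor of the first.
A2 = {(x, y) for (x, y) in U if x + y == 7}
B2 = {(x, y) for (x, y) in U if y == min(x + 1, 6)}

print(independent(A1, B1))  # True
print(independent(A2, B2))  # True
```

The same `independent` function can be reused to test any other pair of events on this sample space.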
To illustrate how delicate independence can be, let us consider the same event 𝐴
as before (the total of the two dice is seven) but a slight change to 𝐵 so that we now
require the numbers on the two dice to be equal. The events are

𝐴 = {pairs (𝑥, 𝑦) with 𝑥 + 𝑦 = 7}


= {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}, as above;
𝐵 = {pairs (𝑥, 𝑦) with 𝑦 = 𝑥}
= {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.

Superficially, this seems a small change from the previous example: 𝐴 is unchanged,
while 𝐵 is very similar to what it was before, requiring now that the number on the
second die is equal to that on the first die instead of being its capped successor. (In a
sense, we have only replaced a difference of +1 or 0 by a difference of 0.) The probabilities
of these two events are the same (with Pr(𝐵) having been calculated on p. 316 in § 9.3𝛼 ):

Pr(𝐴) = 1/6,
Pr(𝐵) = 1/6,
Pr(𝐴) ⋅ Pr(𝐵) = (1/6) ⋅ (1/6) = 1/36, as above.
But their intersection is now empty: two integers that add up to seven cannot be equal!

𝐴 ∩ 𝐵 = ∅,
Pr(𝐴 ∩ 𝐵) = 0.

Therefore Pr(𝐴 ∩ 𝐵) ≠ Pr(𝐴) ⋅ Pr(𝐵), so these events 𝐴 and 𝐵 are not independent.

Compare mutually exclusive events (§ 9.4𝛼 ) with independent events.

• The probability of the union of mutually exclusive events is the sum of their
individual probabilities.

• The probability of the intersection of independent events is the product of their


individual probabilities.

Just as we said that probability is additive over disjoint unions, we can now say that
probability is multiplicative over intersections of independent events.
It is important not to confuse mutual exclusivity with independence. Do not be tricked
by Venn diagrams! The fact that two mutually exclusive events “look separate from each
other” on a Venn diagram does not mean they are independent. In fact, if two events
𝐴 and 𝐵 are mutually exclusive, then they are not independent, provided they both
have positive probability. To see this, we just work from the definitions. By definition
of mutual exclusivity,
Pr(𝐴 ∩ 𝐵) = 0,
yet, if both events have positive probability, then

Pr(𝐴) Pr(𝐵) > 0.

Therefore
Pr(𝐴 ∩ 𝐵) ≠ Pr(𝐴) Pr(𝐵),
so the definition of independence is not satisfied, so the events are not independent. In-
tuitively, this actually makes sense, since the occurrence of 𝐴 prevents 𝐵 from occurring.
Independence cannot be depicted on a Venn diagram as simply as mutual exclusivity.
If two events are independent, then we know they cannot be disjoint (provided each has
positive probability). That’s a necessary condition for independence, but not a suffi-
cient one. Determining independence is not just a matter of checking the emptiness, or
not, of regions in a Venn diagram; it comes down to the precise relationship, given in
(9.26), between the probabilities of the three sets 𝐴, 𝐵 and 𝐴 ∩ 𝐵.

One of the main uses of independence is to help work out the probabilities of com-
plex events. As usual, when confronted with a complex problem, we want to reduce it
to simpler problems. A complex event, being representable as a set, can typically be
described by combining simpler sets using standard set operations. So, if an event is
an intersection of independent events, then we can work out the probabilities of those
simpler events separately and then multiply them to get the probability of the more
complex event we are interested in. This may be viewed as yet another instance of
“divide-and-conquer” problem-solving.

Example: network reliability (I)



Suppose devices 𝑆 and 𝑇 are connected by two communications links in series, as


shown in the following diagram.

𝑆 ──── 𝑀 ──── 𝑇

Suppose also that

• each link may survive or fail, according to some random process;

• for each link, the probability that a link survives is 𝑝, and the probability that it
fails is therefore 1 − 𝑝, and that these probabilities are the same for each link;

• the links behave independently of each other. This means that, if 𝐴 is an event
determined solely by the first link 𝑆𝑀 , and 𝐵 is an event determined solely by the
second link 𝑀 𝑇, then Pr(𝐴 ∩ 𝐵) = Pr(𝐴) Pr(𝐵).

What is the probability that 𝑆 and 𝑇 can communicate with each other? For this to
be possible, we need both links between them to survive, otherwise there is no path
between them. So

Pr(there is a path from 𝑆 to 𝑇 using surviving links)


= Pr(links 𝑆𝑀 and 𝑀 𝑇 both survive)
= Pr((link 𝑆𝑀 survives) ∩ (link 𝑀 𝑇 survives))
= Pr(link 𝑆𝑀 survives) ⋅ Pr(link 𝑀 𝑇 survives)
(since the survival of each link is independent of the other link)
= 𝑝 ⋅ 𝑝
= 𝑝².

Example: network reliability (II)


Now consider the following simple communications network, under the same scenario:
links behave independently and identically, with each having a survival probability of 𝑝.

[diagram: 𝑆 and 𝑇 joined directly by two parallel links, one on top and one below]

Now what is the survival probability? We will work this out in two different ways.

First method:

Observe that, for there to be a path linking 𝑆 and 𝑇, we just need at least one of
the two links to survive.

Pr(there is a path from 𝑆 to 𝑇 using surviving links)


= Pr(at least one of the top and bottom links survives)
= Pr(top link survives, or bottom link survives, or both)
= Pr((top link survives) ∪ (bottom link survives))
= Pr(top link survives) + Pr(bottom link survives)
− Pr((top link survives) ∩ (bottom link survives)), (9.27)

by (9.17). Now, the survival of the separate links are independent events, so

Pr((top link survives)∩(bottom link survives)) = Pr(top link survives)⋅Pr(bottom link survives).

Substituting back into (9.27), we have

Pr(there is a path from 𝑆 to 𝑇 using surviving links) = 𝑝 + 𝑝 − 𝑝² = 2𝑝 − 𝑝².

Second method:
Let us now start with

Pr(there is a path from 𝑆 to 𝑇 using surviving links)


= 1 − Pr(there is no path from 𝑆 to 𝑇 using surviving links) (9.28)

and work out the probability of the complementary event, that there is no path from 𝑆
to 𝑇, instead.
For this complementary event to happen, the top and bottom links must both fail.
Therefore

Pr(there is no path from 𝑆 to 𝑇 using surviving links)


= Pr(top and bottom links both fail)
= Pr((top link fails) ∩ (bottom link fails))
= Pr(top link fails) ⋅ Pr(bottom link fails)
(since the failure of each link is independent of the other link)
= (1 − 𝑝) ⋅ (1 − 𝑝)
= (1 − 𝑝)².

Substituting back into (9.28), we obtain

Pr(there is a path from 𝑆 to 𝑇 using surviving links) = 1 − (1 − 𝑝)²
                                                      = 1 − (1 − 2𝑝 + 𝑝²)
                                                      = 2𝑝 − 𝑝².

This agrees with the result of our first method, above. This second method is some-
what simpler in this case, and illustrates the benefit of being alert to the possibility of
computing the probability of an event by computing the probability of its complement
instead.
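Both methods, and the series network from the previous example, can be verified by enumerating the joint states of the links and summing the probability weights of the states in which a path survives. The following Python sketch is illustrative only (the function names are not from the notes):

```python
from itertools import product

def series_reliability(p):
    # Two links S-M and M-T in series: enumerate the four joint states.
    total = 0.0
    for sm, mt in product([True, False], repeat=2):
        weight = (p if sm else 1 - p) * (p if mt else 1 - p)
        if sm and mt:            # a path needs BOTH series links to survive
            total += weight
    return total

def parallel_reliability(p):
    # Two links in parallel between S and T.
    total = 0.0
    for top, bottom in product([True, False], repeat=2):
        weight = (p if top else 1 - p) * (p if bottom else 1 - p)
        if top or bottom:        # a path needs AT LEAST ONE parallel link
            total += weight
    return total

p = 0.9
print(series_reliability(p))    # equals p**2
print(parallel_reliability(p))  # equals 2*p - p**2
```

The same enumeration idea extends to larger networks: list all 2ⁿ link states, weight each by a product of 𝑝's and (1 − 𝑝)'s, and keep the states containing an 𝑆–𝑇 path.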

9.8 CONDiTiONAL PROBABiLiTY

The probability of an event quantifies how likely it is in a given setting. But we often need
to find the probability of the same event in more than one setting. New information
may come to light, which may affect our view of how likely an event is, so we may
need to recalculate our probabilities. Or we might have several different competing
explanations of some measurement or observation we have made, so we want to calculate
the probability of our observation under these various explanations in order to compare
them.
Conditional probability gives us tools to do this.
Suppose we are interested in an event 𝐴. Its probability is Pr(𝐴), calculated using
the probabilities of elements of the sample space.
Suppose then that another event 𝐵 occurs. This might be a new development, a
change in the circumstances. Or it might be just an improvement in our information
about the situation. Or we may simply be supposing event 𝐵 to occur (even if we don’t
know whether it occurs or not) in order to reason about what might happen if it did
occur. Whatever the motivation, we still want to know the probability of 𝐴, but under
the condition that 𝐵 occurs, and this may well change the probability of 𝐴. We assume
here that 𝐵 ≠ ∅ and Pr(𝐵) > 0.
Our consideration of what can happen is now more restricted in scope than it was
previously. In particular, elements of the sample space outside 𝐵 are now excluded. We
have to restrict our calculations to elements of the sample space that are in 𝐵. In effect,
𝐵 is the new sample space.
But the elements of 𝐵 cannot have the same probabilities that they had in our
original sample space, because those probabilities add up to Pr(𝐵), which in general can
be < 1. This violates the fundamental requirement that the sum of the probabilities of
all elements of the sample space must be 1.
So, in order to use 𝐵 as a new sample space, we need new probabilities for its elements.
These new probabilities should still be proportional to the original probabilities: if one
element was twice as likely as another in the original sample space, then this should still
be true now. What we will do, then, is to scale all the probabilities of the elements by

the same constant factor so that they add up to 1. The appropriate scaling to use is to
divide the probability of each element by Pr(𝐵).
For any 𝑥 ∈ 𝐵, we write Pr(𝑥 ∣ 𝐵) for the probability of element 𝑥 when 𝐵 is used as
the sample space. Then, using our usual notation Pr(𝑥) for the probability of element
𝑥 in the original sample space, we have

Pr(𝑥 ∣ 𝐵) = Pr(𝑥) / Pr(𝐵).

We should check that these probabilities satisfy the requirements for probabilities of
elements in a sample space. Firstly, they are clearly nonnegative (since the original
probabilities Pr(𝑥) are nonnegative and Pr(𝐵) > 0). Secondly, they sum to 1, because

∑_{𝑥∈𝐵} Pr(𝑥 ∣ 𝐵) = ∑_{𝑥∈𝐵} Pr(𝑥)/Pr(𝐵) = (1/Pr(𝐵)) ⋅ ∑_{𝑥∈𝐵} Pr(𝑥) = (1/Pr(𝐵)) ⋅ Pr(𝐵) = 1.

So we can indeed use these probabilities for the elements of 𝐵 when treating 𝐵 as the
new sample space.
Let 𝑋 ⊆ 𝐵, and suppose we want its probability under the condition that 𝐵 occurs.
We denote this by Pr(𝑋 ∣ 𝐵). To work this out, we just use our restricted sample space,
𝐵, with appropriately scaled probabilities Pr(𝑥 ∣ 𝐵) for its elements. The definition of
probability, (9.6), gives

Pr(𝑋 ∣ 𝐵) = ∑_{𝑥∈𝑋} Pr(𝑥 ∣ 𝐵) = ∑_{𝑥∈𝑋} Pr(𝑥)/Pr(𝐵) = (1/Pr(𝐵)) ⋅ ∑_{𝑥∈𝑋} Pr(𝑥) = Pr(𝑋) / Pr(𝐵).

We emphasise that this formula only works if 𝑋 ⊆ 𝐵.


Let’s return to event 𝐴, which is a subset of the original sample space but not
necessarily of 𝐵.
To study 𝐴 in this restricted scenario (assuming event 𝐵), we must now exclude from
consideration all elements of 𝐴 that are not in 𝐵. In other words, we exclude 𝐴 ∖ 𝐵,
and focus just on 𝐴 ∩ 𝐵, since the elements of 𝐴 ∩ 𝐵 are the only way 𝐴 can occur if we
are restricting to 𝐵.
So to calculate the probability of 𝐴 given 𝐵, we work out the probability of 𝐴 ∩ 𝐵
using the restricted sample space 𝐵 with its scaled probabilities. So we use our earlier
expression for Pr(𝑋 ∣ 𝐵) with 𝑋 = 𝐴 ∩ 𝐵. Therefore we have

Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵).

The main outcome from this discussion is the following expression for conditional
probability:
Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵)    (9.29)

Here are some important special cases:

• Pr(𝐵 ∣ 𝐵) = 1.

• Pr(∅ ∣ 𝐵) = 0.

• If 𝐴 and 𝐵 are disjoint then Pr(𝐴 ∣ 𝐵) = 0.

Equation (9.29) is most useful when we need to work out conditional probability and
the probabilities of 𝐴∩𝐵 and 𝐵 are available or can be calculated. In other situations, we
may already have the conditional probability and can then use it to work out Pr(𝐴 ∩𝐵),
by rearranging (9.29):
Pr(𝐴 ∩ 𝐵) = Pr(𝐴 ∣ 𝐵) Pr(𝐵). (9.30)

This makes sense, intuitively: for 𝐴 and 𝐵 to both occur, we need 𝐵 to occur and, given
that 𝐵 occurs, we also need 𝐴 to occur.
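Formulas (9.29) and (9.30) can be checked on a concrete example. The following Python sketch (illustrative names, not part of the notes) computes a conditional probability for two dice both via (9.29) and by directly rescaling the restricted sample space 𝐵, as in the derivation above:

```python
from fractions import Fraction

# Two fair dice: 36 equally likely outcomes.
U = [(x, y) for x in range(1, 7) for y in range(1, 7)]

def pr(event):
    return Fraction(len(event), len(U))

def pr_given(A, B):
    # (9.29): Pr(A | B) = Pr(A ∩ B) / Pr(B), assuming Pr(B) > 0.
    return pr(A & B) / pr(B)

A = {(x, y) for (x, y) in U if x == 3}      # first die shows 3
B = {(x, y) for (x, y) in U if x + y == 7}  # total is 7

print(pr_given(A, B))   # 1/6

# Cross-check: treat B as the new sample space, each of its elements
# rescaled to probability 1/|B|, and add up the elements lying in A.
restricted = sum(Fraction(1, len(B)) for outcome in B if outcome in A)
assert restricted == pr_given(A, B)

# Rearranged form (9.30): Pr(A ∩ B) = Pr(A | B) Pr(B).
assert pr(A & B) == pr_given(A, B) * pr(B)
```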

An especially important situation is when 𝐴 and 𝐵 are independent. This happens


if and only if the conditional probability equals the unconditional probability.

Theorem 46. Events 𝐴 and 𝐵 are independent if and only if Pr(𝐴 ∣ 𝐵) = Pr(𝐴).

Proof. (⇒)
Suppose 𝐴 and 𝐵 are independent. Then

Pr(𝐴 ∣ 𝐵) = Pr(𝐴 ∩ 𝐵) / Pr(𝐵)
          = Pr(𝐴) Pr(𝐵) / Pr(𝐵)    (by independence)
          = Pr(𝐴).

(⇐)
Suppose Pr(𝐴 ∣ 𝐵) = Pr(𝐴). Then

Pr(𝐴 ∩ 𝐵) / Pr(𝐵) = Pr(𝐴),

from which it follows that Pr(𝐴 ∩ 𝐵) = Pr(𝐴) Pr(𝐵), i.e., 𝐴 and 𝐵 are independent.

Example 1:

On p. 324 in § 9.5, we asked if having first letter ‘c’ makes it more or less likely that
a random three-letter word has a vowel as its second letter. We now answer this, using
the probabilities that we calculated there.
Recall that
Pr(2nd letter is a vowel) = 760/1443 ≈ 0.527.    (9.31)
We need to compare this with the appropriate conditional probability, which is the
probability of the second letter being a vowel given that the first letter is ‘c’.

Pr(2nd letter is a vowel ∣ 1st letter is ‘c’)
    = Pr((2nd letter is a vowel) ∩ (1st letter is ‘c’)) / Pr(1st letter is ‘c’)    (by (9.29))
    = (34/1443) / (44/1443)    (see p. 324 in § 9.5)
    = 34/44
    ≈ 0.773.

By comparing this with the unconditional probability that the second letter is a vowel,
in (9.31), we see that the conditional probability is greater than the unconditional one:

Pr(2nd letter is a vowel) < Pr(2nd letter is a vowel ∣ 1st letter is ‘c’).

We conclude that the first letter being ‘c’ makes it more likely that the second letter is a
vowel. This agrees with our intuition. In fact, it does more: it replaces vague intuition
by a precise quantitative statement.

Example 2:
We saw on p. 316 that the probability of drawing a tile with a vowel from a bag of
100 Scrabble tiles is 0.42. But this does not mean that 42% of letter tiles are vowels,
since there are also two blank tiles. The probability that a random tile is a vowel, given
that it is not a blank, is

Pr(vowel ∣ not blank) = Pr({vowel tiles} ∣ {non-blank tiles})
                      = Pr({vowel tiles} ∩ {non-blank tiles}) / Pr({non-blank tiles})
                      = Pr({vowel tiles}) / Pr({non-blank tiles})
                            (since every vowel tile is non-blank)
                      = 0.42 / 0.98
                      ≈ 0.43.

9.9 B AY E S ’ T H E O R E M

The previous section introduced the conditional probability Pr(𝐴 ∣ 𝐵) of event 𝐴 given
event 𝐵. What if we put the same two events the other way round, and ask about Pr(𝐵 ∣
𝐴)? This is also a valid conditional probability, and we now pin down the relationship
between the two conditional probabilities.

Theorem 47 (Bayes’ Theorem).

Pr(𝐴 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴) / Pr(𝐵).

Proof. Recall (9.30):


Pr(𝐴 ∩ 𝐵) = Pr(𝐵) Pr(𝐴 ∣ 𝐵).
Similarly,
Pr(𝐴 ∩ 𝐵) = Pr(𝐴) Pr(𝐵 ∣ 𝐴).
Equating the right-hand sides of the two previous equations gives

Pr(𝐵) Pr(𝐴 ∣ 𝐵) = Pr(𝐴) Pr(𝐵 ∣ 𝐴).

Then dividing each side by Pr(𝐵) gives

Pr(𝐴 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴) / Pr(𝐵).

So, to convert Pr(𝐵 ∣ 𝐴) to Pr(𝐴 ∣ 𝐵), just multiply it by the ratio Pr(𝐴)/ Pr(𝐵) of the
probabilities of the events. To remember which ratio to use (so you don’t accidentally
use Pr(𝐵)/ Pr(𝐴) instead), keep in mind that the ratio you want is the one which, when
written in-line using “/ ”, has 𝐴 and 𝐵 in the same order as they are in the conditional
probability you’re aiming for. So, if you’re aiming for Pr(𝐴 ∣ 𝐵), the ratio you want has
𝐴 and 𝐵 in that same order, i.e., Pr(𝐴)/ Pr(𝐵).
One of the main applications of Bayes’ Theorem is to capture how beliefs change
when new information becomes available.

A magician keeps three coins in their pocket, to help with their various tricks. One
of the three coins is a fair coin, with Heads and Tails having probability 1/2 each. Another
has Heads on each side, and the third has Tails on each side. One of the three coins is
chosen at random, with each of the three being equally likely to be chosen.
Pr(Fair) = Pr(DoubleHead) = Pr(DoubleTail) = 1/3.

This chosen coin is then tossed once, and the outcome observed. We see only the outcome
on the upper face of the coin; we do not get to turn it over and see what was on the
other side.
Let 𝐴 be the event that the chosen coin is the fair coin. Before we see the outcome
of a toss, our knowledge about 𝐴 is captured by
Pr(𝐴) = Pr(Fair) = 1/3.
Now suppose the coin comes up Heads. We can work out its probability in (at least)
two different ways. Firstly,

Pr(Heads) = Pr(Heads ∩ Fair) + Pr(Heads ∩ DoubleHead) + Pr(Heads ∩ DoubleTail)
                (by the Law of Total Probability, (9.22))
          = Pr(Heads ∣ Fair) Pr(Fair) +
            Pr(Heads ∣ DoubleHead) Pr(DoubleHead) +
            Pr(Heads ∣ DoubleTail) Pr(DoubleTail)
                (using (9.30) three times)
          = (1/2) ⋅ (1/3) + 1 ⋅ (1/3) + 0 ⋅ (1/3)
          = 1/6 + 1/3 + 0
          = 1/2.
2
The second method of working this out is much quicker, and uses symmetry. Observe
that the whole set-up with the three coins and their uniform probabilities is completely
symmetric in Heads and Tails, so that any outcome from an experiment done with this
set-up must have the same probability as the outcome obtained by swapping Heads and
Tails throughout. Therefore, for a single toss of the coin chosen uniformly at random,
we have
Pr(Heads) = Pr(Tails) = 1/2.
Note how this insight from symmetry enables us to bypass the details of the earlier
calculation, which gave the same result.
Now that we have seen the tossed coin come up Heads, we would like to know how
likely it is that the coin is Fair. So we would like to determine

Pr(Fair ∣ Heads).

We already know
Pr(Heads ∣ Fair) = 1/2    (by definition of a fair coin),
Pr(Fair) = 1/3    (since the three coins are equally likely to be chosen),
Pr(Heads) = 1/2    (as determined above).

Therefore, by Bayes’ Theorem,

Pr(Fair ∣ Heads) = Pr(Heads ∣ Fair) Pr(Fair) / Pr(Heads) = ((1/2) ⋅ (1/3)) / (1/2) = 1/3.

So, from a single coin-toss, we do not change our belief about how likely it is that the
coin is fair. But what about the coin being a DoubleHead or DoubleTail? Certainly a
DoubleTail coin cannot ever show Heads, so the observation of Heads, even from just a
single coin toss, rules out this possibility entirely:

Pr(DoubleTail ∣ Heads) = 0.

We can satisfy ourselves that this is in accord with Bayes’ Theorem:

Pr(DoubleTail ∣ Heads) = Pr(Heads ∣ DoubleTail) Pr(DoubleTail) / Pr(Heads) = (0 ⋅ (1/3)) / (1/2) = 0.

Finally, what does our observation of Heads tell us about the probability that the chosen
coin is DoubleHeads? As a shortcut, we can use the fact that our three conditional
probabilities,

Pr(Fair ∣ Heads), Pr(DoubleHead ∣ Heads), Pr(DoubleTail ∣ Heads),

must sum to 1, since the three outcomes are mutually exclusive and cover all possibilities
for the chosen coin:

Pr(Fair ∣ Heads) + Pr(DoubleHead ∣ Heads) + Pr(DoubleTail ∣ Heads) = 1.

Therefore

Pr(DoubleHead ∣ Heads) = 1 − Pr(Fair ∣ Heads) − Pr(DoubleTail ∣ Heads)
                       = 1 − 1/3 − 0
                       = 2/3.

So we have changed our belief about how likely it is that the coin is DoubleHeads. Before
we observed the coin toss outcome, our belief was summarised by Pr(DoubleHead) = 1/3.
But now, we have Pr(DoubleHead ∣ Heads) = 2/3, so we believe that the DoubleHead coin
is twice as likely as before.
We can still work this last conditional probability out using Bayes’ Theorem, for
practice.

Pr(DoubleHead ∣ Heads) = Pr(Heads ∣ DoubleHead) Pr(DoubleHead) / Pr(Heads) = (1 ⋅ (1/3)) / (1/2) = 2/3,

which agrees with our calculation above.
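The whole three-coins calculation can be reproduced mechanically: compute Pr(Heads) by the Law of Total Probability, then apply Bayes’ Theorem to each coin. A short Python sketch using exact fractions (not part of the notes):

```python
from fractions import Fraction

# Priors: the three coins are equally likely to be chosen.
priors = {'Fair': Fraction(1, 3), 'DoubleHead': Fraction(1, 3), 'DoubleTail': Fraction(1, 3)}
# Likelihoods: Pr(Heads | coin).
likelihood = {'Fair': Fraction(1, 2), 'DoubleHead': Fraction(1), 'DoubleTail': Fraction(0)}

# Law of Total Probability, (9.22): Pr(Heads) as a sum over the three coins.
pr_heads = sum(likelihood[c] * priors[c] for c in priors)

# Bayes' Theorem: posterior probability of each coin, given Heads.
posterior = {c: likelihood[c] * priors[c] / pr_heads for c in priors}

print(pr_heads)   # 1/2
print(posterior)  # posteriors: Fair 1/3, DoubleHead 2/3, DoubleTail 0
```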


The process we have gone through here, of using Bayes’ Theorem to update the
probability we assign to an event we are interested in, is very powerful, even though
the mathematical rule we are using is simple. This process finds applications throughout
science, engineering, economics, and indeed any field where data is analysed and beliefs
must be quantified and then revised in the light of observations. It is a fundamental tool
in machine learning and statistics. Every modern computer scientist must be proficient
in its use.
There is some standard terminology that is used when Bayes’ Theorem is applied
in this way. The probability of an event before an observation is made is called its
prior probability, and its conditional probability after the observation is made is
called its posterior probability. In the last use of Bayes’ Theorem above, we began
with
prior probability: Pr(DoubleHead) = 1/3.
We then observed a coin toss giving Heads, and determined:
posterior probability: Pr(DoubleHead ∣ Heads) = 2/3,
which was different. In this case, the posterior probability was larger than the prior
probability, but it can also be smaller. When we considered DoubleTail, we had

prior probability: Pr(DoubleTail) = 1/3,
posterior probability: Pr(DoubleTail ∣ Heads) = 0.

We saw that it is also possible for the prior and posterior probabilities to be equal:
prior probability: Pr(Fair) = 1/3,
posterior probability: Pr(Fair ∣ Heads) = 1/3.

Notice that the expressions Bayes’ Theorem gives for our three probabilities

Pr(Fair ∣ Heads), Pr(DoubleTail ∣ Heads), Pr(DoubleHead ∣ Heads),

each have Pr(Heads) in their denominator. So calculating Pr(Heads) is necessary if we


want to calculate these three probabilities exactly. But what if we just want to compare
the three probabilities, relative to each other? Maybe we just want to know which of
the three probabilities is the largest. Or maybe we just want to make relative statements,
like

“Given that we have observed Heads, the Double-Head coin is twice as likely
as the Fair coin.”

In these cases, we don’t need to know Pr(Heads), because it just serves as a scaling factor.
If we don’t use it, then we won’t get exact conditional probabilities any more; instead,
we get three quantities that are in the same ratios to each other as the probabilities. We
can compute

Pr(Fair ∣ Heads) ∝ Pr(Heads ∣ Fair) Pr(Fair) = (1/2) ⋅ (1/3) = 1/6,
Pr(DoubleTail ∣ Heads) ∝ Pr(Heads ∣ DoubleTail) Pr(DoubleTail) = 0 ⋅ (1/3) = 0,
Pr(DoubleHead ∣ Heads) ∝ Pr(Heads ∣ DoubleHead) Pr(DoubleHead) = 1 ⋅ (1/3) = 1/3.

Each of these statements says that the left-hand side is proportional to the right-hand
side. The same constant of proportionality is used in each case, namely 1/ Pr(Heads),
but this constant factor is now omitted from the calculation. The three numbers we
obtain are no longer probabilities; they are still ≥ 0, but they no longer sum to 1. But
they are in the same ratios with each other as the probabilities were. We can see from
these numbers that Pr(DoubleHead ∣ Heads) is the largest of the three probabilities, and
that the double-headed coin is twice as likely as the fair coin.
So, in some situations where we only want to do comparisons rather than compute
the probabilities exactly, we may not need to compute Pr(Heads).

Having made that important point, let’s return to considering the exact calculation
of the probabilities, and in particular the calculation of the denominator Pr(Heads). In
the three-coins example, we first calculated Pr(Heads) as a sum of terms of the form
Pr(Heads ∣ 𝐴) Pr(𝐴), where 𝐴 is each of the three coins: 𝐴 ∈ {Fair, DoubleHead, DoubleTail}.
So we had

Pr(Heads) = Pr(Heads ∣ Fair) Pr(Fair) +


Pr(Heads ∣ DoubleHead) Pr(DoubleHead) +
Pr(Heads ∣ DoubleTail) Pr(DoubleTail).

This is very typical of applications of Bayes’ Theorem, so much so that the theorem is
often presented in the following form.

Theorem 48 (Bayes’ Theorem, extended version). If 𝑈 = 𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛 then,
for each 𝑗 ∈ {1, 2, … , 𝑛},

Pr(𝐴𝑗 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴𝑗) Pr(𝐴𝑗) / ∑_{𝑖=1}^{𝑛} Pr(𝐵 ∣ 𝐴𝑖) Pr(𝐴𝑖).

Proof. If 𝑈 = 𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛 then, by the Law of Total Probability (9.22),


Pr(𝐵) = ∑_{𝑖=1}^{𝑛} Pr(𝐵 ∣ 𝐴𝑖) Pr(𝐴𝑖).    (9.32)

(We are using (9.22) with the names 𝐴 and 𝐵 interchanged throughout, but that makes
no difference to the underlying mathematics.)
Using Theorem 47 with 𝐴𝑗 instead of 𝐴, with the substitution (9.32), we obtain

Pr(𝐴𝑗 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴𝑗) Pr(𝐴𝑗) / Pr(𝐵) = Pr(𝐵 ∣ 𝐴𝑗) Pr(𝐴𝑗) / ∑_{𝑖=1}^{𝑛} Pr(𝐵 ∣ 𝐴𝑖) Pr(𝐴𝑖).

An important special case is when 𝑛 = 2. We have two events, which we can call 𝐴
and its complement 𝐴̄, and

Pr(𝐴 ∣ 𝐵) = Pr(𝐵 ∣ 𝐴) Pr(𝐴) / (Pr(𝐵 ∣ 𝐴) Pr(𝐴) + Pr(𝐵 ∣ 𝐴̄) Pr(𝐴̄)).
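Theorem 48 translates directly into a few lines of code: the denominator is the Law of Total Probability, and each posterior is one application of Bayes’ Theorem. The following Python sketch is illustrative only (the function name and the numbers in the usage example are assumptions, not from the notes):

```python
from fractions import Fraction

def bayes_update(priors, likelihood):
    """Posteriors over hypotheses A_1, ..., A_n after observing B (Theorem 48).

    priors[j]     = Pr(A_j), for a partition of the sample space;
    likelihood[j] = Pr(B | A_j).
    """
    # Denominator of Theorem 48: the Law of Total Probability for Pr(B).
    pr_b = sum(likelihood[j] * priors[j] for j in priors)
    # One application of Bayes' Theorem per hypothesis.
    return {j: likelihood[j] * priors[j] / pr_b for j in priors}

# The n = 2 special case: an event A and its complement (illustrative numbers).
priors = {'A': Fraction(1, 100), 'not A': Fraction(99, 100)}
likelihood = {'A': Fraction(9, 10), 'not A': Fraction(1, 10)}
post = bayes_update(priors, likelihood)
print(post['A'])  # 1/12
```

Note that a rare hypothesis (prior 1/100) stays fairly unlikely (posterior 1/12) even after observing evidence that is nine times likelier under it; the posterior weighs the likelihood against the prior.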

9.10 EXERCiSES

1. A random word is chosen from a list of all five-letter English words, and then one
of that word’s five letters is chosen.

(a) What is a suitable sample space for this experiment? What should the probabilities
of its elements be?

(b) Challenge: use your Linux skills and /usr/share/dict/words to determine the
probability of each English letter in this experiment.

2. A fair coin is tossed three times. The outcome is the sequence of results of the
coin tosses. For convenience, we denote Heads and Tails by H and T, respectively.
Specify a suitable sample space, with associated probabilities, for this experiment.

3. A fair coin is tossed repeatedly until it comes up Heads for the first time. Then
the tossing stops. The outcome is the sequence of the results of the tosses. For example,
if the first Head occurs on the third toss, then the outcome is TTH.

Specify a suitable sample space, with associated probabilities, for this experiment.

4. Suppose two fair dice are thrown and their numbers are added. Let 𝑇 be the total
of the numbers on the two dice.

(a) What is the probability that 𝑇 is prime?

(b) What is the minimum value of 𝑑 such that the probability that the total differs from
the middle value, 7, by at least 𝑑 is at most 0.1? Symbolically, we are asking for the
minimum 𝑑 such that
Pr(|𝑇 − 7| ≥ 𝑑) ≤ 0.1.

5. Find the probability that a random poker hand contains no diamonds.

6. A random positive integer < 100 is chosen, with all choices being equally likely.
What is the probability that the chosen number is divisible by 3 but not by 9?

7. (Birthday Paradox).
Let 𝑛 ∈ ℕ. Suppose 𝑛 people are chosen at random.

(a) What is the probability that at least two of them share a birthday?

(b) What is the minimum value of 𝑛 such that it is more likely than not that there are
at least two people in the set that share a birthday?

8. Prove that, for any events 𝐴 and 𝐵,

Pr(𝐴 ∪ 𝐵) = 1 − Pr(𝐴̄ ∩ 𝐵̄).

9. Prove by induction on 𝑛 that, for all 𝑛 ∈ ℕ and any mutually disjoint sets
𝐴1 , 𝐴2 , … , 𝐴𝑛 ,
Pr(𝐴1 ⊔ 𝐴2 ⊔ ⋯ ⊔ 𝐴𝑛) = ∑_{𝑘=1}^{𝑛} Pr(𝐴𝑘).

10. Suppose events 𝐴, 𝐵, 𝐶 are independent. Must 𝐴 also be independent of 𝐵 ∩ 𝐶?


Justify your answer, with a proof (if the answer is Yes) or a counterexample (if the
answer is No).

11. Recall the network reliability problems in § 9.7. Now consider the following
network. Again, all links behave independently and have identical survival probability 𝑝.

[Figure: a communications network joining 𝑆 to 𝑇 through intermediate nodes 𝑀 and 𝑁.]

(a) Give an expression, in terms of 𝑝, for the probability that there is a path of surviv-
ing links from 𝑆 to 𝑇.

(b) Compare your answer to (a) with the answer to Exercise 4.8(b). Discuss the rela-
tionship between your answers to the two questions.

Now let’s upgrade this communications network by adding a link between 𝑀 and 𝑁 ,
giving the following network.

[Figure: the upgraded network, with the extra link between 𝑀 and 𝑁 added.]

(c) Give an expression, in terms of 𝑝, for the probability that there is a path of surviv-
ing links from 𝑆 to 𝑇 in this upgraded network.

(d) Compare your answer to (c) with the answer to Exercise 4.8(c). Discuss the rela-
tionship between your answers to the two questions.

12. Determine the probability that a random permutation on a set of two elements
is fixed-point-free.
Then do the same for a set of three elements, and then a set of five elements.

13. What is the probability that a Scrabble letter is a vowel, given that it is in the
first half of the alphabet?

14. Prove that, for any two events 𝐴 and 𝐵, the two conditional probabilities Pr(𝐴 ∣ 𝐵)
and Pr(𝐵 ∣ 𝐴) are equal if and only if the events have the same probability.

15. Consider again the network of four nodes and five links from Exercise 11. Suppose
that 𝑝 = 1/2.

(a) What is the probability that there is a path from 𝑆 to 𝑇?

(b) What is the probability that there is a path from 𝑀 to 𝑁 ?

16. You are at a cricket match and the ball is hit high towards a fielder. You
think, will they catch it? The fielder could be any of four players: Alex, Chris, Kim and
Sam. You can’t tell the difference between them in their white cricket uniforms at this
distance. You assess that Alex has a probability of catching of 0.9, and a probability of
dropping the catch of 0.1. Each of Chris, Kim and Sam has catching probability 0.4 and
dropping probability 0.6.

(a) What is the probability that the ball is caught?

(b) Suppose now that you see that the ball is caught. What is the probability that the
fielder who caught it was Alex? What is the probability that it was not Alex?

17. A serious crime is committed in a big city, and police are trying to identify the
perpetrator. There are 5 million people who cannot be ruled out. A degraded fragment
of DNA from the criminal is found at the scene. There is no other evidence. A random
person has only a one-in-a-million chance of matching the DNA fragment. The fragment
is compared with a database and a match is found with someone who provided DNA
for a different reason in a completely irrelevant context some years ago. How likely is it
that this person committed the crime? Are they guilty, beyond reasonable doubt?

18. In the three-coins example considered from page 337 onwards in § 9.9, suppose
that, instead of observing just one coin toss, we observe two coin tosses instead.
For each of the following observations of the outcomes of the two coin tosses, de-
termine the prior and posterior probabilities of the chosen coin being each of the three
possibilities: Fair, DoubleHead, DoubleTail.

(a) both tosses are Heads;

(b) both tosses are Tails;

(c) the first toss is Heads, the second is Tails.


10
DISCRETE PROBABILITY II

We continue our study of discrete probability by looking at random variables and proba-
bility distributions, including the four most important discrete probability distributions.
These give us tools for modelling and analysing a huge variety of random processes.

10.1𝛼 R A N D O M VA R i A B L E S

For a given sample space, there are many numerical quantities we might be inter-
ested in. For example, when throwing two dice — with each member of sample space
{1, 2, 3, 4, 5, 6}×{1, 2, 3, 4, 5, 6} having probability 1/36 — we might be interested in their
sum (if playing Monopoly), or their product, or their maximum. The idea of a numerical
quantity determined from a random member of a sample space is captured by the notion
of a random variable.
A random variable is a function from a sample space to the real numbers. So, if
𝑈 is the sample space, it’s just a function 𝑓 ∶ 𝑈 → ℝ.
We think of a random variable working as follows.
1. First, a random member 𝑥 ∈ 𝑈 of the sample space is chosen, according to the
probabilities defined on 𝑈.

2. Then the function 𝑓 is applied to 𝑥, giving the real number 𝑓(𝑥).


• This part of the process is deterministic, meaning that it is not random: 𝑓(𝑥)
is entirely determined by 𝑥. All the randomness in a random variable comes
from the initial random selection of 𝑥 from 𝑈.
We can also use a random variable repeatedly. If we do so, then each time we choose a
random member of the sample space 𝑈, we do so independently of all previous choices
we may have made.
Other examples of random variables:
• Points on a random Scrabble letter. The sample space consists of 100 tiles, each
with probability 1/100 (see p. 316). Given a tile, the function gives the number of
points shown on that tile (or 0 for a blank).

• The length of a word chosen at random from a dictionary or word list.


• The number of tests a program passes, during testing on a fixed number of ran-
domly chosen inputs.

• The number of visits to a specific website in a given time period.

• The lifetime of a device, in days, until it fails.

• The score made by a Test cricket batter in their next innings.

It’s common to denote a random variable by a capital letter, and often one near the
end of the alphabet, such as 𝑋 , 𝑌 or 𝑍, but that is not a formal requirement. This
symbol is usually thought of as containing a random value, where that value is obtained
by taking a random member of the sample space and then applying the function to it.
For example, if 𝑍 is a random variable representing the sum of the numbers shown
when throwing two dice, then 𝑍 contains a value chosen randomly from {2, 3, 4, … , 11, 12},
and these values have the probabilities given on p. 315. If we pick any possible value
𝑘 ∈ {2, 3, 4, … , 11, 12}, then we write
𝑍 = 𝑘
for the event that the random variable 𝑍 has the value 𝑘. Since it’s an event, it has a
probability. We can use the usual definition of the probability of an event to work out
the probability of the event 𝑍 = 𝑘. We just add up the probabilities of the elements of
the sample space that belong to the event.
Here, the sample space members are the pairs (𝑖, 𝑗), where 𝑖, 𝑗 ∈ {1, 2, 3, 4, 5, 6}. So
we add up the probabilities of all these pairs such that 𝑖 + 𝑗 = 𝑘:

Pr(𝑍 = 𝑘) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑖, 𝑗)).

For example, if 𝑘 = 9, then we have the event 𝑍 = 9, which consists of the pairs
(3, 6), (4, 5), (5, 4), (6, 3). Its probability is given by

Pr(𝑍 = 9) = Pr({(3, 6), (4, 5), (5, 4), (6, 3)}) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=9} Pr((𝑖, 𝑗))
          = Pr((3, 6)) + Pr((4, 5)) + Pr((5, 4)) + Pr((6, 3))
          = 1/36 + 1/36 + 1/36 + 1/36
          = 4/36
          = 1/9.
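This calculation is easy to check by brute force. The following Python sketch (an illustration, not part of the formal development; it uses exact fractions to avoid rounding) enumerates the 36 outcomes and recomputes Pr(𝑍 = 9):

```python
from fractions import Fraction

# All 36 equally likely outcomes (i, j) of throwing two fair dice.
outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p = Fraction(1, 36)  # probability of each individual outcome

def pr_sum(k):
    """Pr(Z = k): total probability of all pairs with i + j = k."""
    return sum((p for (i, j) in outcomes if i + j == k), Fraction(0))

print(pr_sum(9))  # 1/9
```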

We have defined random variables as having codomain ℝ, following usual practice.


This is because we will do computations with them that require real number operations,
such as arithmetic operations and comparing size. But there are situations where people

use “random variables” with other codomains, such as ℤ𝑛 or sets. This is fine as long as
you don’t try to do operations with them that are not defined in the codomain being
used. For example, you can’t find average values of “random variables” that are sets,
since averaging needs addition and division and neither of these operations is defined on
sets.

10.2𝛼 PROBABiLiTY DiSTRiBUTiONS

A probability distribution is a function that assigns a probability to every member


of a set in such a way that the probabilities add up to 1.
We have seen probability distributions before. Every time we define a sample space,
we must specify a probability distribution on its elements, where each element is given
a probability and these must add up to 1. (The probabilities must, of course, also be in
the interval [0,1]; this follows from calling them probabilities.)
Now, we use probability distributions more widely. We are particularly interested
in probability distributions for random variables.
For any random variable 𝑋 , each of its possible values 𝑘 has a probability, written
Pr(𝑋 = 𝑘) and equal to the sum of the probabilities of all elements of the sample space
for which 𝑋 = 𝑘 (see the end of the previous section). The probabilities of the values of
𝑋 together constitute the probability distribution of the random variable 𝑋 .
Once we specify the set of values a random variable 𝑋 can take, together with its
probability distribution, we have specified everything we need to know about 𝑋 in order
to use it. In the background, there is still a formal definition of 𝑋 in terms of a sample
space and a function from the sample space to the set of possible values. And we can
go back to that formal definition if we want to (because any probability distribution
contains probabilities, and these must ultimately come from a sample space). But, from
now on, when we use random variables, we will mostly just refer to the set of possible
values they can take together with the probability distribution on those numbers.
Keep in mind that

events have probabilities,


but
random variables have probability distributions.

So, for example, if 𝑘 is some fixed number in the codomain of random variable 𝑋 , then
it’s ok to write Pr(𝑋 = 𝑘) or to refer to the probability that 𝑋 = 𝑘 or the probability of
the value 𝑘 (if the random variable 𝑋 is clear from the context). And it’s ok to refer to
the probability distribution of the random variable 𝑋 . But it’s not ok to write “Pr(𝑋 )”
or refer to the “probability of 𝑋 ” when 𝑋 is a random variable, since a random variable
is not an event and therefore does not just have one single probability. And it’s not ok
to refer to the probability distribution of 𝑋 = 𝑘, since 𝑘 is just a single specific value
of 𝑋 , and 𝑋 = 𝑘 is an event; as such, 𝑋 = 𝑘 has a probability, but it does not have an

entire probability distribution.

Although there are some similarities between sample spaces and random variables,
there are also important differences, both in the way we define them and in their purpose
and motivation.

• When defining sample spaces, we divide the range of possible outcomes up into
elementary, “atomic” outcomes. We try to make the sample space elements as
simple as we can, so that any event can be described by some set of them, and
there is no requirement for these sample space elements to be numbers. We want
the probabilities of individual elements to be easy to calculate, and we often try
to get a sample space where all elements have the same probability. The main aim
is for the sample space to be a good model of the underlying random process.

• When defining random variables, our priority is to capture useful numerical func-
tions of random data. These might not be easy to calculate, and they typically
lump many elements of a sample space together. The main aim is for the random
variable to be a good model of what we are interested in about the random data.

Sometimes we need to combine random variables. One of the most common opera-
tions we want to do, to combine random variables, is to add them. We have seen one
example of this, when we added the numbers obtained from throwing two dice.
So let’s consider what sums of random variables look like.
Let 𝑋 and 𝑌 be random variables. Their sum is denoted by 𝑋 + 𝑌. This, too,
is a random variable, but its probability distribution is, in general, different to the
distributions of 𝑋 and 𝑌. To work out the probability that this new random variable
𝑋 + 𝑌 takes a specific value 𝑘, we have to look at all possible pairs of values of 𝑋 and
𝑌 whose sum is 𝑘, and add up their probabilities:

Pr(𝑋 + 𝑌 = 𝑘) = ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)),

where the sum is over all pairs (𝑖, 𝑗) such that 𝑖 + 𝑗 = 𝑘.


For example, let 𝑋 and 𝑌 be the number shown on two separate fair dice. Then
𝑍 = 𝑋 + 𝑌 is their total. We studied this on p. 348 and worked out the probability that
𝑋 + 𝑌 = 9. See also p. 313 in § 9.3𝛼 .
Two random variables 𝑋 and 𝑌 are independent if, for every possible value 𝑥 of 𝑋
and every possible value 𝑦 of 𝑌, the events 𝑋 = 𝑥 and 𝑌 = 𝑦 are independent. In other
words, for all values 𝑥 and 𝑦 we have

Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)) = Pr(𝑋 = 𝑥) ⋅ Pr(𝑌 = 𝑦). (10.1)

Examples:

• If two dice are thrown, and 𝑋 and 𝑌 are the numbers shown on the first and sec-
ond die respectively, then 𝑋 and 𝑌 are considered independent (unless something
dodgy is going on).

• Let 𝑋 be the rank and let 𝑌 be the suit of a random card drawn from a standard
deck of 52 playing cards. Then 𝑋 and 𝑌 are independent.

• A random variable is not independent of itself! This seems intuitively obvious,


and we can also see it from (10.1) (using the pair 𝑋 , 𝑋 instead of the pair 𝑋 , 𝑌).

• If 𝑋 is a word chosen at random from a dictionary and 𝑌 is a word chosen at


random from the same dictionary page as 𝑋 , then 𝑋 and 𝑌 are not independent.
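The card example above can be verified exhaustively, since the sample space has only 52 elements. Below is a Python sketch (illustrative; the suit names are arbitrary labels) that checks condition (10.1) for every rank and suit pair of values:

```python
from fractions import Fraction
from itertools import product

ranks = range(1, 14)                               # 13 ranks
suits = ("clubs", "diamonds", "hearts", "spades")  # 4 suits
deck = list(product(ranks, suits))                 # 52 equally likely cards
p = Fraction(1, len(deck))

def pr(event):
    """Probability of an event, given as a true/false test on cards."""
    return sum((p for card in deck if event(card)), Fraction(0))

# Check condition (10.1) for every pair of values of X (rank) and Y (suit).
independent = all(
    pr(lambda c: c == (r, s)) == pr(lambda c: c[0] == r) * pr(lambda c: c[1] == s)
    for r in ranks for s in suits
)
print(independent)  # True
```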

10.3𝛼 E X P E C TAT i O N

A random variable takes many different values, each with some probability. Sometimes,
though, we want to work with a single number that is somehow representative of the
entire set of possible values, taking into account their probabilities. This representative
number should, in some sense, correspond to what we expect the random variable to
give us, or to be typical of the values it gives. There are several different ways to do this;
indeed, the terms “expected value” and “typical value” are often interpreted differently.
But the most important and widely used representative value of a random variable is
its expected value, also called its expectation or mean. This is based on the familiar
notion of the average of a set of numbers.
You know that, to compute the average of a set of numbers, you just add them up
and divide by how many of them there are. So, the average of the three numbers 2, 5,
6 is
(2 + 5 + 6)/3 = 13/3 = 4 1/3.

Another way to put it is that each number is multiplied by 1/3 and then they are added.
So the three numbers 2, 5, 6 each have a coefficient or “weight”, and the three coefficients
are fractions and add up to 1. In this case, the three coefficients are all the same: 1/3, 1/3, 1/3.
There might be repetition among the numbers we are averaging. The average of 2,
5, 5, 6 (with 5 appearing twice) is

(2 + 5 + 5 + 6)/4 = (1/4) ⋅ 2 + (2/4) ⋅ 5 + (1/4) ⋅ 6 = 0.5 + 2.5 + 1.5 = 4.5.

We may view this as using the same three numbers 2, 5, 6 as before, but with different
coefficients: 1/4, 1/2, 1/4. These coefficients are still fractions and they still add up to 1.
These coefficients are reminiscent of probabilities, since they belong to the unit
interval [0, 1] and add up to 1. So, taken together, the coefficients can be viewed as a
probability distribution.

We can use this viewpoint to define the average of any set of numbers on which there
is a probability distribution. And a set of numbers with a probability distribution is
nothing more or less than a random variable, as we saw in § 10.2𝛼 . So, for any random
variable 𝑋 , we define its expectation 𝐸(𝑋 ) to be

𝐸(𝑋 ) = ∑_𝑘 𝑘 ⋅ Pr(𝑋 = 𝑘),   (10.2)

where the sum is over all possible values of 𝑋 . So we just multiply each value by its
probability, and add up all these products.
The expectation is also called the expected value or the mean. It can also be
called the average, although it is more usual to reserve that term for averages obtained
from actual observed data rather than from probability distributions.

Examples:

• Let 𝑋 be the number obtained from throwing a fair die once. Then each value in
{1, 2, 3, 4, 5, 6} has probability 1/6. So

𝐸(𝑋 ) = 1 ⋅ (1/6) + 2 ⋅ (1/6) + 3 ⋅ (1/6) + 4 ⋅ (1/6) + 5 ⋅ (1/6) + 6 ⋅ (1/6)
      = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 7/2 = 3.5.

• Suppose we have a biased coin for which Heads is twice as likely as Tails:

Pr(Heads) = 2/3,  Pr(Tails) = 1/3.

Let 𝑋 be the number of Heads in a single toss, so

𝑋 = 1 with probability 2/3, and 𝑋 = 0 with probability 1/3.

Then

𝐸(𝑋 ) = 1 ⋅ (2/3) + 0 ⋅ (1/3) = 2/3 + 0 = 2/3.

• In Scrabble, every letter tile displays (as a subscript) the points that letter is worth,
if used in a word.1 Blanks are worth 0 points.

1 There are other factors involved when the game is played. The actual points you get from playing a letter
may be doubled or tripled when played on certain special squares on the board, and may be doubled if
used to make two words in a single turn. We ignore those factors. We focus only on the face value of a tile.

letter points prob. letter points prob. letter points prob.


A 1 0.09 J 8 0.01 S 1 0.04
B 3 0.02 K 5 0.01 T 1 0.06
C 3 0.02 L 1 0.04 U 1 0.04
D 2 0.04 M 3 0.02 V 4 0.02
E 1 0.12 N 1 0.06 W 4 0.02
F 4 0.02 O 1 0.08 X 8 0.01
G 2 0.03 P 3 0.02 Y 4 0.02
H 4 0.02 Q 10 0.01 Z 10 0.01
I 1 0.09 R 1 0.06 □ 0 0.02

Let 𝑋 be the number of points on a randomly chosen Scrabble tile. Then

𝐸(𝑋 ) = 1 ⋅ 0.09 + 3 ⋅ 0.02 + 3 ⋅ 0.02 + 2 ⋅ 0.04 + 1 ⋅ 0.12 + 4 ⋅ 0.02 + ⋯
      ⋯ + 8 ⋅ 0.01 + 4 ⋅ 0.02 + 10 ⋅ 0.01 + 0 ⋅ 0.02
      = 1.87.
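The arithmetic behind the 1.87 can be checked by grouping the tiles by point value. In the Python sketch below (illustrative only), the probabilities are transcribed from the table above:

```python
# Scrabble tile points grouped by value; probabilities taken from the table.
dist = {
    0: 0.02,                                    # blank
    1: 0.09 + 0.12 + 0.09 + 0.04 + 0.06 + 0.08
       + 0.06 + 0.04 + 0.06 + 0.04,             # A, E, I, L, N, O, R, S, T, U
    2: 0.04 + 0.03,                             # D, G
    3: 0.02 * 4,                                # B, C, M, P
    4: 0.02 * 5,                                # F, H, V, W, Y
    5: 0.01,                                    # K
    8: 0.01 * 2,                                # J, X
    10: 0.01 * 2,                               # Q, Z
}
expectation = sum(k * p for k, p in dist.items())
print(round(expectation, 2))  # 1.87
```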

• You spend $5 on a lottery ticket. Your ticket is one of 10,000 sold. First prize is
$20,000, second prize is $5,000 and third prize is $2,000. What is your expected
prizemoney? What is your expected profit?
Let 𝑃 be the random variable representing your prizemoney. The values of 𝑃, with
their probabilities, are

𝑃, in $ probability
20,000 0.0001
5,000 0.0001
2,000 0.0001
0 0.9997

We have

𝐸(𝑃) = 20, 000 ⋅ 0.0001 + 5, 000 ⋅ 0.0001 + 2, 000 ⋅ 0.0001 + 0 ⋅ 0.9997


= 2 + 0.5 + 0.2 + 0
= 2.7.

So your expected prizemoney is $2.70. To find your expected profit, deduct the
cost of your ticket. Therefore your expected profit is

$2.70 − $5.00 = −$2.30.

So your expected loss is $2.30.

Some simple facts about expectation:



• The expectation of a constant is itself. In other words, if a random variable doesn’t


actually vary — i.e., it just takes one single constant value, with probability 1 —
then that single value is its expectation. Symbolically: if 𝑐 is a constant then

𝐸(𝑐) = 𝑐. (10.3)

• If we scale a random variable by a constant factor, then its expectation is scaled


by the same factor. If 𝑋 is a random variable and 𝛼 is a constant, then 𝛼𝑋 is the
random variable formed from 𝑋 by multiplying each of its values by 𝛼. So, for
each value 𝑘 of 𝑋 , the random variable 𝛼𝑋 takes the value 𝛼𝑘 with probability Pr(𝑋 = 𝑘). In other
words,
Pr(𝛼𝑋 = 𝛼𝑘) = Pr(𝑋 = 𝑘).
Then the expectation of 𝛼𝑋 is given by

𝐸(𝛼𝑋 ) = 𝛼𝐸(𝑋 ). (10.4)

We often want to add random variables together. We have already seen an example
of this: adding the numbers on two dice to form a total, when playing Monopoly. We
now show that we can calculate the expectation of a sum of random variables by just
working out the expectations of all the random variables separately and then adding
them up.

Theorem 49 (Linearity of Expectation). For any random variables 𝑋 and 𝑌, the
expectation of their sum is the sum of their expectations:

𝐸(𝑋 + 𝑌) = 𝐸(𝑋 ) + 𝐸(𝑌).

Proof.

𝐸(𝑋 + 𝑌) = ∑_𝑘 𝑘 ⋅ Pr(𝑋 + 𝑌 = 𝑘)

         = ∑_𝑘 𝑘 ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))

         = ∑_𝑘 ∑_{(𝑖,𝑗)∶ 𝑖+𝑗=𝑘} 𝑘 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)),

with the last step being possible because 𝑘 is a constant as far as the inner sum is
concerned (even though it varies in the outer sum). Now the two summations are both
out the front (on the left), and together they just amount to summing over all pairs
(𝑖, 𝑗), without restriction. Then we can replace 𝑘 by 𝑖 + 𝑗. Therefore

𝐸(𝑋 + 𝑌) = (𝑖 + 𝑗) ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)).


(𝑖,𝑗)

So now we are summing over all pairs (𝑖, 𝑗) where 𝑖 is a possible value for 𝑋 and 𝑗 is
a possible value for 𝑌. We can organise this sum as an outer sum over all possible 𝑖
and an inner sum over all possible 𝑗, where the possibilities for 𝑗 in the inner sum are
unaffected by the choices of 𝑖 in the outer sum. So we can replace the single summation
∑(𝑖,𝑗) over all pairs by these two nested summations ∑𝑖 ∑𝑗 . Or we could put the nested
summations the other way round, as in ∑𝑗 ∑𝑖 . Right now, we do the former.

𝐸(𝑋 + 𝑌) = ∑_𝑖 ∑_𝑗 (𝑖 + 𝑗) ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))

         = ∑_𝑖 ∑_𝑗 (𝑖 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) + 𝑗 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)))   (10.5)

           (by the distributive law: (𝑖 + 𝑗)𝑃 = 𝑖𝑃 + 𝑗𝑃,
           where 𝑃 is the probability shown)

         = (∑_𝑖 ∑_𝑗 𝑖 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))) + (∑_𝑖 ∑_𝑗 𝑗 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)))

           (grouping all the “first summands” of (10.5) together,
           and likewise with all the “second summands”)

         = (∑_𝑖 ∑_𝑗 𝑖 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))) + (∑_𝑗 ∑_𝑖 𝑗 ⋅ Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)))

           (reversing the nesting order in the second summation,
           which we can do because 𝑖 and 𝑗 vary independently).

Now, in the first nested summation, the factor 𝑖 in the inner sum over 𝑗 does not depend
on 𝑗 at all, so as far as that inner sum is concerned, it is fixed. So it can be taken outside
the inner sum as a fixed common factor. Similarly, in the second nested summation, the
factor 𝑗 in the inner sum over 𝑖 does not depend on 𝑖, so that factor 𝑗 can be taken
outside the inner sum. (Neither of these factors can be taken outside the outer sum,
though, as these factors are the variables used in those outer summations, so they each
vary as their outer sum is being done.) So we have

𝐸(𝑋 + 𝑌) = ⒧ 𝑖  Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))⒭ + ⒧ 𝑗  Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗))⒭


𝑖 𝑗 𝑗 𝑖
(10.6)
But consider the inner sums.

∑_𝑗 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) = Pr(𝑋 = 𝑖),   (10.7)

by the law of total probability, (9.22). Here we are using the fact that the events 𝑌 = 𝑗,
considered for all 𝑗, together cover all possibilities, and are mutually exclusive. Similarly,

∑_𝑖 Pr((𝑋 = 𝑖) ∧ (𝑌 = 𝑗)) = Pr(𝑌 = 𝑗).   (10.8)

Using (10.7) and (10.8) in (10.6), we have

𝐸(𝑋 + 𝑌) =  𝑖 Pr(𝑋 = 𝑖)) +  𝑗 Pr(𝑌 = 𝑗).


𝑖 𝑗

The first term here is just 𝐸(𝑋 ) and the second term is just 𝐸(𝑌). Therefore

𝐸(𝑋 + 𝑌) = 𝐸(𝑋 ) + 𝐸(𝑌).

Linearity of expectation is a very simple rule. It’s one of those rare cases where
something simple that you hope might be true actually is true! It is remarkably general,
too: it does not even require the two random variables to be independent; you can check
that we did not assume independence of 𝑋 and 𝑌 at any stage in the proof. It also turns
out to be surprisingly powerful.
Linearity of expectation is another instance of working something out for a complex
object by working it out for components of that object and then combining the answers.
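To see how little linearity demands, the following Python sketch (illustrative) takes an extreme case of dependence: 𝑌 is defined to be exactly the same die value as 𝑋 . Even then, the expectation of the sum is the sum of the expectations:

```python
from fractions import Fraction

# One fair die. Let X be its value, and let Y be the SAME value, so Y = X:
# the two variables are completely dependent, not independent.
values = range(1, 7)
p = Fraction(1, 6)

E_X = sum((k * p for k in values), Fraction(0))        # 7/2
E_Y = E_X                                              # Y = X, same distribution
E_sum = sum((2 * k * p for k in values), Fraction(0))  # X + Y = 2X

print(E_sum == E_X + E_Y)  # True
```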

We can also ask about the expectation of a product of random variables. This time,
we need independence to get a simple rule.

Theorem 50. If two random variables 𝑋 and 𝑌 are independent, then the expectation
of their product is the product of their expectations:

𝐸(𝑋 𝑌) = 𝐸(𝑋 )𝐸(𝑌).

Proof. By definition of expectation, we have

𝐸(𝑋 𝑌) =  𝑘 ⋅ Pr(𝑋 𝑌 = 𝑘) (10.9)


𝑘

where the sum is over all possible products of values of 𝑋 and 𝑌. We can express the
event 𝑋 𝑌 = 𝑘 as a partition into events of the form (𝑋 = 𝑥) ∧ (𝑌 = 𝑦) where 𝑥𝑦 = 𝑘, so
that Pr(𝑋 𝑌 = 𝑘) is just the sum of the probabilities of those smaller events:

Pr(𝑋 𝑌 = 𝑘) = ∑_{𝑥,𝑦∶ 𝑥𝑦=𝑘} Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)),

where the sum is over all pairs 𝑥, 𝑦 whose product is 𝑘. Substituting this into (10.9), we
obtain

𝐸(𝑋 𝑌) =  𝑘 ⋅  Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦))


𝑘 𝑥,𝑦∶ 𝑥𝑦=𝑘

=   𝑘 Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦))
𝑘 𝑥,𝑦∶ 𝑥𝑦=𝑘

(moving 𝑘 inside the inner sum)


=   𝑥𝑦 Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦))
𝑘 𝑥,𝑦∶ 𝑥𝑦=𝑘

(replacing 𝑘 by 𝑥𝑦 in the inner sum).

Now these two summations, together, just involve summing over all possible pairs 𝑥, 𝑦,
without regard to their product (since the inner sum is over all pairs with a specific product
𝑘, but the outer sum is over all possible values of 𝑘). So

𝐸(𝑋 𝑌) =  𝑥𝑦 Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)). (10.10)


𝑥,𝑦

Since we are assuming that 𝑋 and 𝑌 are independent, we have, for every 𝑥 and 𝑦,

Pr((𝑋 = 𝑥) ∧ (𝑌 = 𝑦)) = Pr(𝑋 = 𝑥) Pr(𝑌 = 𝑦).

Substituting this into the right-hand side of (10.10), we have

𝐸(𝑋 𝑌) =  𝑥𝑦 Pr(𝑋 = 𝑥) Pr(𝑌 = 𝑦)


𝑥,𝑦

=   𝑥𝑦 Pr(𝑋 = 𝑥) Pr(𝑌 = 𝑦)
𝑥 𝑦

(just writing the sum over all pairs 𝑥, 𝑦 as a sum over 𝑥 and a sum over 𝑦)
=   𝑥 Pr(𝑋 = 𝑥) ⋅ 𝑦 Pr(𝑌 = 𝑦)
𝑥 𝑦

=  𝑥 Pr(𝑋 = 𝑥)  𝑦 Pr(𝑌 = 𝑦)
𝑥 𝑦

(since 𝑥 Pr(𝑋 = 𝑥) does not depend on 𝑦, so it can be taken outside the inner sum)

= ⒧ 𝑥 Pr(𝑋 = 𝑥)⒭ ⋅ ⒧ 𝑦 Pr(𝑌 = 𝑦)⒭


𝑥 𝑦

= 𝐸(𝑋 ) ⋅ 𝐸(𝑌).
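For two independent fair dice, Theorem 50 can be confirmed by direct enumeration. A Python sketch (illustrative) using exact fractions:

```python
from fractions import Fraction

# Two independent fair dice: each pair (i, j) has probability 1/36.
p = Fraction(1, 36)
pairs = [(i, j) for i in range(1, 7) for j in range(1, 7)]

E_X = sum((i * Fraction(1, 6) for i in range(1, 7)), Fraction(0))  # 7/2
E_XY = sum((i * j * p for (i, j) in pairs), Fraction(0))

print(E_XY == E_X * E_X)  # True
```

Both sides come to 49/4, the square of the single-die expectation 7/2.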

10.4 MEDiAN

We mentioned earlier that the expectation gives a single number that represents what we
expect from a random variable. Although a random variable is inherently unpredictable,
the expectation gives us a rough idea where it tends to sit on the real number line. We
also mentioned that we might want a number that is typical of the values taken by the
random variable, and we noted in passing that the terms “expected value” and “typical
value” might be interpreted differently.
The median is intended to capture this notion of a “typical” value of a random
variable.
The median of a random variable 𝑋 is a real number 𝑚 such that
Pr(𝑋 ≤ 𝑚) ≥ 1/2  and  Pr(𝑋 ≥ 𝑚) ≥ 1/2.
So we can think of 𝑚 as lying exactly “in the middle of 𝑋 ” as far as its probability
distribution is concerned.
In saying this, we are not saying that 𝑚 lies exactly half-way between the smallest
and largest values that 𝑋 might take. So the median must not be confused with the
mid-range which is defined by

mid-range = ((smallest value) + (largest value)) / 2.
2
The mid-range is easy to calculate, but it ignores most of the information in 𝑋 , including
the probabilities (except that the minimum and maximum values have nonzero proba-
bility). So it is not very useful in general, although it can be a convenient short-cut
in some situations where the distribution of 𝑋 is symmetric (since in such cases the
expectation and median both equal the mid-range).
When we say that the median 𝑚 lies “in the middle of 𝑋 ”, we intend that this
be understood probabilistically. The median is the value for which at least half the
probability is on one side, and at least half the probability is on the other side.
It is for this reason that the median is often considered to be typical of the values
of the random variable.

Examples

• Let 𝑋 be the number obtained from tossing a single die. Then its median lies
between 3 and 4, which we can determine from the probabilities or simply from
the symmetry of 𝑋 . It makes sense in this case to define the median to be 3.5. In
general, when the median falls naturally in a gap, its value can be chosen to be the
middle of that gap. There are also more detailed ways of calculating the median
in such cases, but we do not consider them here.

• Let 𝑋 be the number of points on a random Scrabble letter. The median of 𝑋


is 1, since

Pr(𝑋 = 0) = 0.02,
Pr(𝑋 = 1) = 0.68,
Pr(𝑋 ≥ 2) = 0.3,

so Pr(𝑋 ≤ 1) = 0.7 ≥ 0.5 and Pr(𝑋 ≥ 1) = 0.98 ≥ 0.5, satisfying the definition. You
should satisfy yourself that no other median value works in this case.

• Let 𝑃 be the lottery prizemoney random variable defined in the example on p. 353.
Its median is 0, since Pr(𝑃 ≥ 0) = 1 ≥ 0.5 and Pr(𝑃 ≤ 0) = 0.9997 ≥ 0.5.
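Because the median is defined by two inequalities, candidate values can be tested mechanically. The Python sketch below (illustrative; it uses the Scrabble points distribution aggregated from the table in § 10.3𝛼) checks every possible value against the definition:

```python
from fractions import Fraction

# Scrabble points distribution, aggregated from the points table.
dist = {0: Fraction(2, 100), 1: Fraction(68, 100), 2: Fraction(7, 100),
        3: Fraction(8, 100), 4: Fraction(10, 100), 5: Fraction(1, 100),
        8: Fraction(2, 100), 10: Fraction(2, 100)}

def is_median(m):
    """Test the two defining inequalities for a candidate median m."""
    at_most = sum(p for k, p in dist.items() if k <= m)
    at_least = sum(p for k, p in dist.items() if k >= m)
    return at_most >= Fraction(1, 2) and at_least >= Fraction(1, 2)

medians = [k for k in dist if is_median(k)]
print(medians)  # [1]
```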

The expectation and median each have their pros and cons.

• The median is less vulnerable than the expectation to changes to the extreme
values of the random variable. If the lowest (or highest) value changes a bit, then
the median is usually unaffected. This is different to the expectation, which can
be affected by a small change to any value of the random variable.

• The median does not mix with arithmetic operations as well as the expectation.
For example, the median of a sum of two random variables does not, in general,
equal the sum of their medians. So there is no analogue of Linearity of Expectation
to help simplify calculations. This means that the expectation is usually preferred
for random variables that might be used in arithmetical calculations.

• The expectation can be quite different to most actual values of the random variable.
For example, consider a random variable which can take any of the five values 1,
2, 3, 4, 90, with probability 1/5 each. Then its expectation is

1 ⋅ (1/5) + 2 ⋅ (1/5) + 3 ⋅ (1/5) + 4 ⋅ (1/5) + 90 ⋅ (1/5) = 100/5 = 20.
But its median is 3. Clearly, in this case, the median is more like the other values,
and therefore more “typical” of them, than the expectation, which is a long way
from any of the values.

Both the expectation and the median of a random variable 𝑋 give a numerical
indicator of where 𝑋 may be located on the number line. For this reason, they can each
be described as a measure of location. The mid-range is another measure of location.
But none of these measures says anything about how widely spread, or not, the values
of 𝑋 are. This is often important to know, so we need a new measure to try to capture
it.

10.5 MODE

Our last measure of location is the mode. We only consider it briefly, but it is important
terminology to know.
The mode of a random variable 𝑋 is the value 𝑔 for which Pr(𝑋 = 𝑔) is greatest.
This may not be unique.
The mode may be described as the most frequent, or most popular, value of 𝑋 . This
is useful, since there are times when we just want to know what the most likely value is.
The mode’s probability gives us an upper bound on the probability of every other
value of 𝑋 . But it does not say anything else about how the distribution behaves at
other values. It is quite possible for the mode to be a long way from the mean and/or
the median. So it may not be a good representative of the random variable as a whole.
Nonetheless, for many important well-behaved distributions, the mode is not too far
from the mean and median.
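Computationally, the mode is just a maximisation over the distribution. A short Python sketch (illustrative), again using the Scrabble points distribution aggregated from the table in § 10.3𝛼:

```python
# Scrabble points distribution (value: probability), from the points table.
dist = {0: 0.02, 1: 0.68, 2: 0.07, 3: 0.08, 4: 0.10,
        5: 0.01, 8: 0.02, 10: 0.02}

mode = max(dist, key=dist.get)  # the value with the greatest probability
print(mode)  # 1
```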

10.6 VA R i A N C E

In many situations, it is important to have some measure of how much a random variable
varies. The values of the variable might be tightly concentrated or loosely smeared out.
The expectation or median tell us nothing about this variability. So we now introduce
a measure of variability, the variance, and its close relative, the standard deviation.
The variance of a random variable 𝑋 measures how far its values tend to be, on
average, from its expected value. In measuring this, it gives greater weight to values that
are further from the expected value. It does this by using the square of the difference
from the expected value, and taking the expectation of this squared difference. We now
define it formally.
Let 𝜇 = 𝐸(𝑋 ) be the expected value of 𝑋 . Then the variance Var(𝑋 ) is defined by

Var(𝑋 ) = 𝐸((𝑋 − 𝜇)²).   (10.11)

This has many important properties, some of which we will come to soon. But we have
to keep in mind that, if the values of 𝑋 have units on some scale (e.g., if 𝑋 arises from
a physical measurement), then Var(𝑋 ) does not have the same units as 𝑋 ; rather, its
units are the square of the units of 𝑋 , since it is an expectation of squares. For example,
if 𝑋 is a distance in metres, then the units of Var(𝑋 ) are square metres, which is a unit
of area rather than length. So, although the variance does indicate how widely 𝑋 varies
around 𝜇 — with a larger variance indicating wider variation — it does not do so on the
same scale. In order to get a measure of variation that is on the same numerical scale
as 𝑋 and has the same units, we can use the standard deviation, which is defined by

standard deviation of 𝑋 = √Var(𝑋 ).



The variance and standard deviation go hand-in-hand. Whenever one of them is


used, it is likely that the other one will be used at some stage too. We are not asked to
choose between them! They are both indispensable.

Examples
• Let 𝑋 be the number given by throwing a single die. We know that 𝜇 =
𝐸(𝑋 ) = 3.5 (see p. 352). So

Var(𝑋 ) = 𝐸((𝑋 − 𝜇)²)
        = 𝐸((𝑋 − 3.5)²)
        = ∑_{𝑘=1}^{6} (𝑘 − 3.5)² ⋅ Pr(𝑋 = 𝑘)
        = (1 − 3.5)² ⋅ Pr(𝑋 = 1) + (2 − 3.5)² ⋅ Pr(𝑋 = 2) + (3 − 3.5)² ⋅ Pr(𝑋 = 3) +
          (4 − 3.5)² ⋅ Pr(𝑋 = 4) + (5 − 3.5)² ⋅ Pr(𝑋 = 5) + (6 − 3.5)² ⋅ Pr(𝑋 = 6)
        = (−2.5)² ⋅ (1/6) + (−1.5)² ⋅ (1/6) + (−0.5)² ⋅ (1/6) + 0.5² ⋅ (1/6) + 1.5² ⋅ (1/6) + 2.5² ⋅ (1/6)
        = (6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25)/6
        = 17.5/6
        = 2 11/12
        ≈ 2.92.

Therefore the standard deviation is

    (Var(𝑋))^(1/2) ≈ 1.71.

Another way to calculate the variance is given by the following theorem.


Theorem 51. Let 𝑋 be a random variable with 𝜇 = 𝐸(𝑋). Then

    Var(𝑋) = 𝐸(𝑋²) − 𝜇².

This theorem may be written in the visually neat form,

    Var(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)².

Proof. The variance is just the expectation of (𝑋 − 𝜇)². Let's expand (𝑋 − 𝜇)² before taking its expectation. Observe that

    (𝑋 − 𝜇)² = 𝑋² − 2𝑋𝜇 + 𝜇².  (10.12)

So we can work out the variance, 𝐸((𝑋 − 𝜇)²), by taking the expectation of the expanded form on the right of (10.12). When doing that, we can use Theorem 49 (Linearity of Expectation).

    Var(𝑋) = 𝐸((𝑋 − 𝜇)²)
           = 𝐸(𝑋² − 2𝑋𝜇 + 𝜇²)         (by (10.12))
           = 𝐸(𝑋²) − 𝐸(2𝑋𝜇) + 𝐸(𝜇²)   (by Linearity of Expectation)
           = 𝐸(𝑋²) − 2𝜇𝐸(𝑋) + 𝜇²      (by (10.4) and (10.3))
           = 𝐸(𝑋²) − 2𝜇² + 𝜇²          (since 𝜇 = 𝐸(𝑋))
           = 𝐸(𝑋²) − 𝜇².

You can use either way of calculating the variance — the definition (10.11), or
Theorem 51 — according to your preference, or whichever is easier for the data you
have.
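Both routes can be checked numerically. The sketch below (plain Python, using exact rational arithmetic; the helper name `expectation` is ours, not standard) computes the variance of a fair die roll from the definition (10.11) and via Theorem 51, and confirms that they agree:

```python
from fractions import Fraction

# Distribution of a fair die: each value 1..6 with probability 1/6.
dist = {k: Fraction(1, 6) for k in range(1, 7)}

def expectation(f):
    """E(f(X)) for the finite distribution above."""
    return sum(f(k) * p for k, p in dist.items())

mu = expectation(lambda k: k)                      # E(X) = 7/2
var_def = expectation(lambda k: (k - mu) ** 2)     # definition (10.11)
var_thm = expectation(lambda k: k ** 2) - mu ** 2  # Theorem 51

# Both routes give exactly 35/12 = 2 11/12.
assert var_def == var_thm == Fraction(35, 12)
```

Using `Fraction` rather than floating point keeps the computation exact, so the two answers can be compared with `==` rather than an approximate tolerance.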
You may wonder why, when defining the variance, we use the average squared difference rather than just the differences themselves. What would happen if we just took the expectation of the difference 𝑋 − 𝜇, rather than the expectation of (𝑋 − 𝜇)²? By Linearity of Expectation and (10.3), we have

    𝐸(𝑋 − 𝜇) = 𝐸(𝑋) − 𝐸(𝜇) = 𝐸(𝑋) − 𝜇 = 𝜇 − 𝜇 = 0.

So this tells us nothing! The problem with this approach is that the positive and negative differences cancel each other out. We could, instead, take the expectation of the absolute difference, which is called the mean absolute deviation:

    mean absolute deviation = 𝐸(|𝑋 − 𝜇|).

In fact, this is occasionally used, and it can be treated as being on the same scale as 𝑋 ,
with the same units. But its mathematical properties are not as strong as the variance
or standard deviation, so we do not consider it further.

Theorem 52. If two random variables 𝑋 and 𝑌 are independent, then the variance of
their sum is the sum of their variances:

Var(𝑋 + 𝑌) = Var(𝑋 ) + Var(𝑌).



Proof.

    Var(𝑋 + 𝑌) = 𝐸((𝑋 + 𝑌)²) − 𝐸(𝑋 + 𝑌)²
               = 𝐸(𝑋² + 2𝑋𝑌 + 𝑌²) − (𝐸(𝑋) + 𝐸(𝑌))²
               = 𝐸(𝑋²) + 2𝐸(𝑋𝑌) + 𝐸(𝑌²) − (𝐸(𝑋)² + 2𝐸(𝑋)𝐸(𝑌) + 𝐸(𝑌)²)
               = 𝐸(𝑋²) + 2𝐸(𝑋𝑌) + 𝐸(𝑌²) − 𝐸(𝑋)² − 2𝐸(𝑋)𝐸(𝑌) − 𝐸(𝑌)²
               = 𝐸(𝑋²) − 𝐸(𝑋)² + 𝐸(𝑌²) − 𝐸(𝑌)² + 2(𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌))
               = Var(𝑋) + Var(𝑌) + 2(𝐸(𝑋𝑌) − 𝐸(𝑋)𝐸(𝑌)).

If 𝑋 and 𝑌 are independent, then 𝐸(𝑋 𝑌) = 𝐸(𝑋 )𝐸(𝑌), by Theorem 50. So the final
term above, 2(𝐸(𝑋 𝑌) − 𝐸(𝑋 )𝐸(𝑌)), is zero. Therefore

Var(𝑋 + 𝑌) = Var(𝑋 ) + Var(𝑌).

Comments:

• The variance is easier to work with, when adding random variables, than the standard deviation, for which we have

    StdDev(𝑋 + 𝑌) = √((StdDev(𝑋))² + (StdDev(𝑌))²).

• The quantity 𝐸(𝑋 𝑌) − 𝐸(𝑋 )𝐸(𝑌), used near the end of the above proof, is im-
portant in its own right. It is the covariance of the random variables 𝑋 and 𝑌,
denoted by Cov(𝑋 , 𝑌):

Cov(𝑋 , 𝑌) = 𝐸(𝑋 𝑌) − 𝐸(𝑋 )𝐸(𝑌).

This is zero when 𝑋 and 𝑌 are independent, and can be zero in some other situa-
tions too. If it is nonzero then it indicates some kind of linear relationship between
the two random variables, though not usually an exact one; the relationship may
be approximate and probabilistic.
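Both Theorem 52 and the vanishing covariance are easy to verify exactly on a small example. The sketch below (plain Python; the helper name `E` is ours) takes two independent fair dice and checks both facts:

```python
import itertools
from fractions import Fraction

# Joint distribution of two independent fair dice: 36 equally likely pairs.
outcomes = list(itertools.product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

def E(f):
    """Expectation of f(X, Y) under the joint distribution."""
    return sum(f(x, y) * prob for x, y in outcomes)

var_x = E(lambda x, y: x * x) - E(lambda x, y: x) ** 2
var_y = E(lambda x, y: y * y) - E(lambda x, y: y) ** 2
var_sum = E(lambda x, y: (x + y) ** 2) - E(lambda x, y: x + y) ** 2
cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

assert cov == 0                   # independence makes the covariance vanish
assert var_sum == var_x + var_y   # Theorem 52
```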

Since the standard deviation measures how far 𝑋 tends to be from its mean 𝜇, we
would expect that 𝑋 is unlikely to be too many standard deviations away from 𝜇, and
that the further away from 𝜇 we go, the less likely 𝑋 is to appear there.
This intuition can be made precise. We use 𝑡 for how far away from the mean we
want to go (in terms of numbers of standard deviations). We will put an upper bound
on the probability of being at least that far away.

Theorem 53 (Chebyshev's Inequality). For any random variable 𝑋 with expectation 𝜇 and variance 𝜎², and any 𝑡 ∈ ℝ⁺, the probability that 𝑋 is at least 𝑡 standard deviations away from its mean is at most 1/𝑡²:

    Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎) ≤ 1/𝑡².
Proof. The variance of 𝑋 is 𝐸((𝑋 − 𝜇)²). This expectation is a sum over all values of 𝑋, but we'll compare that with what we get by only taking values that are far enough away from the mean.

    Var(𝑋) = 𝐸((𝑋 − 𝜇)²)
           = ∑_𝑘 (𝑘 − 𝜇)² Pr(𝑋 = 𝑘)
           ≥ ∑_{𝑘: |𝑘−𝜇| ≥ 𝑡𝜎} (𝑘 − 𝜇)² Pr(𝑋 = 𝑘)
             (we now only sum over those 𝑘 that are ≥ 𝑡 standard deviations
             away from the mean)
           ≥ ∑_{𝑘: |𝑘−𝜇| ≥ 𝑡𝜎} (𝑡𝜎)² Pr(𝑋 = 𝑘)
             (since, if |𝑘 − 𝜇| ≥ 𝑡𝜎, then (𝑘 − 𝜇)² ≥ (𝑡𝜎)²)
           = (𝑡𝜎)² ∑_{𝑘: |𝑘−𝜇| ≥ 𝑡𝜎} Pr(𝑋 = 𝑘)
             (since (𝑡𝜎)² does not depend on 𝑘, so can be taken outside the sum)
           = (𝑡𝜎)² Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎).

Therefore, since Var(𝑋) = 𝜎²,

    𝜎² ≥ (𝑡𝜎)² Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎).

Therefore

    Pr(|𝑋 − 𝜇| ≥ 𝑡𝜎) ≤ 1/𝑡².

So, the larger 𝑡𝜎 is, the less likely it is that 𝑋 is that far away from the mean, and
the theorem gives an upper bound on the probability of being that far away.
This theorem’s main virtue is its generality. It applies to any random variable at all.
In practice, when we have a random variable with a specific well-behaved probability
distribution, stronger statements may be possible; there may be smaller bounds on the
probability of being a given distance from the mean. But, if we don’t know much about
how a given random variable is distributed, or if it’s hard to analyse it, then we can
always fall back on Chebyshev’s Inequality.
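As a sanity check, we can evaluate both sides of Chebyshev's Inequality exactly for a fair die. A minimal Python sketch (the helper name `tail_prob` is ours):

```python
from fractions import Fraction
from math import sqrt

# Exact distribution of a fair die.
dist = {k: Fraction(1, 6) for k in range(1, 7)}
mu = sum(k * p for k, p in dist.items())                       # 7/2
sigma = sqrt(sum((k - mu) ** 2 * p for k, p in dist.items()))  # sqrt(35/12)

def tail_prob(t):
    """Pr(|X - mu| >= t*sigma), computed exactly from the distribution."""
    return sum(p for k, p in dist.items() if abs(k - mu) >= t * sigma)

# Chebyshev's bound holds for every t we try.
for t in [1.2, 1.5, 2.0, 3.0]:
    assert tail_prob(t) <= 1 / t**2
```

For this distribution the bound is far from tight (for 𝑡 = 2 the true tail probability is 0 while the bound is 1/4), which illustrates the point made above: Chebyshev trades sharpness for complete generality.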

Having introduced many of the basic concepts pertaining to random variables, es-
pecially expectation and variance, we now look at some important and useful random
variables and their associated probability distributions.

10.7 UNIFORM DISTRIBUTION

Let 𝑎 and 𝑏 be integers with 𝑎 ≤ 𝑏. The uniform distribution gives the same proba-
bility to all integers between 𝑎 and 𝑏 inclusive, and zero probability to all other integers.
The number of integers in the interval [𝑎, 𝑏] is 𝑏 − (𝑎 − 1) = 𝑏 − 𝑎 + 1, so each of them gets probability 1/(𝑏 − 𝑎 + 1). All other integers have probability zero.
If a random variable 𝑋 has the uniform distribution as its probability distribution, we say it is uniformly distributed, and its probabilities are given by

    Pr(𝑋 = 𝑥) = 1/(𝑏 − 𝑎 + 1),  if 𝑎 ≤ 𝑥 ≤ 𝑏;
    Pr(𝑋 = 𝑥) = 0,              otherwise.

We can write 𝑋 ∼ Unifℤ (𝑎, 𝑏) to mean that the random variable 𝑋 is uniformly dis-
tributed over the integer interval [𝑎, 𝑏] ∩ ℤ. The subscript ℤ may be omitted if it is clear
from the context that 𝑋 can only take integer values in the interval [𝑎, 𝑏].
A plot of the discrete uniform distribution, for 𝑎 = 2 and 𝑏 = 6, is shown in Fig-
ure 10.1.
Here are some examples of uniformly distributed random variables, some of which
we have seen before.
• Let 𝑋 be the outcome of the toss of a fair coin, with outcomes encoded as 0 and
1 for Tails and Heads respectively. Then 𝑋 ∼ Unif(0, 1).

• Let 𝑋 be the number shown on a fair die after it is thrown. Then 𝑋 ∼ Unif(1, 6).

• Let 𝑋 be the age of an adult student who has not reached the minimum age for
a full Victorian Driver’s Licence. This means that their age could be 18, 19, 20
or 21 (since the minimum age for a full licence is 22). In the absence of any
other information about the student or their driving history, we might use 𝑋 ∼
Unif(18, 21).
Sometimes, we use a uniform distribution because we have reason to believe that all
possible values are indeed equally likely, as for a fair die or a fair coin.
We also use a uniform distribution to model ignorance or uncertainty, as we did
in the third example above. Suppose we know that an integer-valued random variable
always takes values in [𝑎, 𝑏], but we know nothing more about it. So we don’t know
where its most popular or least popular values are; we don’t know whether it tends to
lie closer to 𝑎 or closer to 𝑏 or somewhere in the middle; we don’t know whether its

Figure 10.1: The discrete uniform distribution Unif(2, 6). (Plot of Pr(𝑋 = 𝑘) against 𝑘; the plot itself is omitted here.)

distribution is symmetric or not; we don't know if the probabilities vary smoothly or jump around wildly as we move along the interval from 𝑎 to 𝑏; we don't know how
many “peaks” or “troughs” a graph of its probabilities may have; in fact, we don’t know
anything at all beyond the fact that all its values lie within that interval. Then the
uniform distribution is the natural one to use. It is the distribution that best captures
our prior ignorance or uncertainty, since it makes no assumptions about any value of 𝑋
being more or less likely than any other value.²
The expectation of a random variable 𝑋 ∼ Unif(𝑎, 𝑏) must be exactly in the middle
of the interval, by symmetry:
    𝐸(𝑋) = (𝑎 + 𝑏)/2.  (10.13)
You can also confirm this by working from the definition and doing the algebra.
For the case of a single throw of a die, we have 𝑎 = 1 and 𝑏 = 6, and on p. 352 we
worked out the expectation to be (𝑎 + 𝑏)/2 = 3.5.
The median is also (𝑎 + 𝑏)/2, by symmetry.
The variance needs more work. The simplest approach is probably to work out 𝐸(𝑋²), using the definition of expectation, and then deduct 𝐸(𝑋)² (with 𝐸(𝑋) given by (10.13) above), and use Theorem 51. You may also need Exercise 3.8. You should obtain

    Var(𝑋) = ((𝑏 − 𝑎 + 1)² − 1)/12.

For the throw of a die, this formula gives ((𝑏 − 𝑎 + 1)² − 1)/12 = ((6 − 1 + 1)² − 1)/12 = 35/12 = 2 11/12 ≈ 2.92. This agrees with our computation on p. 361.
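The formulas for the mean and variance of the discrete uniform distribution can be checked by brute force from the definitions. A small Python sketch (the helper `uniform_stats` is a name we have made up):

```python
from fractions import Fraction

def uniform_stats(a, b):
    """Exact mean and variance of Unif_Z(a, b), straight from the definitions."""
    p = Fraction(1, b - a + 1)
    mean = sum(k * p for k in range(a, b + 1))
    var = sum(k * k * p for k in range(a, b + 1)) - mean ** 2  # Theorem 51
    return mean, var

for a, b in [(1, 6), (2, 6), (0, 9)]:
    mean, var = uniform_stats(a, b)
    assert mean == Fraction(a + b, 2)                   # matches (10.13)
    assert var == Fraction((b - a + 1) ** 2 - 1, 12)    # matches the variance formula
```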

10.8 BINOMIAL DISTRIBUTION

A Bernoulli trial is a random experiment with two possible outcomes, success and failure. The probabilities of these outcomes are denoted by 𝑝 and 𝑞 = 1 − 𝑝 respectively:

    𝑝 = Pr(success),
    𝑞 = Pr(failure) = 1 − 𝑝.

This very simple random experiment can be used to model a huge variety of situa-
tions. The outcomes “success” and “failure” can be renamed according to the needs of the
situation; the names themselves are just conventional, and are not an essential feature
of the model. We could, instead, call the outcomes “yes” and “no”, or “on” and “off”, or 1 and 0. Coin tosses can be regarded as Bernoulli trials in which the outcomes are Heads and Tails. Mostly, our coins have been fair, with 𝑝 = 𝑞 = 1/2, but biased coins can also be modelled, using 𝑝 ≠ 1/2 (see, e.g., p. 352). In our network reliability example on p. 330 in § 9.7, each edge was a Bernoulli trial in which the outcomes represent the survival or failure of links in a network, with link outcomes being independent and the probability of survival being 𝑝 for all links.

² The notion of uncertainty of a random variable can be quantified, using the concept of entropy. You should learn about entropy later in your computer science studies. It is a foundational concept in information theory and was first introduced by Claude Shannon in coding theory, where data is encoded for sending over a communications channel, and the encoding scheme used must be as efficient as possible while protecting against some level of random “noise” on the channel. It is also used in cryptography, data compression, and machine learning. One of the basic theorems about entropy, as a measure of uncertainty, is that the entropy of a distribution over an integer interval is maximised by the uniform distribution on that interval.
When we refer to some number or sequence of Bernoulli trials, they are understood
to be independent and identically distributed. So, the outcome of any one of them is
independent of all previous trials, and each trial uses the same success probability 𝑝.
A Bernoulli trial can be described by a {0, 1}-valued random variable:

    𝑋 = 1, with probability 𝑝;
    𝑋 = 0, with probability 1 − 𝑝.

This is about the simplest random variable that can be defined that has any randomness
at all. We can work out its mean and variance:

𝐸(𝑋 ) = 0 ⋅ Pr(𝑋 = 0) + 1 ⋅ Pr(𝑋 = 1) = 0 ⋅ (1 − 𝑝) + 1 ⋅ 𝑝 = 𝑝, (10.14)


    Var(𝑋) = 𝐸(𝑋²) − 𝐸(𝑋)²                           (by Theorem 51)
           = 𝐸(𝑋²) − 𝑝²                              (using 𝐸(𝑋) from (10.14))
           = (0² · Pr(𝑋 = 0) + 1² · Pr(𝑋 = 1)) − 𝑝²
           = (0 · Pr(𝑋 = 0) + 1 · Pr(𝑋 = 1)) − 𝑝²
           = 𝑝 − 𝑝²
           = 𝑝(1 − 𝑝).  (10.15)

Suppose we have 𝑛 Bernoulli trials with success probability 𝑝. How many successes
are there? This is a random variable 𝑍 whose set of possible values is {0, 1, 2, … , 𝑛−1, 𝑛}.
If 𝑘 is a value in this set, what is the probability that we have 𝑘 successes (and therefore
𝑛 − 𝑘 failures)?
Each success has probability 𝑝 and each failure has probability 1−𝑝. Since the trials
are independent, a given sequence of outcomes must have probability

    𝑝^(# successes) · (1 − 𝑝)^(# failures).

So a sequence with 𝑘 successes and 𝑛 − 𝑘 failures has probability

    𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘).

The number of sequences of 𝑛 outcomes consisting of 𝑘 successes and 𝑛 − 𝑘 failures is just the number of ways of choosing which 𝑘 of the trials are to be the successes. This is (𝑛 choose 𝑘). Therefore

    Pr(𝑍 = 𝑘) = (𝑛 choose 𝑘) · 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘).  (10.16)

Figure 10.2: The binomial distribution Bin(10, 0.3). (Plot of Pr(𝑍 = 𝑘) against 𝑘; the plot itself is omitted here.)

This probability distribution is known as the binomial distribution. We can say that
the random variable 𝑍 has the binomial distribution based on 𝑛 Bernoulli trials and
success probability 𝑝 by writing

𝑍 ∼ Bin(𝑛, 𝑝).

A plot of the binomial distribution, for 𝑛 = 10 and 𝑝 = 0.3, is shown in Figure 10.2.
Note that, for the highest values of 𝑘 (i.e., 𝑘 = 9, 10), the probabilities are nonzero even
though the points appear to be on the horizontal axis.
For example, suppose we want to know the probability that we get exactly six
successes from 𝑛 = 10 Bernoulli trials each with success probability 𝑝 = 0.3. This is the
same scenario as for the plot in Figure 10.2, where we now seek the distribution’s value,
Pr(𝑍 = 6), when 𝑘 = 6. Using (10.16), we have

    Pr(𝑍 = 6) = (10 choose 6) · 0.3⁶ · (1 − 0.3)^(10−6) = 210 · 0.3⁶ · 0.7⁴ ≈ 0.037.
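Formula (10.16) is straightforward to evaluate in code. A minimal Python sketch using the standard library's binomial coefficient:

```python
from math import comb

def binom_pmf(n, p, k):
    """Pr(Z = k) for Z ~ Bin(n, p), following (10.16)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The example above: n = 10, p = 0.3, k = 6.
assert abs(binom_pmf(10, 0.3, 6) - 0.037) < 0.001
# The probabilities over k = 0, ..., n sum to 1, as they must.
assert abs(sum(binom_pmf(10, 0.3, k) for k in range(11)) - 1) < 1e-12
```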

We now consider the expectation of a binomially distributed random variable, 𝑍 ∼ Bin(𝑛, 𝑝).
This can be calculated directly using the definition of expectation (10.2) with the
binomial probability (10.16). This takes some time, effort and care, and is good practice
in doing the required algebra and manipulating summations. But there is a much easier
way, using Linearity of Expectation (Theorem 49).
The binomial random variable 𝑍 is the number of successes in 𝑛 Bernoulli trials, each
with probability 𝑝. For each 𝑖 ∈ {1, 2, … , 𝑛}, define the {0, 1}-valued random variable
𝑋𝑖 to be the outcome of the 𝑖-th Bernoulli trial, where the outcome is written as 1 for
success and 0 for failure. Since 𝑋𝑖 counts the number of successes in just the 𝑖-th trial,
adding them up over all 𝑖 gives the total number of successes over 𝑛 trials:

    𝑍 = 𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛.

So the expectation of 𝑍 is given by

    𝐸(𝑍) = 𝐸(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)
         = 𝐸(𝑋1) + 𝐸(𝑋2) + ⋯ + 𝐸(𝑋𝑛)   (by Linearity of Expectation)
         = 𝑝 + 𝑝 + ⋯ + 𝑝               (𝑛 copies, by (10.14))
         = 𝑛𝑝.

Since Bernoulli trials are independent, we can compute the variance by adding up
the variances of all the 𝑋𝑖 (by Theorem 52):

    Var(𝑍) = Var(𝑋1 + 𝑋2 + ⋯ + 𝑋𝑛)
           = Var(𝑋1) + Var(𝑋2) + ⋯ + Var(𝑋𝑛)   (by Theorem 52)
           = 𝑛𝑝(1 − 𝑝).                          (by (10.15))

Therefore the standard deviation of 𝑍 is √(𝑛𝑝(1 − 𝑝)).
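Both formulas can also be confirmed numerically from the probabilities (10.16). A short Python sketch:

```python
from math import comb

def binom_pmf(n, p, k):
    """Pr(Z = k) for Z ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
mean = sum(k * binom_pmf(n, p, k) for k in range(n + 1))
var = sum(k * k * binom_pmf(n, p, k) for k in range(n + 1)) - mean**2

assert abs(mean - n * p) < 1e-9            # E(Z) = np
assert abs(var - n * p * (1 - p)) < 1e-9   # Var(Z) = np(1 - p)
```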

10.9 POISSON DISTRIBUTION

The binomial distribution is often used when the number 𝑛 of trials is very large. We
should be on the lookout for efficient approximations when working with large numbers.

Fortunately, there are two very useful approximations to the binomial distribution that
can be used in two types of situations with large 𝑛 (although they do not cover all
possible scenarios). We consider the first of these in this section.
Let 𝑋 be a random variable that can take any nonnegative integer value. We say
that 𝑋 has Poisson distribution with parameter 𝜇 if, for all 𝑘 ∈ ℕ₀,

    Pr(𝑋 = 𝑘) = 𝑒^(−𝜇) 𝜇^𝑘 / 𝑘!.  (10.17)
We can write 𝑋 ∼ Poisson(𝜇) to mean that 𝑋 has Poisson distribution with parameter 𝜇.
In (10.17), 𝑒 is the base of natural logarithms, as usual.
We should first check that this is a valid probability distribution. The values are nonnegative, but do they add up to 1? We can use the infinite series for 𝑒^𝑥:

    𝑒^𝑥 = 1 + 𝑥 + 𝑥²/2! + 𝑥³/3! + ⋯ + 𝑥^𝑖/𝑖! + ⋯ = ∑_{𝑖=0}^∞ 𝑥^𝑖/𝑖!,  (10.18)

for all 𝑥 ∈ ℝ. The total of all the probabilities in the Poisson distribution is

    ∑_{𝑘=0}^∞ Pr(𝑋 = 𝑘) = 𝑒^(−𝜇)𝜇⁰/0! + 𝑒^(−𝜇)𝜇¹/1! + 𝑒^(−𝜇)𝜇²/2! + 𝑒^(−𝜇)𝜇³/3! + ⋯
                        = 𝑒^(−𝜇) (𝜇⁰/0! + 𝜇¹/1! + 𝜇²/2! + 𝜇³/3! + ⋯)
                        = 𝑒^(−𝜇) (1 + 𝜇 + 𝜇²/2! + 𝜇³/3! + ⋯)
                        = 𝑒^(−𝜇) · 𝑒^𝜇   (by (10.18))
                        = 1,

so we can indeed call it a probability distribution.


A plot of the Poisson distribution, for 𝜇 = 3.0, is shown in Figure 10.3. Again, the
higher values of 𝑘 give very small positive values.
For a numerical example, suppose we seek the probability that a Poisson random
variable with parameter 𝜇 = 3 (as in the plot) has value ≤ 2. Then

    Pr(𝑋 ≤ 2) = Pr(𝑋 = 0) + Pr(𝑋 = 1) + Pr(𝑋 = 2)
              = 𝑒^(−3)·3⁰/0! + 𝑒^(−3)·3¹/1! + 𝑒^(−3)·3²/2!
              = 𝑒^(−3) + 3𝑒^(−3) + (9/2)𝑒^(−3)
              ≈ 0.04979 + 0.14936 + 0.22404
              ≈ 0.423.

Figure 10.3: The Poisson distribution Poisson(3). (Plot of Pr(𝑋 = 𝑘) against 𝑘; the plot itself is omitted here.)
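The Poisson probabilities (10.17) are equally easy to compute. A minimal Python sketch, reproducing the Pr(𝑋 ≤ 2) calculation above:

```python
from math import exp, factorial

def poisson_pmf(mu, k):
    """Pr(X = k) for X ~ Poisson(mu), following (10.17)."""
    return exp(-mu) * mu**k / factorial(k)

# The calculation above: Pr(X <= 2) for mu = 3.
assert abs(sum(poisson_pmf(3, k) for k in range(3)) - 0.423) < 0.001
```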



Let’s now give the expectation and variance of the Poisson distribution in general.

Theorem 54. If 𝑋 is a Poisson random variable with parameter 𝜇, then

𝐸(𝑋 ) = 𝜇,
Var(𝑋 ) = 𝜇,
StdDev(𝑋 ) = √𝜇.

Proof.

    𝐸(𝑋) = ∑_{𝑘=0}^∞ 𝑘 Pr(𝑋 = 𝑘)
         = ∑_{𝑘=0}^∞ 𝑘 𝑒^(−𝜇) 𝜇^𝑘 / 𝑘!
         = 𝑒^(−𝜇) ∑_{𝑘=1}^∞ 𝑘 𝜇^𝑘 / 𝑘!
           (since 𝑒^(−𝜇) does not depend on 𝑘, so it can be taken outside the sum;
           we have also started the sum at 𝑘 = 1, since for 𝑘 = 0 we have 𝑘𝜇^𝑘/𝑘! = 0)
         = 𝑒^(−𝜇) ∑_{𝑘=1}^∞ 𝜇^𝑘 / (𝑘 − 1)!
         = 𝑒^(−𝜇) · 𝜇 ∑_{𝑘=1}^∞ 𝜇^(𝑘−1) / (𝑘 − 1)!   (taking one factor 𝜇 outside the sum)
         = 𝑒^(−𝜇) · 𝜇 ∑_{𝑖=0}^∞ 𝜇^𝑖 / 𝑖!   (writing the sum in terms of 𝑖 = 𝑘 − 1 rather than 𝑘)
         = 𝑒^(−𝜇) · 𝜇 · 𝑒^𝜇
         = 𝜇.

A similar argument can be used to show that Var(𝑋 ) = 𝜇. We leave that as an exercise.
It then follows that the standard deviation is √𝜇.

The symbol 𝜇 for the Poisson parameter was chosen as a reminder that it is actually
the expectation, and also the variance, of the distribution.

The Poisson distribution arises in situations where

• a random variable could conceivably take any nonnegative integer value (or it has
a finite upper bound which is very large), and

• its value is counting the number of occurrences of some event, and

• occurrences of these events do not affect each other.



Many Poisson random variables are based on counting how many times an event happens
within some time interval, in situations where these events are independent of each
other and can happen at any point in time. The parameter 𝜇 depends on the situation,
including the length of the time interval.
Here are some examples of random variables with Poisson distributions.

• Telephone calls: the number of calls received (by a phone, call centre or exchange)
during a given time interval.

• Website visits: the number of visits to a website within a given time interval.

• Busking: the number of coins tossed into a busker’s cap during a given time
interval.

• Radioactive decay. Suppose you have a quantity of radioactive material, the atoms
of which emit alpha particles (which are essentially Helium nuclei, and consist of
two protons and two neutrons). The emissions by the atoms are independent of
each other. The number of alpha particles emitted by a given quantity of material
in a given period of time follows a Poisson distribution. The parameter 𝜇 depends
on the particular element (and isotope) as well as the quantity of material and the
length of the time interval during which observations are made.

One important application of the Poisson distribution is as an approximation to the Binomial distribution. If 𝑛 is large and the Binomial mean 𝑛𝑝 is comparatively small, then the Poisson distribution with parameter 𝜇 = 𝑛𝑝 is usually a good approximation to Bin(𝑛, 𝑝).
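This approximation can be seen numerically. The sketch below (plain Python; the choice 𝑛 = 1000, 𝑝 = 0.003 is just an illustration) compares Bin(𝑛, 𝑝) with Poisson(𝑛𝑝) over the bulk of the support:

```python
from math import comb, exp, factorial

def binom_pmf(n, p, k):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mu, k):
    return exp(-mu) * mu**k / factorial(k)

n, p = 1000, 0.003   # large n, small p
mu = n * p           # matching Poisson parameter, mu = 3

# Half the sum of absolute differences (the total variation distance), taken
# over k = 0..50; the tails beyond k = 50 are negligible for both distributions.
tvd = 0.5 * sum(abs(binom_pmf(n, p, k) - poisson_pmf(mu, k)) for k in range(51))
assert tvd < 0.01    # the two distributions are very close
```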

10.10 GEOMETRIC DISTRIBUTION

In the binomial distribution (§ 10.8), we count the number of successes in a finite sequence of Bernoulli trials. We now consider the geometric distribution, where we count the number of trials needed to get the first success, however long that may take.
Suppose we have a sequence of Bernoulli trials, each with success probability 𝑝. Let
𝑘 ∈ ℕ, so we are supposing here that the sequence of trials can be arbitrarily long. What
is the probability that the 𝑘-th trial is the first successful one? For this to happen,

• the first 𝑘 − 1 trials must result in failure, which has probability (1 − 𝑝)^(𝑘−1), since
the trials are independent; and

• the 𝑘-th trial must result in success, which has probability 𝑝.

We multiply these together (again using independence) to obtain

    Pr(first success is at 𝑘-th trial) = (1 − 𝑝)^(𝑘−1) 𝑝.

This motivates the following definitions.



The geometric distribution assigns, to 𝑘 ∈ ℕ, the probability (1 − 𝑝)^(𝑘−1) 𝑝. We give expressions for the first few values in the following table.

    𝑘            1    2          3           4           ⋯
    probability  𝑝    (1 − 𝑝)𝑝   (1 − 𝑝)²𝑝   (1 − 𝑝)³𝑝   ⋯

As usual, we check that these are indeed probabilities. They are clearly nonnegative,
since 0 ≤ 𝑝 ≤ 1. So we now check that they sum to 1. Adding the probabilities gives an
infinite series
    𝑝 + (1 − 𝑝)𝑝 + (1 − 𝑝)²𝑝 + (1 − 𝑝)³𝑝 + ⋯.
This is an infinite geometric series with first term 𝑎 = 𝑝 and common ratio 𝑟 = 1 − 𝑝, so
its sum is
    𝑎/(1 − 𝑟) = 𝑝/(1 − (1 − 𝑝)) = 𝑝/(1 − 1 + 𝑝) = 𝑝/𝑝 = 1,
as required. So they are indeed probabilities.
Let 𝑋 be a random variable taking positive integer values. 𝑋 is geometrically distributed
if, for some 𝑝 ∈ [0, 1] and every 𝑘 ∈ ℕ,

    Pr(𝑋 = 𝑘) = (1 − 𝑝)^(𝑘−1) 𝑝.

We write 𝑋 ∼ Geom(𝑝) to mean that 𝑋 is geometrically distributed with parameter 𝑝.


Note that, although our geometric random variable represents the number of trials
until the first success (so the first successful trial is included in the count), other sources
you read may use the term instead for the number of failures before the first success. It
does not matter very much, as one of these random variables is just a shift, by one, of
the other. But it is important to be aware of this.
A plot of the geometric distribution, for 𝑝 = 0.3, is shown in Figure 10.4.
For an example, suppose we have 𝑝 = 0.3, as in the plot, and we seek the probability
that 𝑋 = 6 (so 𝑘 = 6).

    Pr(𝑋 = 6) = (1 − 0.3)^(6−1) · 0.3 = 0.7⁵ · 0.3 ≈ 0.050.
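The geometric probabilities are simple to compute directly. A minimal Python sketch reproducing this example:

```python
def geom_pmf(p, k):
    """Pr(X = k) for X ~ Geom(p): first success at the k-th trial."""
    return (1 - p) ** (k - 1) * p

# The example above: p = 0.3, k = 6.
assert abs(geom_pmf(0.3, 6) - 0.050) < 0.001
# Partial sums of the probabilities approach 1.
assert abs(sum(geom_pmf(0.3, k) for k in range(1, 200)) - 1) < 1e-12
```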

We now give the expectation and variance for the general case.

Theorem 55. If 𝑋 ∼ Geom(𝑝) then

    𝐸(𝑋) = 1/𝑝,
    Var(𝑋) = (1 − 𝑝)/𝑝².

Proof. (outline)

    𝐸(𝑋) = ∑_{𝑘=1}^∞ 𝑘 Pr(𝑋 = 𝑘) = ∑_{𝑘=1}^∞ 𝑘(1 − 𝑝)^(𝑘−1) 𝑝 = 𝑝 ∑_{𝑘=1}^∞ 𝑘(1 − 𝑝)^(𝑘−1),  (10.19)

Figure 10.4: The geometric distribution Geom(0.3). (Plot of Pr(𝑋 = 𝑘) against 𝑘; the plot itself is omitted here.)



since the factor 𝑝, inside the sum, does not depend on 𝑘, so can be taken outside the
sum.
The sum here is not a type of sum we have studied before. If the terms being added
had no coefficient 𝑘, then we would have an infinite geometric series. If, instead, the
exponent 𝑘 − 1 were removed, we would have an infinite arithmetic series. But, as it
stands, this infinite sum is not of either of those types.
Nonetheless, there is something familiar about it. Study the expression inside the
summation in (10.19). Where have you seen an expression of the form

    𝑘𝑥^(𝑘−1)

before?
See if you can remember, before going to the next page!

Hopefully, the last expression on the previous page rang a bell.
The expression 𝑘𝑥^(𝑘−1) is the derivative³ of 𝑥^𝑘, with respect to 𝑥:

    (𝑑/𝑑𝑥) 𝑥^𝑘 = 𝑘𝑥^(𝑘−1).

Here we have 𝑥 = 1 − 𝑝; if we differentiate (1 − 𝑝)^𝑘 with respect to 𝑝, then we must introduce a minus sign, by the Chain Rule:

    (𝑑/𝑑𝑝) (1 − 𝑝)^𝑘 = −𝑘(1 − 𝑝)^(𝑘−1),   therefore   𝑘(1 − 𝑝)^(𝑘−1) = −(𝑑/𝑑𝑝) (1 − 𝑝)^𝑘.

Substituting this back into (10.19), we have

    𝐸(𝑋) = 𝑝 ∑_{𝑘=1}^∞ ( −(𝑑/𝑑𝑝) (1 − 𝑝)^𝑘 )
         = −𝑝 (𝑑/𝑑𝑝) ( ∑_{𝑘=1}^∞ (1 − 𝑝)^𝑘 )
           (since a sum of derivatives is the derivative of the sum).

This sum is now the sum of a geometric series with first term 𝑎 = 1 − 𝑝 and common ratio 𝑟 = 1 − 𝑝. So this sum is given by

    𝑎/(1 − 𝑟) = (1 − 𝑝)/(1 − (1 − 𝑝)) = (1 − 𝑝)/𝑝 = 1/𝑝 − 1.

Therefore

    𝐸(𝑋) = −𝑝 (𝑑/𝑑𝑝) (1/𝑝 − 1)
         = −𝑝 · (−1/𝑝²)
         = 1/𝑝.

This completes our derivation of the expectation of the geometric distribution.


We omit the derivation of the variance, which follows similar lines but is more
detailed.
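Both formulas in Theorem 55 can be checked numerically by truncating the infinite sums. A short Python sketch:

```python
def geom_pmf(p, k):
    """Pr(X = k) for X ~ Geom(p)."""
    return (1 - p) ** (k - 1) * p

p = 0.3
ks = range(1, 2000)   # truncate the infinite sums; the tail is negligible here
mean = sum(k * geom_pmf(p, k) for k in ks)
var = sum(k * k * geom_pmf(p, k) for k in ks) - mean ** 2

assert abs(mean - 1 / p) < 1e-9              # E(X) = 1/p
assert abs(var - (1 - p) / p ** 2) < 1e-6    # Var(X) = (1 - p)/p^2
```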

The geometric distribution has the important property of being memoryless, which
we now explain.

3 It may be surprising to encounter calculus here, since our focus is on discrete probability. But mathematics
is not obliged to respect the walls that humans find it convenient to erect between its parts! It is common
for tools from continuous mathematics, such as calculus, to be applied to the study of discrete probability
and other parts of discrete mathematics. In fact, the traffic goes both ways.

Suppose you conduct a series of Bernoulli trials, with success probability 𝑝, and let
𝑋 be the number of trials until the first success. Then 𝑋 ∼ Geom(𝑝). Now suppose it so
happens, by chance, that the first 5 trials are all failures. How much longer do we have
to wait for success? The time for this wait, after the fifth trial and given that the first
five trials are all failures, is also geometrically distributed, and with the same success
probability.
This is just a consequence of the independence of the trials. If we have seen five
failures, then independence implies that this has no influence whatsoever on subsequent
trials. In other words, the trials “forget” previous outcomes. Our waiting time to success
behaves exactly as if we are right at the very start of our sequence of trials.
There is a common fallacy about these situations called the “law of averages”, which
is the belief that a sequence of failures makes success more likely and/or that a sequence
of successes makes failure more likely. It is possible to define sequences of random
experiments where such a law does indeed hold, but it does not hold for a sequence of
independent trials with the same success probability.
So, in general, the memoryless property means that, if 𝑋 ∼ Geom(𝑝) and 𝑡 ∈ ℕ, then the distribution of 𝑋 − 𝑡, given that 𝑋 > 𝑡, is also geometric with parameter 𝑝.
In fact, it can be shown that this property characterises the geometric distribution:
any memoryless random variable with values in ℕ must be geometrically distributed.
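The memoryless property can be verified directly from the probabilities, since Pr(𝑋 − 𝑡 = 𝑘 | 𝑋 > 𝑡) = Pr(𝑋 = 𝑡 + 𝑘)/Pr(𝑋 > 𝑡). A minimal Python sketch (the values 𝑝 = 0.3 and 𝑡 = 5 are just an illustration):

```python
def geom_pmf(p, k):
    """Pr(X = k) for X ~ Geom(p)."""
    return (1 - p) ** (k - 1) * p

p, t = 0.3, 5
pr_gt_t = (1 - p) ** t   # Pr(X > t): the first t trials must all fail

# The conditional distribution of the remaining wait X - t, given X > t,
# agrees with the unconditional Geom(p) distribution.
for k in range(1, 30):
    cond = geom_pmf(p, t + k) / pr_gt_t
    assert abs(cond - geom_pmf(p, k)) < 1e-12
```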

Some examples of random variables with geometric distribution:

• Program running time: the number of iterations of a loop in a program, where the
stopping condition depends on a random quantity satisfying some condition. We
assume that the random quantities are independent in each iteration and identi-
cally distributed. (Even quantities that are not explicitly random can sometimes
be usefully modelled using randomness. See the discussion in § 9.1𝛼 .)

• Time to failure: the number of time intervals (seconds/hours/etc) that pass until
the failure of a specific component in a device/machine/system. Here we assume
that the component has the same failure probability in each time interval, and
that failures in different time intervals are independent. Note that here we are
inverting the roles of success and failure, so we wait until the first failure rather
than until the first success. This is just a matter of renaming the outcomes, and
does not affect the underlying theory.

• Cricket (batting): the number of balls faced by a batter in a Test cricket match.

• Cricket (bowling): the number of balls bowled until the first wicket falls in a Test
cricket match.

10.11 THE COUPON COLLECTOR’S PROBLEM

Suppose now that, instead of asking just for the time until the first success, we ask for
the time until we have seen both success and failure. So we are asking for the time until
we have seen both possible outcomes. If we are tossing a coin, we are asking for the
number of tosses until we have seen both Heads and Tails. For simplicity, we focus on
the case where both outcomes are equally likely, i.e., 𝑝 = 12
We know that we need at least two tosses, because after just one toss we have only
seen one of the two outcomes. We could have both outcomes after two tosses, as in the
toss sequences HT or TH, or it could take three or more tosses, e.g., HHT, TTH, HHHT,
TTTTTTTTH, and so on.
Let the random variable 𝑍 be the number of tosses until we have seen both outcomes.
We know that 𝑍 ≥ 2. After the first toss (whatever its outcome may be), how long do
we have to wait until we see the other outcome too? This random variable is just 𝑍 − 1.
After the first toss, we just wait until the other outcome appears, and that outcome has probability 1/2. So our waiting time, 𝑍 − 1, is geometrically distributed with probability 𝑝 = 1/2:

    𝑍 − 1 ∼ Geom(1/2).
Therefore

    𝐸(𝑍 − 1) = 1/(1/2) = 2.  (10.20)
We can use this to calculate the expected total waiting time, from the very start, until
we have seen both outcomes. This is the expectation of 𝑍, which is

    𝐸(𝑍) = 𝐸(1 + (𝑍 − 1))   (since 𝑍 = 1 + (𝑍 − 1))
         = 1 + 𝐸(𝑍 − 1)     (by a simple application of Linearity of Expectation)
         = 1 + 2             (by (10.20))
         = 3.
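This answer can be checked numerically from the distribution of 𝑍 − 1. A short Python sketch:

```python
def geom_pmf(p, k):
    """Pr(X = k) for X ~ Geom(p)."""
    return (1 - p) ** (k - 1) * p

# Z - 1 ~ Geom(1/2); truncate the infinite expectation sum (the tail is negligible).
e_wait = sum(k * geom_pmf(0.5, k) for k in range(1, 200))
e_total = 1 + e_wait   # E(Z) = 1 + E(Z - 1)
assert abs(e_total - 3) < 1e-9
```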

The coupon collector’s problem is the traditional name given to the extension
of this problem to an arbitrary number of equally likely outcomes.
It derives its name from a situation where items in a commercial product line (break-
fast cereal boxes, in the original scenario) each contain one of 𝑛 possible coupons, and
a prize is available to someone (perhaps the first person) who collects all 𝑛 coupons.
Assume that all 𝑛 coupons are equally likely to appear in a given cereal box, and that
the numbers of them are so large that we can consider the coupons to be chosen with
replacement from an infinite supply. If you want to collect all 𝑛 different coupons, how
many of the items do you have to buy in order to achieve your aim? This usually means
you get repeats of some coupons, but as soon as you have found your last coupon, you
stop.

We model this as a sequence of independent trials where, instead of just one or two
successful outcomes from each trial, we have 𝑛 possible outcomes, and that all outcomes
are equally likely, so each has probability 1/𝑛.
Let random variable 𝑍 be the number of trials until we have seen each possible
outcome at least once. We do not care about the order in which the outcomes occur; we
don’t mind which outcome is the first one we see, which is the second, etc, and which is
the last to be seen, as long as we do see them all eventually. As soon as the last outcome
(whichever one that may be) occurs, we stop, and the time taken (i.e., number of trials)
to reach this point is the value of 𝑍.
For the event 𝑍 = 𝑘, we need the 𝑘-th trial to be the first time some particular
outcome occurs. This means that no previous trial gives that outcome, and also that
the previous trials together include all other outcomes at least once. Deriving an exact
expression for this probability, Pr(𝑍 = 𝑘), is nontrivial, and we will not do it now. We
focus on the expectation of this random variable, which tells us how long we expect to
wait, on average, until all possible outcomes have occurred.
The first trial immediately gives one of the 𝑛 outcomes (though it could be any of
them). So, immediately, we have seen our first outcome, and there are 𝑛 −1 we have yet
to see. One down, 𝑛 − 1 to go!
Then we must wait some period of time until we get another outcome. How long do
we have to wait until we get an outcome that’s different to the first one? For each trial
after the first,
    Pr(outcome is the same as the first trial) = 1/𝑛,
since each outcome has probability 1/𝑛, the chance that a specific later trial agrees
with the first trial is 1/𝑛. Therefore

Pr(outcome is different to the first trial) = 1 − 1/𝑛.
When we are waiting for a different outcome to that of the first trial,

• we regard a different outcome (to the first trial) as a “success”, so the success
probability is 1 − 1/𝑛;

• we regard the same outcome (as the first trial) as a “failure”, so the failure proba-
bility is 1/𝑛.

So, with this new view of success/failure, the trials can be viewed as Bernoulli trials
with
𝑝 = 1 − 1/𝑛,
1 − 𝑝 = 1/𝑛.
382 DiSCRETE PROBABiLiTY i i

So the time to success from these trials has geometric distribution with 𝑝 = 1 − 1/𝑛.
Denote this time by 𝑋2 . So
𝑋2 ∼ Geom(1 − 1/𝑛).
Since 𝑋2 is the minimum time after the first trial until we get a different outcome to
the first trial, the total time from the start until we have seen two different outcomes is

1 + 𝑋2 ,

since we have one trial for the first outcome (whatever that is), followed by 𝑋2 trials for
an outcome different to the first outcome.
After the second outcome has occurred, how many trials are there until we get a
third outcome? We represent this by a new random variable 𝑋3 . We can again model
this using a geometric distribution, but with different concepts of “success” and “failure”,
and different probabilities. Now, “failure” is having one of the first two outcomes again,
and “success” is having any other outcome. So
Pr(“success”) = Pr(trial outcome is different to the first two outcomes to appear) = 1 − 2/𝑛,
Pr(“failure”) = Pr(trial outcome is the same as one of the first two outcomes to appear) = 2/𝑛.
𝑛
So
𝑋3 ∼ Geom(1 − 2/𝑛).
The time from the start until we have seen three different outcomes is

1 + 𝑋2 + 𝑋3 .

We can continue in this vein. For each 𝑘 ∈ {1, 2, … , 𝑛}, define the random variable 𝑋𝑘
by

𝑋𝑘 = number of trials after (𝑘 − 1) different outcomes have occurred, until the 𝑘-th outcome occurs.

So 𝑋𝑘 describes the number of Bernoulli trials in a sequence, where

“success” = trial outcome is not one of the first 𝑘 − 1 different outcomes;

“failure” = trial outcome is one of the first 𝑘 − 1 different outcomes,

and the sequence consists of some number of “failures” followed by a single “success”.
Therefore
𝑋𝑘 ∼ Geom(1 − (𝑘 − 1)/𝑛).    (10.21)
The extreme cases here are

• 𝑘 = 1: in this case, success is certain, failure is impossible, and 𝑋1 = 1. This is the
situation right at the start, when we do not have any previous trials, so the very
first trial is new, and gives an outcome we haven’t seen before. Whatever that
outcome is, it becomes the first outcome in the sequence. We can plug 𝑘 = 1 into
(10.21), which gives 𝑋1 ∼ Geom(1), but that’s just a fancy way of saying that 𝑋1
always equals 1.

• 𝑘 = 𝑛: in this case, all outcomes have already occurred except for one, and we are
just waiting for that last outcome, which has probability
Pr(“success”) = 1 − (𝑛 − 1)/𝑛 = 1/𝑛
in each trial. So
𝑋𝑛 ∼ Geom(1/𝑛).

The total time, from start to finish, is therefore

𝑍 = 1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑘 + ⋯ + 𝑋𝑛−1 + 𝑋𝑛 .

This is a sum of random variables, and they are all geometrically distributed, although
with different probabilities. As we mentioned earlier, our priority is to work out how
long we expect to wait until we have seen all outcomes. So we won’t work out the entire
probability distribution of 𝑍, but rather will focus on its expectation.
Fortunately, this expectation can be calculated using linearity of expectation together
with our knowledge of the expectations of the individual geometric random variables 𝑋𝑘 .

𝐸(𝑍) = 𝐸(1 + 𝑋2 + 𝑋3 + 𝑋4 + ⋯ + 𝑋𝑛−1 + 𝑋𝑛)
     = 1 + 𝐸(𝑋2) + 𝐸(𝑋3) + 𝐸(𝑋4) + ⋯ + 𝐸(𝑋𝑛−1) + 𝐸(𝑋𝑛)
     = 1 + 1/(1 − 1/𝑛) + 1/(1 − 2/𝑛) + 1/(1 − 3/𝑛) + ⋯ + 1/(1 − (𝑛−2)/𝑛) + 1/(1 − (𝑛−1)/𝑛)
          (by (10.21) and Theorem 55)
     = 𝑛/𝑛 + 𝑛/(𝑛 − 1) + 𝑛/(𝑛 − 2) + 𝑛/(𝑛 − 3) + ⋯ + 𝑛/(𝑛 − (𝑛−2)) + 𝑛/(𝑛 − (𝑛−1))
     = 𝑛/𝑛 + 𝑛/(𝑛 − 1) + 𝑛/(𝑛 − 2) + 𝑛/(𝑛 − 3) + ⋯ + 𝑛/2 + 𝑛/1
     = 𝑛 (1/𝑛 + 1/(𝑛 − 1) + 1/(𝑛 − 2) + 1/(𝑛 − 3) + ⋯ + 1/2 + 1/1)
     = 𝑛 (1/1 + 1/2 + 1/3 + ⋯ + 1/(𝑛 − 2) + 1/(𝑛 − 1) + 1/𝑛)
          (just reversing it, so it looks more familiar)
     = 𝑛 𝐻𝑛 ,

where 𝐻𝑛 is the 𝑛-th harmonic number, being the sum of the reciprocals of the first 𝑛
positive integers:
𝐻𝑛 = 1 + 1/2 + 1/3 + ⋯ + 1/(𝑛 − 2) + 1/(𝑛 − 1) + 1/𝑛.
We learned about the harmonic numbers in § 6.15. Recall the approximation in (6.48),
which we repeat here:
𝐻𝑛 ≈ log𝑒 𝑛 + 𝛾.
It follows that
𝐸(𝑍) ≈ 𝑛(log𝑒 𝑛 + 𝛾),
and in fact it is usually a very good approximation to simply use

𝐸(𝑍) ≈ 𝑛 log𝑒 𝑛.
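
The approximation 𝐸(𝑍) = 𝑛𝐻𝑛 is easy to check empirically. The following Python sketch (an illustration, not part of the notes) simulates the coupon collector's process and compares the average number of trials with 𝑛𝐻𝑛:

```python
import random

def collect_all(n):
    """Simulate uniform trials until all n outcomes have been seen;
    return the number of trials used (the value of Z)."""
    seen = set()
    trials = 0
    while len(seen) < n:
        seen.add(random.randrange(n))  # one uniform trial
        trials += 1
    return trials

def harmonic(n):
    """The n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1 / k for k in range(1, n + 1))

n, runs = 50, 2000
average = sum(collect_all(n) for _ in range(runs)) / runs
print(f"empirical mean of Z : {average:.1f}")
print(f"n * H_n             : {n * harmonic(n):.1f}")
# The two numbers should be close (n * H_50 is about 225).
```

For 𝑛 = 50 the empirical mean typically lands within a few trials of 𝑛𝐻𝑛 ≈ 225, while 𝑛 log𝑒 𝑛 ≈ 196, illustrating that including 𝛾 gives the better approximation.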

The coupon collector’s problem arises in many different contexts.

• A computer program can produce any one of 𝑛 possible outputs, and for random
inputs these are equally likely. The program is complex, so you have no feasible
way of crafting an input that produces a given output. (Perhaps you do not have
access to its inner workings, but must treat it as a black box.) Nonetheless, you
must test it thoroughly, to ensure that each possible output can be produced
without crashing. So you test the program by running it on a sequence of random

inputs. How many tests do you expect to have to do, until you have seen every
possible output?

• A proposed random number generator is supposed to produce each of the first
𝑛 positive integers with equal probability. You run it and observe how long it
takes to produce each of these numbers for the first time, and finally, how long it
takes before you have seen all numbers at least once. Comparing your observations
with theory (i.e., with your knowledge of how long you expected the process to
take) may help you assess whether the numbers are sufficiently random for your
purposes. (This is the general idea behind the coupon collector’s test for random
number generators.)

• In biology, the problem has been used to estimate the number of different species
of some type of life form in some area. This is a kind of “inverse” of the coupon
collector’s problem, since we use sequences of species actually observed (which may
be very long and contain many repetitions) to make inferences about how many
species there are in total, and we may need to use a more general setting where
the distribution of outcomes (species) is not uniform. Extensive theory has been
developed for this.

10.12 EXERCiSES

1. Suppose two independent fair dice are thrown, with one represented by the
random variable 𝑋 and the other represented by the random variable 𝑌. Calculate the
probability distribution of |𝑋 − 𝑌|.

2. Suppose 𝑋 and 𝑌 are independent {0, 1}-valued random variables where 𝑋 is
unbiased, so
Pr(𝑋 = 0) = Pr(𝑋 = 1) = 1/2,
but the probability distribution of 𝑌 is unknown.
What can you say about the probability distribution of 𝑋 ⊕ 𝑌, by which we mean
𝑋 + 𝑌 mod 2?

3. Suppose one of the four bitstrings 000, 011, 101, 110 is chosen uniformly at random.
So each of these four strings of three bits has probability 1/4 of being chosen. Define
{0, 1}-valued random variables 𝑋 , 𝑌, 𝑍 as follows:

𝑋 = first bit of the chosen bitstring;


𝑌 = second bit of the chosen bitstring;
𝑍 = third bit of the chosen bitstring.

(a) Prove that (i) 𝑋 and 𝑌 are independent; (ii) 𝑋 and 𝑍 are independent; (iii) 𝑌 and
𝑍 are independent.

(b) Determine whether or not 𝑍 is independent of the other two, i.e., if 𝑍 is independent
of the pair (𝑋 , 𝑌).

(c) Similarly, determine whether or not 𝑌 is independent of the other two, and determine
whether or not 𝑋 is independent of the other two.

4. You want to play a game that needs one die, but you do not have one. Your friend
suggests using five fair coins instead, observing the number of Heads (which can be any
number in {0, 1, 2, 3, 4, 5}), and adding one to get a number in {1, 2, 3, 4, 5, 6}, and using
this final number in place of the number shown by a die.
Discuss this suggestion.
Can you think of any other way in which a set of fair coins (possibly more than six)
can be used to simulate the throw of a fair die?

5. Define the random variable 𝑊 as follows.


𝑘    Pr(𝑊 = 𝑘)
0    1/2
1    1/3
2    1/6

Determine the expectation, median, mode, variance and standard deviation of 𝑊.

6. Find the variance of the sum of the numbers shown on two fair dice.
What is the probability that this sum is at least two standard deviations away from
its mean? Compare this with what Chebyshev’s Inequality says about this case.

7. Suppose 𝑋 ∼ Bin(16, 0.5).

(a) Your friend claims that, because 𝑝 = 1/2, half the trials must be successes and half
must be failures. What is the probability that that actually happens?

(b) What is the probability that 𝑋 is at least two standard deviations away from the
mean? (Use a calculator/spreadsheet/program as needed.)

(c) What is the probability that 𝑋 is at least three standard deviations away from the
mean?

(d) How do these exact probabilities compare with the bounds given by Chebyshev’s
Theorem?

8. During a meteor shower, meteors can arrive randomly at any time and are
independent of each other. The Eta Aquariids meteor shower (on now!) has — at its
peak and in idealised conditions — an average rate of 0.83 meteors per minute that are
visible to the unaided eye.4

(a) What is the distribution of the number of meteors seen in a given minute around
the peak of the shower?

(b) What is the probability that no meteors are seen in a given minute?

(c) How long should an observer be prepared to wait for, if they want to have a proba-
bility of 0.95 of seeing a meteor? (You’ll need a calculator, spreadsheet or program
for this.)

9. Each time a bowler on the opposing team bowls to a Test cricket batter, the
probability of the batter being dismissed (thereby ending their innings) is 1.25%, with
the outcomes from different balls being independent.

(a) What is the distribution of the number of balls the batter faces until they are dis-
missed?

(b) What is the probability they get a “golden duck” (i.e., dismissed by the first ball
they face)?

(c) What is the expected length of their innings?

(d) Suppose the batter has faced 200 balls without being dismissed. How many further
balls would you expect them to face until being dismissed?

10. For each of the random variables listed below, state which of the following
distributions is the best model for it: uniform, binomial, Poisson, geometric. If none of
these seem to fit, discuss why, and suggest an appropriate distribution.

(a) the face value of the top card in a well-shuffled standard deck of 52 playing cards,
where the face values of Ace, Jack, Queen and King are defined to be 1, 11, 12 and
13, respectively, and the face value of any other card is the number shown on it.

(b) the number of days in a particular week on which your regular morning train to
work arrives late, where train arrival times on different days are independent and
identically distributed, and a train is late if it arrives at least a minute after its
scheduled arrival time.
4 This rate is idealised in the sense that it assumes perfect atmospheric conditions and gives the number of
meteors that would be seen if they were all high in the sky, at the best time of night for viewing the shower.
In reality, viewing conditions seldom come close to that ideal. You have to be much more patient than the
“official” rates may seem to indicate!

(c) the number of cosmic rays that hit the CPU of your laptop during your next class;

(d) the number of working days you have to wait, from tomorrow onwards, until your
regular morning train arrives on time (with same assumptions as for (b)).

(e) the number of cereal packets you have to buy until you get one of your three favourite
coupons (under the original coupon collector’s scenario).

(f) the number of cowrie shells that land aperture-up when seven of them are thrown in a
game of Pachisi5 , assuming the cowrie shells are identical and behave independently.

(g) a random digit from the first 1,002 digits after the decimal point in the decimal
representation of 3/7.

(h) a random digit from the first 1,000 digits after the decimal point in the decimal
representation of 5/11.

(i) a random digit from the first 10¹⁰⁰ digits of the decimal representation of 𝜋.

11. You met the Birthday Paradox in Exercise 9.7. There, the focus was on how
many people you need to meet in order for it to be more likely than not that at least two
of those people have the same birthday. Now let’s ask a different question, but under
the same assumptions we made then.
Suppose you are recording birthdays of members of a customer loyalty scheme, so
that your company can send them greetings on their birthday each year. How many
people do you expect to have to enrol in the scheme until their birthdays cover every
day of the year?

5 an ancient Indian ancestor of Ludo, Trouble and Cờ cá ngựa


11
G R A P H T H E O RY I

Graphs are abstract models of networks. They can be used to model any system con-
sisting of components that interact in some way. For example, they can model social
networks, molecules, maps, electronic circuits, transport networks, the web, communi-
cations networks, software systems, timetabling requirements, and much else. In each
case, we have a set of nodes or vertices together with links or edges between certain
pairs of vertices. Table 11.1 lists a number of network types, identifying the vertices
and edges for each. We met some of these previously, on p. 66 in § 2.13.

type of network      vertices                 edges

social network       people                   friendships
phone call network   phones                   phone calls
the web              web pages                hyperlinks
molecule             atoms                    bonds
map                  countries                borders
railway network      towns/cities             train lines
road network         intersections            roads
timetabling          units (i.e., subjects)   unit pairs with a student in common
electrical circuit   junctions                wires
polyhedron           vertices                 edges

Table 11.1: Some types of networks, with their vertices and edges.

Like other mathematical models, graphs are abstractions, so they don’t represent
everything about a system. They are intended to capture the structure of the inter-
actions, without incorporating the details of how those interactions work or what the
nodes do. This means a lot of information is thrown away. For example, in a graph that
models a social network, we don’t record people’s height or where each pair of friends


first met. Nonetheless, the information retained in the graph enables many important
problems about the network to be solved.
Problems we can tackle using graphs include:

• In a social network, who has the most friends? Who has the most central position
in the network? What is the largest clique of people, who all know each other?
Can you identify subcommunities of people within the network?

• What’s the minimum number of cities you need to travel through in order to drive
between two specific cities? Is there a tour that visits every city and returns to its
starting point, and if so, which of these has the fewest repeat visits to cities?

• Given an electronic circuit, can it be embedded in a silicon wafer so that no wires
cross each other? (This is important for avoiding short circuits and reducing
manufacturing costs.)

• Given a computer network, how can it be displayed on a screen with the fewest
edge crossings? (This is important for constructing network diagrams that are
readable and help people understand networks better.)

• How many different ways are there of travelling by train between two specific
cities?

• How many different timeslots are needed for an exam timetable in which no student
has a clash?

This chapter introduces the basic concepts of graph theory and some foundational
results relating to vertex degrees and various kinds of paths and cycles.

11.1𝛼 BASiC DEFiNiTiONS

A graph consists of a set of vertices and a set of edges. The set of vertices can be any
set, and is intended to represent the objects we are interested in. Each edge must be a
pair of vertices.
We now state this definition more formally.
A graph is a pair (𝑉, 𝐸) where 𝑉 is a set and 𝐸 is a set of unordered pairs of
elements of 𝑉. Each member of 𝑉 is called a vertex and each member of 𝐸 is called an
edge.
If 𝐺 is a graph, we may write 𝐺 = (𝑉, 𝐸) to make clear that 𝑉 is its set of vertices
and 𝐸 is its set of edges. We can also write 𝑉(𝐺) for the set of vertices of 𝐺 and 𝐸(𝐺)
for the set of edges of 𝐺.
So, to specify a graph, we need to specify its vertex set and edge set. Each member
of the edge set is an unordered pair of vertices. So, in a graph 𝐺 = (𝑉, 𝐸), an edge 𝑒 ∈ 𝐸
between two vertices 𝑣, 𝑤 ∈ 𝑉 is the set {𝑣, 𝑤} containing just those two vertices.

For example, let’s define a graph 𝐺 = (𝑉, 𝐸) with

vertex set: 𝑉 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔};


edge set: 𝐸 = {{𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}, {𝑐, 𝑑}, {𝑓, 𝑔}}.

This graph has seven vertices and five edges. It is depicted in Figure 11.1, where the
dots represent vertices and the lines represent edges.


Figure 11.1: A graph 𝐺.

We say that two vertices 𝑣, 𝑤 of a graph are adjacent if {𝑣, 𝑤} is an edge in the
graph. We also use the convenient shorthand 𝑣 ∼ 𝑤 for this, since it is a bit briefer than
writing {𝑣, 𝑤} ∈ 𝐸, but they mean the same thing. If 𝑣 and 𝑤 are not adjacent, then
we can write 𝑣 ≁ 𝑤. For example, in our graph 𝐺 in Figure 11.1, we have 𝑎 ∼ 𝑏, 𝑎 ∼ 𝑐,
𝑏 ∼ 𝑐, 𝑐 ∼ 𝑑, and 𝑓 ∼ 𝑔. For any other pair 𝑣, 𝑤 of distinct vertices of 𝐺, we have 𝑣 ≁ 𝑤.
For example, 𝑑 ≁ 𝑓.
If 𝑣 is a vertex, then every vertex 𝑤 that is adjacent to 𝑣 is called a neighbour of
𝑣. The set of all neighbours of 𝑣 is the neighbourhood of 𝑣.
If 𝑣 and 𝑤 are adjacent, then they are said to be endpoints, or endvertices, of the
edge {𝑣, 𝑤} between them.
Adjacency refers to the relationship between two vertices that share an edge. We
use a different term for describing the relationship between a vertex and an edge. If
vertex 𝑣 is one of the two vertices in edge 𝑒 — which means that there is some other
vertex 𝑥 for which 𝑒 = {𝑣, 𝑥} — then we say that 𝑣 is incident with 𝑒.
The same term is used to describe the relationship between two edges that have a
common vertex. Let {𝑣, 𝑤} and {𝑣, 𝑥} be edges that meet at the vertex 𝑣, with 𝑤 ≠ 𝑥.
Then we say that these two edges are incident with each other.
In the graph of Figure 11.1:

• Vertex 𝑐 is incident with edges {𝑎, 𝑐}, {𝑏, 𝑐} and {𝑐, 𝑑}, but not with any others.

• Edge {𝑎, 𝑐} has endpoints 𝑎 and 𝑐. This edge is incident with {𝑎, 𝑏}, since they
meet at vertex 𝑎, and it is also incident with {𝑏, 𝑐} and {𝑐, 𝑑}, since they meet at
vertex 𝑐.

• Edge {𝑓, 𝑔} is incident with vertices 𝑓 and 𝑔 (its endpoints), but it is not incident
with any other edges.

A vertex is isolated if it does not belong to (i.e., is not incident with) any edges.
This happens if and only if it is not adjacent to any other vertices. In the graph in
Figure 11.1, vertex 𝑒 is isolated.
A vertex is a leaf if it belongs to exactly one edge. In Figure 11.1, there are three
leaves: 𝑑, 𝑓 and 𝑔.
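
These definitions translate directly into code. As a small illustration (not part of the notes), the following Python sketch stores the graph of Figure 11.1 with edges as frozensets, so that {𝑣, 𝑤} and {𝑤, 𝑣} are the same edge:

```python
# The graph G of Figure 11.1, written directly from the definition G = (V, E).
V = {"a", "b", "c", "d", "e", "f", "g"}
E = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"),
                            ("c", "d"), ("f", "g")]}

def adjacent(v, w):
    """v ~ w exactly when {v, w} is an edge."""
    return frozenset((v, w)) in E

def neighbourhood(v):
    """The set of all neighbours of v."""
    return {w for w in V if adjacent(v, w)}

print(adjacent("a", "c"))                                  # True, since a ~ c
print(sorted(neighbourhood("c")))                          # ['a', 'b', 'd']
print(sorted(v for v in V if not neighbourhood(v)))        # ['e'] -- isolated
print(sorted(v for v in V if len(neighbourhood(v)) == 1))  # ['d', 'f', 'g'] -- leaves
```

Note that `adjacent(v, v)` is always `False`, since `frozenset((v, v))` collapses to a one-element set, which can never be an edge; this mirrors the fact that no vertex is adjacent to itself.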

11.2𝛼 TYPES OF GRAPHS

Under our definition of graphs,

• no vertex is adjacent to itself. Since an edge is a set of two vertices, those vertices
must be distinct (else we’d have a set consisting of the same vertex appearing
twice, but duplicates are not allowed in sets).

• no two edges can have the same pair of endpoints. To see this, suppose we have
two edges with the same two endpoints 𝑣 and 𝑤. Then the two edges must both be
{𝑣, 𝑤}. (Writing one as {𝑣, 𝑤} and the other as {𝑤, 𝑣} makes no difference, because
order does not matter in a set, so they are still really the same set, even though
we have written them differently.) And since the graph’s edges are considered to
be a set (recalling our definition of a graph as (𝑉, 𝐸), where 𝐸 is the set of edges),
there cannot be duplicate edges in that set.

In graph theory, a graph that satisfies both these conditions is called a simple graph.
We will focus on simple graphs in this unit. We will usually omit the adjective “simple”,
and will just assume that our graphs are all simple graphs unless we say otherwise.
In focusing on simple graphs, we are not saying that they are the only type that
matters. There are broader classes of graphs in which the two conditions above are
relaxed.

• Sometimes, loops are allowed. A loop is an edge whose two endpoints are identical.
It therefore joins a vertex to itself, and is shown in diagrams as a closed curve
through the vertex.

• Sometimes, we allow more than one edge between a pair of vertices. We call these
multiple edges or parallel edges.

There are other classes of graphs in which we allow the vertices and/or edges to
carry other information. A weighted graph has a number on each edge, which could
represent a length, or size, or cost of some kind.
In our graphs, the edges have no direction. An edge {𝑣, 𝑤}, being a set, has no
notion of order between 𝑣 and 𝑤. Such a graph is said to be undirected. So far,
we have been working entirely with undirected graphs, and we will omit the adjective
“undirected” and just assume that graphs are of this type unless otherwise stated.

But there are many situations where the order of vertices in an edge does matter.
For example, a one-way street in a road network has a strict order between its endpoints.
A hyperlink between two webpages goes from the linking page to the linked page. A
phone call has a caller and a receiver. To model these situations, we use ordered pairs for
edges, instead of unordered pairs. So, a directed edge from vertex 𝑣 to vertex 𝑤 is an
ordered pair (𝑣, 𝑤). A directed edge is also called an arc. A directed graph is defined
just as we have done, except that we change from unordered pairs to ordered pairs for
edges. In other words, a directed graph is a graph in which all edges are directed. It is
also possible to define “mixed” or “hybrid” graphs in which edges may be either directed
or undirected.
All these different types of graphs are used as models in a wide variety of practical
contexts. We focus on simple (undirected) graphs because
• they are simple! Relatively so, anyway.

• they appear as special cases in most other classes of graphs, so we’d have to master
them anyway if we want to learn about other classes of graphs.

• they are still very widespread, as models of practical situations;

• they are still complex enough to capture all the main computational issues that
arise in more general classes of graphs.

11.3𝜔 G R A P H S A N D R E L AT i O N S

We first encountered graphs informally — under the more informal term network —
when discussing binary relations in § 2.13. (See p. 66.) The relationship between graphs
and binary relations is very close.
Adjacency is a binary relation defined on the set of vertices of a graph. It consists of
all ordered pairs (𝑣, 𝑤) such that 𝑣 is adjacent to 𝑤. Our graphs are undirected, so if 𝑣
is adjacent to 𝑤 then 𝑤 is adjacent to 𝑣. In other words, the pair (𝑣, 𝑤) belongs to the
adjacency relation if and only if (𝑤, 𝑣) also belongs to it. This tells us that adjacency is a
symmetric binary relation. However, it is not reflexive. In fact, no vertex is adjacent to
itself, so the adjacency relation does not contain any pairs (𝑣, 𝑣) where the two vertices
in the pair are the same. So we say that the adjacency relation is irreflexive.1 We can
think of the adjacency binary relation as being obtained by replacing each edge {𝑣, 𝑤}
by the two ordered pairs (𝑣, 𝑤) and (𝑤, 𝑣).
So simple graphs may be regarded as irreflexive symmetric binary relations. Every
simple graph gives rise to such a relation, via adjacency. Conversely, every irreflexive
symmetric relation may be used to define a simple graph whose vertex set is the domain
of the relation.
1 This is a stronger condition than just “not reflexive”; it is not merely a logical negation of reflexivity, but a
kind of extreme opposite of it.

If we discard irreflexivity, then we are allowing (but not enforcing) loops. If, instead,
we discard symmetry, then we have directed graphs rather than undirected graphs. If
we discard both of these, then we have a directed graph which may have loops but is
not allowed to have multiple edges. (Here, we mean that, if we have an edge (𝑣, 𝑤) from
vertex 𝑣 to vertex 𝑤, we do not have any extra edges from 𝑣 to 𝑤, i.e., no other edges
“parallel” to (𝑣, 𝑤) and in the same direction. But we do allow the reverse edge (𝑤, 𝑣).
In directed graphs, forbidding multiple edges does not forbid us from having both (𝑣, 𝑤)
and (𝑤, 𝑣).) So we may regard any binary relation as a directed graph, possibly with
loops but with no multiple edges. Similarly, any directed graph with no multiple edges
gives rise to a binary relation.
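
This correspondence can be sketched in a few lines of Python (an illustration, not from the notes): we replace each edge {𝑣, 𝑤} of the graph in Figure 11.1 by the two ordered pairs (𝑣, 𝑤) and (𝑤, 𝑣), then check that the resulting relation is symmetric and irreflexive.

```python
# The adjacency relation of a simple graph, obtained by replacing each
# edge {v, w} with the two ordered pairs (v, w) and (w, v).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("f", "g")]
relation = {(v, w) for v, w in edges} | {(w, v) for v, w in edges}

symmetric = all((w, v) in relation for (v, w) in relation)
irreflexive = all(v != w for (v, w) in relation)
print(len(relation))             # 10: twice the number of edges
print(symmetric, irreflexive)    # True True
```

The relation has exactly twice as many ordered pairs as the graph has edges, which is the same double-counting we will meet again with adjacency matrices.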

11.4𝛼 REPRESENTiNG GRAPHS

It is natural to display graphs as diagrams, as we did in Figure 11.1, when they are small
enough for this to be practical. But diagrams have their limits. Many graphs from real
applications are too large to be depicted in a diagram on a page or a device screen. In
any case, to run algorithms on graphs, we need to be able to store them in computer
memory. To do this, we need formal, purely symbolic ways of representing graphs.
We now consider the four main ways of representing graphs symbolically. We will
not discuss the details of using these representations in programs; that will be considered
in FIT2004 Algorithms and Data Structures.

11.4.1 Edge list

Our first representation corresponds to the formal definition of graphs that we have just
given.
An edge list of a graph is just a list of its edges. Typically, the order does not
matter, so we can think of this as a set, although inside a computer information is
always stored in some specific order.
To represent a graph completely, listing the edges alone is not sufficient, in general.
This is because it’s possible for a vertex to belong to no edges. Such a vertex will not
be apparent just from looking at all the edges. So, we should specify both the vertex
set and the set of edges. This just means specifying exactly the information required in
our formal definition of graphs.
An edge list does not necessarily group edges together according to the vertices they
share. This may mean it is less efficient, for some tasks, than other representations we
will cover shortly.

11.4.2 Adjacency matrix

The adjacency matrix of a graph 𝐺 is an 𝑛 × 𝑛 array 𝐴 of bits in which



• 𝑛 is the number of vertices of 𝐺,


• the rows of the array correspond to the vertices of 𝐺;
• the columns of the array also correspond to the vertices of 𝐺;
• for every two vertices 𝑣 and 𝑤, the entry in the 𝑣-row and 𝑤-column of 𝐴 is

1, if 𝑣 ∼ 𝑤;
0, if 𝑣 ≁ 𝑤.

We think of 1 as meaning “edge” and 0 as meaning “no edge”. Another way to put this
is to say that every entry of the matrix gives the number of edges between the two
vertices. For simple graphs, this number must be 0 or 1, so we can represent it by a
single bit.
Here is an adjacency matrix for the graph 𝐺 in Figure 11.1. We show the vertices
corresponding to each row and column, for convenience, but they are not part of the
matrix itself.
      𝑎 𝑏 𝑐 𝑑 𝑒 𝑓 𝑔
𝑎   ⎛ 0 1 1 0 0 0 0 ⎞
𝑏   ⎜ 1 0 1 0 0 0 0 ⎟
𝑐   ⎜ 1 1 0 1 0 0 0 ⎟
𝑑   ⎜ 0 0 1 0 0 0 0 ⎟
𝑒   ⎜ 0 0 0 0 0 0 0 ⎟
𝑓   ⎜ 0 0 0 0 0 0 1 ⎟
𝑔   ⎝ 0 0 0 0 0 1 0 ⎠

We see, for example, that 𝑎 ∼ 𝑐, since the entry in the 𝑎-row and 𝑐-column is 1, while
𝑐 ≁ 𝑓, since the entry in the 𝑐-row and 𝑓-column is 0.
We make two observations about the adjacency matrix.
• The entries in the main diagonal (top left to bottom right) are all 0, because a
vertex is never adjacent to itself.
• The matrix is symmetric about the main diagonal. This means that, for all 𝑣 and
𝑤, the entry in row 𝑣 and column 𝑤 is the same as the entry in row 𝑤 and column
𝑣. This means that the matrix is completely determined by all the entries that lie
above the main diagonal. Similarly, it is completely determined by all the entries
that lie below the main diagonal.
As we often do with tables of data, it is natural to ask about the sums of the entries
in each row and each column, and the sum of all entries in the entire matrix.
• For a given vertex 𝑣, what information about it is captured by the number of 1s
in the 𝑣-row of the adjacency matrix? What about the 𝑣-column? We return to
this question in § 11.7.

• Consider the number of 1s in the entire adjacency matrix. The 1s indicate the
edges in the graph, but each edge is counted twice: edge {𝑣, 𝑤} gives 1 in the
entry for the 𝑣-row and 𝑤-column, and also for its “mirror image” entry in the
𝑤-row and 𝑣-column. So the number of 1s in the adjacency matrix is exactly twice
the number of edges of the graph.
The rules for adjacency matrices can be relaxed, if we are working with a wider class
of graphs than just simple graphs. If we allow loops, then some of the diagonal entries
can be 1. If we have a directed graph, then the entry in row 𝑣 and column 𝑤 need not
equal the entry in row 𝑤 and column 𝑣. If we allow multiple edges between two vertices,
then we can have entries that are neither 0 nor 1; the entry for row 𝑣 and column 𝑤 can
give the number of edges between 𝑣 and 𝑤.
We referred to an adjacency matrix as an array, and it can be represented in pro-
grams using two-dimensional array structures, which most programming languages have.
It can also be thought of as a “table”, with 𝑛 rows and 𝑛 columns, with the row and
column headings not included in the counts of rows and columns.
But there is a reason why we called it an adjacency matrix rather than an adjacency
“array” or “table”. The use of the term “matrix” is not just about the way it stores the
adjacency information in a two-dimensional 𝑛 × 𝑛 way. If that were all, then the terms
“2D array” and “table” would serve just as well. The term “matrix” is also about what
you can do with it. In mathematics, we can do operations with matrices, to calculate
important numbers from them or form other matrices. For adjacency matrices of graphs,
these operations can shed light on the graph, revealing aspects of its structure that might
have been hard to determine otherwise. They are also used in some algorithmic problems
on graphs. We don’t cover these graph-theoretic applications of matrices in this unit.
But it is good to be aware that they are “out there”, even if you mainly just use adjacency
matrices as a method for storing graphs.
One virtue of the adjacency matrix is that you can efficiently test whether or not
two given vertices are adjacent: you just have to look up an entry in a matrix, and
most computer representations of matrices (using 2D arrays) enable this to be done very
quickly. But other operations — like searching through all the neighbours of a vertex —
can be done more efficiently using adjacency lists, which we come to next (§ 11.4.3).
The adjacency matrix takes the same amount of space regardless of how many edges
the graph has. This may not be a problem, if the graph has many edges. But many
real-world networks are relatively “sparse”, meaning (roughly speaking) that most pairs
of vertices do not have an edge between them. If you are dealing with a class of large
sparse graphs, then the adjacency matrix representation may take up too much space in
memory.
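
As an illustration (not part of the notes), here is a Python sketch that builds the adjacency matrix of the graph in Figure 11.1, with vertices ordered alphabetically to fix the row and column order, and verifies the observations above: zero diagonal, symmetry, and a total number of 1s equal to twice the number of edges.

```python
vertices = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("f", "g")]

index = {v: i for i, v in enumerate(vertices)}
n = len(vertices)
A = [[0] * n for _ in range(n)]
for v, w in edges:
    A[index[v]][index[w]] = 1   # each edge is recorded once here ...
    A[index[w]][index[v]] = 1   # ... and once in its mirror-image entry

# The three observations from the text:
assert all(A[i][i] == 0 for i in range(n))                       # zero diagonal
assert all(A[i][j] == A[j][i] for i in range(n) for j in range(n))  # symmetric
assert sum(sum(row) for row in A) == 2 * len(edges)              # 1s = twice #edges

print(A[index["a"]][index["c"]])  # 1, since a ~ c
```

Testing adjacency of two given vertices is a single array lookup, which is the main strength of this representation.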

11.4.3 Adjacency list

An adjacency list specifies the vertices in some order and, with each vertex, it gives a
list of all the vertices it is adjacent to.

For example, the graph 𝐺 in Figure 11.1 has the following adjacency list representa-
tion.
𝑎 : 𝑏, 𝑐
𝑏 : 𝑎, 𝑐
𝑐 : 𝑎, 𝑏, 𝑑
𝑑 : 𝑐
𝑒 :
𝑓 : 𝑔
𝑔 : 𝑓
Each line nominates a vertex, followed by a list of all its neighbours. We have used a colon
here to separate each vertex, at the start of its line, from the list of all its neighbours.
But that is just a superficial detail; other ways of conveying this information can be
used, provided it is done clearly and consistently. We have used punctuation for human
readers, but when graphs are represented as adjacency lists in computers, the separation
of the different types of information is done differently, using data structures such as
arrays or lists which you will learn about in programming units.
We can see from the adjacency list that, for example, 𝑐 is adjacent to 𝑎, 𝑏 and 𝑑, so
that the neighbourhood of 𝑐 is {𝑎, 𝑏, 𝑑}. We can also readily see that 𝑒 has no neighbours
and that the graph has three leaves, namely 𝑑, 𝑓 and 𝑔.
This is probably the most widely used representation of graphs in computers. It
enables efficient searching of neighbourhoods of vertices, which is a common task in
many graph algorithms. It is compact, more so than the adjacency matrix.
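To make this concrete, here is one possible Python sketch of the adjacency list above (the dictionary encoding and the helper names are illustrative choices, not something prescribed by the unit):

```python
# Adjacency list of the graph G of Figure 11.1, stored as a dictionary
# mapping each vertex to a list of its neighbours.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": [],
    "f": ["g"],
    "g": ["f"],
}

def neighbourhood(adj, v):
    """Return the neighbourhood of v as a set."""
    return set(adj[v])

def leaves(adj):
    """Return the set of leaves: vertices with exactly one neighbour."""
    return {v for v, nbrs in adj.items() if len(nbrs) == 1}
```

Looking up all neighbours of a vertex is then a single dictionary access, which is exactly what makes searching neighbourhoods efficient in this representation.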

11.4.4 Incidence matrix

An incidence matrix of a graph 𝐺 is an 𝑚 × 𝑛 array 𝐴 of bits such that


• 𝑚 is the number of edges of 𝐺, and 𝑛 is the number of vertices of 𝐺;

• the rows of the array correspond to the edges of 𝐺;

• the columns of the array correspond to the vertices of 𝐺;

• for every edge 𝑒 and vertex 𝑣, the entry in the 𝑒-row and 𝑣-column is

1, if 𝑒 is incident with 𝑣;
0, if 𝑒 is not incident with 𝑣.

We think of 1 as meaning “incident” and 0 as meaning “not incident”.


As with the adjacency matrix, we call this a “matrix” because some standard matrix
operations can be done with it in order to study the graph.
This representation of graphs is seldom used in computers. But it is theoretically
important and has been used to prove theorems about graphs.
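As an illustration only (this representation is seldom used in practice, as noted above), an incidence matrix can be built from vertex and edge lists in a few lines of Python; the function name and the encoding of edges as sets are hypothetical choices:

```python
# Build an incidence matrix: one row per edge, one column per vertex,
# with entry 1 exactly when the edge is incident with the vertex.
def incidence_matrix(vertices, edges):
    index = {v: j for j, v in enumerate(vertices)}   # column of each vertex
    matrix = []
    for e in edges:                                  # each edge is a set {u, v}
        row = [0] * len(vertices)
        for v in e:
            row[index[v]] = 1
        matrix.append(row)
    return matrix

M = incidence_matrix(["a", "b", "c"], [{"a", "b"}, {"b", "c"}])
```

Note that in a simple graph every row contains exactly two 1s, one for each endpoint of the corresponding edge.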

11.5 SUBGRAPHS

Suppose we want to focus our attention only on a portion of a graph 𝐺. If this portion
is a graph in its own right, we say it is a subgraph of 𝐺. This is not yet a precise formal
definition, because we have not defined what we mean by a “portion” of a graph.
Let 𝐺 = (𝑉, 𝐸) be a graph. A subgraph of 𝐺 is a graph 𝐹 = (𝑈, 𝐷), with vertex set
𝑈 and edge set 𝐷, such that

𝑈 ⊆ 𝑉, and 𝐷 ⊆ 𝐸.

In other words, every vertex of 𝐹 is also a vertex of 𝐺, and every edge of 𝐹 is also an
edge of 𝐺. When 𝐹 is a subgraph of 𝐺, we write 𝐹 ≤ 𝐺. In effect, we are “overloading”
the ≤ symbol so that, as well as standing for ordinary numerical inequality, it also stands
for the subgraph relation. Context should make it clear which is meant.
For example, let 𝐹 = (𝑈, 𝐷) be the graph defined by

𝑈 = {𝑎, 𝑏, 𝑐, 𝑒, 𝑔},
𝐷 = {{𝑎, 𝑏}, {𝑎, 𝑐}},

shown in Figure 11.2. Then 𝐹 is a subgraph of 𝐺, and we can write 𝐹 ≤ 𝐺.

Figure 11.2: The subgraph 𝐹 of the graph 𝐺 of Figure 11.1.

A graph is a subgraph of itself: 𝐺 ≤ 𝐺. You can check that the definition is satisfied.
But there are times when we want to exclude this possibility and only consider those
subgraphs that are not the whole graph. We say that 𝐹 is a proper subgraph of 𝐺 if
it is a subgraph of 𝐺 but is not equal to 𝐺. Symbolically, 𝐹 ≤ 𝐺 and 𝐹 ≠ 𝐺. We can
write this as 𝐹 < 𝐺.
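Since the definition reduces to two set inclusions, it translates directly into code. The following Python sketch encodes the graphs of Figures 11.1 and 11.2 as (vertex set, edge set) pairs, with each edge a frozenset of its two endpoints (an illustrative encoding):

```python
# Test the subgraph relation F <= G via the two set inclusions U ⊆ V and D ⊆ E.
def is_subgraph(F, G):
    U, D = F
    V, E = G
    return U <= V and D <= E      # <= on Python sets is subset inclusion

G = ({"a", "b", "c", "d", "e", "f", "g"},
     {frozenset(e) for e in [("a", "b"), ("a", "c"), ("b", "c"),
                             ("c", "d"), ("f", "g")]})
F = ({"a", "b", "c", "e", "g"},
     {frozenset(("a", "b")), frozenset(("a", "c"))})
```

Frozensets are used for edges so that {𝑎, 𝑏} and {𝑏, 𝑎} compare equal, matching the fact that edges are unordered pairs.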

11.6 SOME SPECiAL GRAPHS

The complete graph on 𝑛 vertices, denoted by 𝐾𝑛 , has every pair of vertices joined by
an edge.
At the other extreme, the null graph on 𝑛 vertices, denoted by 𝐾̄𝑛 , has no edges
at all.

Figure 11.3: 𝐾4 , 𝐾̄4 , 𝑃3 , 𝐶4 .

The path graph of length 𝑛 − 1, denoted by 𝑃𝑛−1 , has 𝑛 vertices and 𝑛 − 1 edges
with the property that the vertices can be put in a sequence so that two vertices are
adjacent if and only if they are consecutive in the sequence.
The cycle of length 𝑛, denoted by 𝐶𝑛 , has 𝑛 vertices and 𝑛 edges and can be formed
from a path graph on the same vertices by adding a new edge between the first and last
vertex of the path.
Examples of these graphs, with four vertices, are shown in Figure 11.3.
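These families are easy to generate programmatically. The sketch below is an illustrative encoding on vertices 0, 1, …, 𝑛 − 1, producing each family's edge set:

```python
# Edge sets of the special graphs above, on vertices 0, 1, ..., n-1.
# Edges are stored as frozensets, so {i, j} and {j, i} are the same edge.
def complete_graph(n):
    """K_n: every pair of distinct vertices joined by an edge."""
    return {frozenset((i, j)) for i in range(n) for j in range(i + 1, n)}

def null_graph(n):
    """The null graph: no edges at all."""
    return set()

def path_graph(n):
    """P_{n-1}: n vertices and n-1 edges joining consecutive vertices."""
    return {frozenset((i, i + 1)) for i in range(n - 1)}

def cycle_graph(n):
    """C_n (assumes n >= 3): a path plus an edge from the last vertex back to the first."""
    return path_graph(n) | {frozenset((0, n - 1))}
```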

11.7 DEGREE

The degree of a vertex is the number of neighbours it has. Since our graphs are simple,
it also equals the number of edges that are incident with the vertex.
We denote the degree of vertex 𝑣 in graph 𝐺 by

deg𝐺 (𝑣).

If the graph 𝐺 is clear from the context, we may drop the subscript and just write
deg(𝑣).
This gives us some alternative descriptions of some concepts we introduced earlier.
• A vertex is isolated if and only if it has degree 0.

• A vertex is a leaf if and only if it has degree 1.
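With the adjacency list representation of § 11.4.3, degrees are immediate to compute, as in this Python sketch using the graph of Figure 11.1:

```python
# The degree of a vertex is the length of its neighbour list.
def degree(adj, v):
    return len(adj[v])

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c"], "e": [], "f": ["g"], "g": ["f"]}

isolated = [v for v in adj if degree(adj, v) == 0]   # degree 0: isolated vertices
leaf_list = [v for v in adj if degree(adj, v) == 1]  # degree 1: leaves
```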


If 𝑛 is the number of vertices of a graph 𝐺, then the degrees of the vertices of 𝐺 can
be any of 0, 1, 2, … , 𝑛 − 1. There are 𝑛 different possibilities for the degrees here. But it
turns out that we never get all these different degrees in the same graph. The proof is
a good example of proof by cases.
Theorem 56. In every graph with at least two vertices, there are two vertices with the same degree.
Before reading the proof of this theorem, you should draw some small graphs and
work out the degrees of their vertices. Try and construct a graph with all vertex degrees
being different, and try to get a sense of why this doesn’t seem to be possible.

Proof. Let 𝐺 be any graph, and let 𝑛 be its number of vertices.


We use proof by cases.
We consider two cases, depending on whether or not 𝐺 has a vertex of degree 𝑛 − 1.

Case 1: 𝐺 does not have a vertex of degree 𝑛 − 1.


In this case, the possible degrees that vertices of 𝐺 can have are 0, 1, 2, … , 𝑛−2. There
are 𝑛 − 1 different numbers in this list. So there are only 𝑛 − 1 different degrees that the
vertices of 𝐺 can have. Yet there are 𝑛 vertices in 𝐺. So at least two of them must have
the same degree.

Case 2: 𝐺 has a vertex of degree 𝑛 − 1.


Let 𝑣 be a vertex of 𝐺 that has degree 𝑛 − 1. The only way a vertex in an 𝑛-vertex
graph can have degree 𝑛 − 1 is if it is adjacent to all other vertices. This means that all
other vertices have 𝑣 as a neighbour, so their degrees are all ≥ 1. Therefore 𝐺 has no
vertex of degree 0. Therefore all the degrees, of all the vertices of 𝐺, must be in the set
{1, 2, … , 𝑛 − 1}. This means there are only 𝑛 − 1 possible degrees that a vertex of 𝐺 can
have. But there are 𝑛 vertices in 𝐺. Therefore there must be at least two vertices in 𝐺
that have the same degree.

Now we study the sum of all the vertex degrees of a graph. One reason for studying
this is because it enables us to determine the average degree of a vertex in the graph,
which helps in understanding what the graph looks like locally.
Our main theorem on this is called the Handshaking Lemma. The name comes from
the following scenario. At a social function with 𝑛 people, some pairs of people shake
hands when they first meet, others do not. How many handshakes occur? One way
to determine this is to find out, from each person, how many times they shook hands.
Adding up these individual numbers of handshakes counts each handshake twice: it
takes two hands to shake! Think of the people as vertices and the handshakes as edges.
Theorem 57 (Handshaking Lemma). For every graph 𝐺 = (𝑉, 𝐸), the sum of the
degrees of its vertices is twice its number of edges:

∑𝑣∈𝑉 deg(𝑣) = 2𝑚,

where 𝑚 is the number of edges of 𝐺.


Proof. If you rephrase the above handshaking explanation in terms of vertices and
degrees, then you have a proof of this theorem.
Alternatively, think of the adjacency matrix of 𝐺 (see § 11.4.2). For each vertex 𝑣,
the number of 1s in the 𝑣-column is just the degree of 𝑣. Since the columns are disjoint
and together make up the entire matrix (i.e., they partition the matrix entries), adding
up the column counts gives the total number of 1s in the matrix. But we noted in
§ 11.4.2 that the number of 1s in the matrix is 2𝑚. Therefore the sum of all the degrees
equals 2𝑚.
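The lemma is easy to check numerically. The sketch below computes the degree sum for the graph of Figure 11.1 from its adjacency list; note that an adjacency list mentions each edge twice, once for each endpoint, which is the lemma in computational form:

```python
# Check the Handshaking Lemma on the graph of Figure 11.1.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c"], "e": [], "f": ["g"], "g": ["f"]}

degree_sum = sum(len(nbrs) for nbrs in adj.values())  # sum of all degrees
m = degree_sum // 2          # each edge contributes 2 to the degree sum
```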

From this, we immediately obtain the average degree of the vertices of a graph.

Corollary 58. For any graph 𝐺,

average degree of 𝐺 = 2𝑚/𝑛,
where 𝑛 is the number of vertices of 𝐺 and 𝑚 is its number of edges.

Proof.

average degree of 𝐺 = (sum of degrees)/(number of vertices) = (∑𝑣∈𝑉(𝐺) deg(𝑣))/𝑛 = 2𝑚/𝑛,
by Theorem 57.

The average degree of a graph is therefore very simple to calculate. You only need
to know the two most fundamental parameters of the graph: its number of vertices, and
its number of edges. These parameters are “global” in the sense that they pertain to
the graph as a whole rather than any part of it. But, once you have them, this simple
calculation of 2𝑚/𝑛 gives you the average degree, which tells you something about
the “local” structure of the graph, namely what happens, on average, in the immediate
vicinity of the vertices.
In principle, the average degree can be as low as 0, for a graph with no edges, and as
high as 𝑛 − 1, for complete graphs on 𝑛 vertices. In practice, real-world graphs tend to
have average degrees that are much lower than the maximum possible. The worldwide
human social network currently has about 8,000,000,000 people, but the average degree
is tiny by comparison. (The average degree depends on how you define the edges. If the
graph uses close friendships only, the average degree is said to be about 4.)
Another consequence of the Handshaking Lemma follows from the fact that the sum
of the degrees is even.

Corollary 59. Every graph has an even number of vertices of odd degree.

Proof. Since 2𝑚 is even (where 𝑚 is the number of edges of the graph), the sum of
all the degrees is even (by Theorem 57). Now, the sum of the even degrees (i.e., those
vertex degrees that are even numbers) is also even, since the sum of even numbers is
always even. It follows that the sum of the odd degrees is even too.
Recall that

• the sum of two odd numbers is even,

• therefore the sum of an even number of odd numbers is even;

• the sum of an odd number of odd numbers is odd.



It follows that there must be an even number of odd degrees, else the sum of the odd
degrees would be odd, whereas we observed above that it’s even.

Theorem 56 and Corollary 59 give some necessary conditions for a set of numbers
to be a valid set of degrees of some graph. They show that not every set of numbers
within the required range (0 to 𝑛 − 1) can be the set of degrees of a graph.

11.8 MOViNG AROUND

We often want to move from one part of a graph to another, using the edges to step
from vertex to vertex.
A walk from a vertex 𝑣 to a vertex 𝑤 is a sequence of vertices and edges,

𝑣0 , 𝑒1 , 𝑣1 , 𝑒2 , 𝑣2 , … , 𝑣𝑘−1 , 𝑒𝑘 , 𝑣𝑘 , (11.1)

where

• 𝑣0 = 𝑣, meaning that the walk starts at 𝑣,

• 𝑣𝑘 = 𝑤, meaning that the walk finishes at 𝑤,

• each 𝑣𝑖 is a vertex of 𝐺,

• each 𝑒𝑖 is an edge of 𝐺,

• ∀𝑖 ∶ 𝑒𝑖 = {𝑣𝑖−1 , 𝑣𝑖 }. This means that the edge 𝑒𝑖 links the two vertices listed on
either side of it. So, to go from 𝑣𝑖−1 to 𝑣𝑖 , we step along edge 𝑒𝑖 = {𝑣𝑖−1 , 𝑣𝑖 }.

The length of a walk is the number of edges in it. The shortest possible walk is the
walk of length zero, which consists of just a single vertex and no edges.
For example, consider the graph in Figure 11.4. Here are some examples of walks
in this graph.

𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒, {𝑒, 𝑓}, 𝑓 a walk of length 4 from 𝑎 to 𝑓
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑒}, 𝑒 a walk of length 5 from 𝑎 to 𝑒
𝑏, {𝑏, 𝑒}, 𝑒, {𝑒, 𝑓}, 𝑓, {𝑒, 𝑓}, 𝑒, {𝑒, 𝑑}, 𝑑 a walk of length 4 from 𝑏 to 𝑑
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑎}, 𝑎 a closed walk of length 3 from 𝑎 to 𝑎
𝑎, {𝑎, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑎}, 𝑎 a closed walk of length 5 from 𝑎 to 𝑎
𝑎, {𝑎, 𝑐}, 𝑐, {𝑐, 𝑏}, 𝑏, {𝑏, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒, {𝑒, 𝑏}, 𝑏, {𝑏, 𝑎}, 𝑎 a closed walk of length 6 from 𝑎 to 𝑎
𝑏, {𝑏, 𝑐}, 𝑐, {𝑐, 𝑑}, 𝑑, {𝑑, 𝑏}, 𝑏, {𝑏, 𝑎}, 𝑎, {𝑎, 𝑐}, 𝑐, {𝑐, 𝑑}, 𝑑, {𝑑, 𝑒}, 𝑒 a walk of length 7 from 𝑏 to 𝑒
𝑐 a walk of length 0 from 𝑐 to 𝑐

Note, in particular, two key freedoms we have in walking around a graph.

• We are allowed to visit a vertex more than once.



𝑏
𝑎

𝑓
𝑒
𝑐
𝑑

Figure 11.4: A graph.

• We are even allowed to move along an edge more than once, and there is no
requirement to always move along a particular edge in the same direction. For
example, we could move along the edge {𝑎, 𝑏} from 𝑎 to 𝑏, and later in the walk we
could move along it from 𝑏 to 𝑎 (and/or we could go along it from 𝑎 to 𝑏 again).
– Note that, if we use an edge twice, we must necessarily use each of its end-
points at least twice too. But it is possible, in many graphs, to visit some
vertices more than once without using any edge more than once.

For simple graphs (which are our focus), there can only be one edge between two
specific vertices. So, if our walk steps from vertex 𝑎 to vertex 𝑏, then it must do so
along the edge {𝑎, 𝑏}, and there is only one such edge. So, in fact, the edge we use to go
from 𝑎 to 𝑏 is completely determined by the two vertices. This means that, to specify a
walk in a simple graph, it is sufficient to specify its sequence of vertices. So, our walk
in (11.1) could be specified by just listing the vertices in order:

𝑣 = 𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1 , 𝑣𝑘 = 𝑤.

Let us do this for our examples of walks for the graph in Figure 11.4.

𝑎, 𝑏, 𝑑, 𝑒, 𝑓 a walk of length 4 from 𝑎 to 𝑓


𝑎, 𝑏, 𝑑, 𝑐, 𝑏, 𝑒 a walk of length 5 from 𝑎 to 𝑒
𝑏, 𝑒, 𝑓, 𝑒, 𝑑 a walk of length 4 from 𝑏 to 𝑑
𝑎, 𝑏, 𝑐, 𝑎 a closed walk of length 3 from 𝑎 to 𝑎
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎 a closed walk of length 5 from 𝑎 to 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎 a closed walk of length 6 from 𝑎 to 𝑎
𝑏, 𝑐, 𝑑, 𝑏, 𝑎, 𝑐, 𝑑, 𝑒 a walk of length 7 from 𝑏 to 𝑒
𝑐 a walk of length 0 from 𝑐 to 𝑐

But if we are walking in graphs that may have multiple edges, then we need to specify
the edges between the vertices as well as the vertices themselves.
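For simple graphs, where a walk is determined by its vertex sequence, checking that a sequence really is a walk amounts to checking that every consecutive pair is an edge. A Python sketch (with the edge set of the graph of Figure 11.4 encoded as frozensets, an illustrative choice):

```python
# Check whether a sequence of vertices specifies a walk in a simple graph:
# every consecutive pair of vertices must be an edge of the graph.
def is_walk(edges, seq):
    return all(frozenset((u, v)) in edges for u, v in zip(seq, seq[1:]))

def walk_length(seq):
    return len(seq) - 1          # the number of edges in the walk

# The edge set of the graph of Figure 11.4.
E = {frozenset(e) for e in
     [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
      ("b", "e"), ("c", "d"), ("d", "e"), ("e", "f")]}
```

A single-vertex sequence passes the check vacuously, matching the walk of length zero.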
A walk is closed if its start and end vertices are the same. With our above notation,
this is when 𝑣 = 𝑤, i.e., 𝑣0 = 𝑣𝑘 . In our examples of walks in the graph of Figure 11.4,
the closed walks are the following.

𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
We now consider walks that are more restricted, in the sense that we forbid one or
both of the two freedoms we mentioned above.
A trail is a walk in which no edge is used more than once. Using (11.1), this means
that all the edges
{𝑣0 , 𝑣1 }, {𝑣1 , 𝑣2 }, {𝑣2 , 𝑣3 }, … , {𝑣𝑘−1 , 𝑣𝑘 }
are different. We cannot get around this restriction by going along an edge once in one
direction and then again later in the other direction; this still counts as using the edge
twice, which is forbidden for trails.
So, for trails, the second of our two walking freedoms (i.e., the freedom to use an
edge more than once) is prohibited. But we are still allowed to repeat vertices if we wish
(although we don’t have to).
In the graph of Figure 11.4, the following walks are trails:
𝑎, 𝑏, 𝑑, 𝑒, 𝑓
𝑎, 𝑏, 𝑑, 𝑐, 𝑏, 𝑒
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
The following walks are not trails, for the reasons indicated.
𝑏, 𝑒, 𝑓, 𝑒, 𝑑 re-uses edge {𝑒, 𝑓}
𝑎, 𝑏, 𝑐, 𝑏, 𝑐, 𝑎 re-uses edge {𝑏, 𝑐}
𝑏, 𝑐, 𝑑, 𝑏, 𝑎, 𝑐, 𝑑, 𝑒 re-uses edge {𝑐, 𝑑}

A closed trail is just a closed walk that is also a trail. So, it finishes at the same
vertex where it starts, and uses each edge at most once (with repeat visits to vertices
being allowed).
In our list of examples for the graph of Figure 11.4, the following walks are closed
trails:
𝑎, 𝑏, 𝑐, 𝑎
𝑎, 𝑐, 𝑏, 𝑑, 𝑒, 𝑏, 𝑎
𝑐
A path is a trail in which no vertex or edge is repeated. So, both our walking
freedoms (reusing vertices and reusing edges) are now prohibited.
In the graph of Figure 11.4, the following walks are paths:
𝑎, 𝑏, 𝑑, 𝑒, 𝑓
𝑐
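The distinctions between walks, trails and paths can also be captured in code. Assuming the input sequence is already known to be a walk in a simple graph, the following sketch classifies it:

```python
# Classify a vertex sequence (assumed to be a walk in a simple graph).
def is_trail(seq):
    """True if no edge is used more than once (direction does not matter)."""
    edges_used = [frozenset(p) for p in zip(seq, seq[1:])]
    return len(edges_used) == len(set(edges_used))

def is_path(seq):
    """True if no vertex (and hence no edge) is repeated."""
    return len(seq) == len(set(seq))
```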

Note that the path 𝑎, 𝑏, 𝑑, 𝑒, 𝑓 from 𝑎 to 𝑓 is not a shortest path from 𝑎 to 𝑓. The
shortest path from 𝑎 to 𝑓 is 𝑎, 𝑏, 𝑒, 𝑓, which has length 3. In this graph, there is a unique
shortest path from 𝑎 to 𝑓, but this does not always happen. For example, there are two
shortest paths from 𝑎 to 𝑑.

To help remember the distinctions between walks, trails and paths, imagine hiking
in the countryside and think of how these three terms get stronger in what they imply
about where you can go. You may be able to walk wherever you like2 ; it’s something
you do, rather than something that is laid out for you. A trail may be a simple, rough
track, more definite than an arbitrary walk but maybe lacking in official status. People
will tend to stay on a trail rather than wander off it. A path may be more definite,
and may have been laid down by others or by some authority. Perhaps it has a special
surface; perhaps there is more expectation that you stick to it; perhaps it is better
documented. So, in ordinary usage, the terms “walk”, “trail” and “path” get stronger,
and more restrictive, as you go from one to the next. Similarly, in graphs, the terms get
stronger and more restrictive, too. (The analogy here is very loose. It’s only designed
to help remember the order in which the terms get more restrictive. But this should
still help you remember which is which.)

Whenever there is a walk between two vertices in a graph, there is also a
path between them. In fact, the shortest walk between two vertices is always a path.
(Why?)
The distance between two vertices is the length of the shortest path between them.
This is the same as the length of the shortest walk between them, by the observations
in the previous paragraph.
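Distances can be computed by breadth-first search, which reaches vertices in increasing order of distance from the start. This is covered properly in algorithms units; the sketch below, on the graph of Figure 11.4, is only an illustration:

```python
# Compute the distance between two vertices by breadth-first search (BFS).
from collections import deque

def distance(adj, v, w):
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if u == w:
            return dist[u]
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return None                  # v and w are not connected

# Adjacency list of the graph of Figure 11.4.
adj = {"a": ["b", "c"], "b": ["a", "c", "d", "e"], "c": ["a", "b", "d"],
       "d": ["b", "c", "e"], "e": ["b", "d", "f"], "f": ["e"]}
```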

The prohibition on repeating vertices in paths means that a path cannot start and
end at the same vertex. But we also want a way to talk about “paths” that can do that.
A cycle is a closed trail in which no vertex except the first and last is repeated.
Because it is closed, the first and last vertex must be the same, but that vertex must
not appear anywhere else on the cycle. We might think of a cycle as a “closed path”
although, strictly speaking, that term is self-contradictory as a path cannot be closed.

Although a cycle is much more restricted, in its allowed repetitions, than an arbitrary
closed walk, it is interesting to note that, when their lengths are odd, the existence of
one implies the existence of the other.

Theorem 60. Let 𝐺 be any graph. 𝐺 has an odd closed walk if and only if it has an odd cycle.

Proof.
(⟸)
2 subject, of course, to any rules in force, e.g., in nature reserves or on private property

Suppose 𝐺 has an odd cycle. Now, every cycle is a closed walk. So an odd cycle is an odd closed walk. Therefore 𝐺 has an odd closed walk.
(⟹)
Suppose 𝐺 has an odd closed walk. Let 𝑊 be the shortest odd closed walk in 𝐺.
The sequence of vertices of 𝑊 may be written as

𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1 , 𝑣𝑘 = 𝑣0 ,

where 𝑘 is odd.
If no vertex of 𝐺 except the start/end vertex repeats in 𝑊, then 𝑊 is already an
odd cycle, and we are done.
We now prove, by contradiction, that no vertex of 𝐺 except the start/end vertex
repeats in 𝑊. From this it will follow that 𝑊 is actually an odd cycle.
Assume, by way of contradiction, that there is a vertex that is repeated in 𝑊. We
don’t want 𝑣0 = 𝑣𝑘 to count as a repeat here, so we actually mean a vertex in the slightly
shorter list
𝑣0 , 𝑣1 , 𝑣2 , … , 𝑣𝑘−1
that is repeated.
Let 𝑣𝑖 be the first vertex in this shorter list that appears again later in the list
(but not earlier). Suppose it appears later in the shorter list as 𝑣𝑗 . So 𝑣𝑖 = 𝑣𝑗 , and
0 ≤ 𝑖 < 𝑗 ≤ 𝑘 − 1.
Using the vertices 𝑣𝑖 and 𝑣𝑗 , we can divide the closed walk 𝑊 into two shorter closed
walks:

• 𝑣𝑖 , 𝑣𝑖+1 , … , 𝑣𝑗−1 , 𝑣𝑗 .

• 𝑣𝑗 , 𝑣𝑗+1 , … , 𝑣𝑘−1 , 𝑣0 , 𝑣1 , … , 𝑣𝑖−1 , 𝑣𝑖 .


– Keep in mind here that 𝑣0 = 𝑣𝑘 .

Recall that the length of 𝑊 is odd. Since we just divided 𝑊 into two “subwalks”, the
length of 𝑊 must equal the sum of the lengths of those two subwalks. Therefore, one
of those subwalks has odd length and the other has even length. Both are closed walks,
so the one that has odd length is an odd closed walk. Since both the subwalks are
shorter than 𝑊, we have constructed an odd closed walk in 𝐺 that is shorter than
𝑊. This contradicts our choice of 𝑊 as the shortest odd closed walk in 𝐺. So our
initial assumption, that there is a vertex 𝑣𝑖 with 𝑖 < 𝑘 that is repeated in 𝑊, is wrong.
Therefore there is no vertex repetition in 𝑊 except for 𝑣0 = 𝑣𝑘 . Therefore 𝑊 is an odd
cycle.

This theorem would not be true if we replaced “odd” by “even” in the theorem
statement. To see this, consider the complete graph on two vertices, 𝐾2 . Call its vertices
𝑣 and 𝑤. This graph has no cycles at all (even or odd). But it has an even closed walk:
𝑣, 𝑤, 𝑣. This closed walk is not a cycle, because the edge {𝑣, 𝑤} is used twice.

It is instructive to read through the proof of Theorem 60 again to find the point
where the proof would break down if “odd” were replaced by “even” throughout.
A cycle in a graph is Hamiltonian if it includes every vertex. Hamiltonian cycles
are used in planning routes which must visit every location in some set, without visiting
any location twice, and returning to the starting point. For example, suppose you must
visit every city in some region exactly once, returning to your starting point, while
using available transport links (roads or train lines, as the case may be) to go between
cities. Then your problem is to find a Hamiltonian cycle in the graph whose vertices
represent cities with edges representing available transport links between cities. In
practice, each edge may have an associated distance or cost, and you may also want to
find a Hamiltonian cycle with minimum total cost (assuming the total cost of the cycle
is the sum of the costs of its edges). The problem of finding the Hamiltonian cycle of
minimum total cost goes back to the 19th century and is traditionally known as the
Travelling Salesman Problem (TSP).
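A brute-force search for a Hamiltonian cycle can be sketched as follows. This is purely illustrative: it tries all (𝑛 − 1)! orderings of the vertices, so it is only feasible for very small graphs, and practical approaches to the TSP are far more sophisticated.

```python
# Brute-force search for a Hamiltonian cycle: fix a start vertex, try every
# ordering of the remaining vertices, and check that consecutive vertices
# (including last back to first) are adjacent.
from itertools import permutations

def hamiltonian_cycle(vertices, edges):
    vs = list(vertices)
    for perm in permutations(vs[1:]):
        order = [vs[0], *perm]
        pairs = zip(order, order[1:] + [order[0]])
        if all(frozenset(p) in edges for p in pairs):
            return order                 # the cycle, as an ordering of the vertices
    return None

# The cycle C_4, which is its own Hamiltonian cycle.
C4_edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
```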

11.9 CONNECTiViTY

Two vertices 𝑣 and 𝑤 in a graph 𝐺 are connected if there is a path from 𝑣 to 𝑤 in 𝐺.


This in turn holds if and only if there is a walk from 𝑣 to 𝑤 in 𝐺.
Because of this terminology, which relates vertices that may be a long way away
from each other in a graph, the term “connected” must not be used as a synonym for
“adjacent”.
A graph 𝐺 is connected if, for every pair of vertices 𝑣 and 𝑤, there is a path from
𝑣 to 𝑤 in 𝐺.
A component of 𝐺 is a maximal connected subgraph of 𝐺, meaning that it is not a
proper subgraph of any other connected subgraph of 𝐺.
For example, the graph in Figure 11.1 has three components: one on the left, one
in the middle, and one on the right. Here they are in (vertex-set,edge-set) format.
• ({𝑎, 𝑏, 𝑐, 𝑑}, {{𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}, {𝑐, 𝑑}})
• ({𝑒}, ∅)
• ({𝑓, 𝑔}, {{𝑓, 𝑔}})
The subgraph ({𝑎, 𝑏, 𝑐}, {{𝑎, 𝑏}, {𝑎, 𝑐}, {𝑏, 𝑐}}) — which is the triangle on the left in
Figure 11.1 — is not a component of 𝐺. Although it is a connected subgraph of 𝐺, it
is not a maximal connected subgraph of 𝐺 because it is a proper subgraph of another
connected subgraph of 𝐺, namely the first of the three listed above.
Connectivity gives a binary relation on the vertices of a graph. Let 𝐺 = (𝑉, 𝐸) be a
graph. Define the relation 𝑅 on 𝑉 as follows. For all 𝑣, 𝑤 ∈ 𝑉,

𝑣𝑅𝑤 ⟺ there is a walk from 𝑣 to 𝑤 in 𝐺.

We claim this is an equivalence relation on 𝑉.



• It is reflexive because, for any vertex 𝑣, the zero-length walk relates 𝑣 to itself, so
𝑣𝑅𝑣.

• It is symmetric because a walk from 𝑣 to 𝑤 can be reversed to give a walk from 𝑤


to 𝑣. So 𝑣𝑅𝑤 ⇒ 𝑤𝑅𝑣.

• It is transitive because, if 𝑢𝑅𝑣 and 𝑣𝑅𝑤, then we have a walk from 𝑢 to 𝑣 and
another walk from 𝑣 to 𝑤, and we can put them together at 𝑣 to make a walk from
𝑢 to 𝑤, showing that 𝑢𝑅𝑤.

Since 𝑅 is an equivalence relation, its equivalence classes partition 𝑉 (Theorem 12). In


fact, the equivalence classes are the vertex sets of the components of 𝐺.
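These equivalence classes, and hence the component vertex sets, can be computed by collecting everything reachable from each not-yet-seen vertex, as in this sketch on the graph of Figure 11.1:

```python
# Compute the vertex sets of the components of a graph: the equivalence classes
# of R are exactly the sets of vertices reachable from each starting vertex.
def component_vertex_sets(adj):
    seen = set()
    classes = []
    for v in adj:
        if v in seen:
            continue
        # Collect everything reachable from v: its equivalence class under R.
        stack, cls = [v], set()
        while stack:
            u = stack.pop()
            if u in cls:
                continue
            cls.add(u)
            stack.extend(adj[u])
        seen |= cls
        classes.append(cls)
    return classes

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
       "d": ["c"], "e": [], "f": ["g"], "g": ["f"]}
```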

11.10 B i PA RT i T E G R A P H S

A graph 𝐺 = (𝑉, 𝐸) is bipartite if its vertex set 𝑉 can be written as a disjoint union

𝑉 = 𝐴 ⊔ 𝐵

of two parts, 𝐴 and 𝐵, such that every edge of the graph has one endpoint in 𝐴 and the
other in 𝐵.3
We often draw bipartite graphs with all the vertices in one part on the left side and
all the vertices in the other part on the right, with edges going between them as required.
Two examples are given in Figure 11.5.

(a) (b)

Figure 11.5: Two bipartite graphs. The one on the right, (b), is the complete bipartite graph 𝐾2,3 .

3 This might not be a partition of 𝑉 because one of the parts may be empty, although that could only happen
if 𝐺 had no edges. As long as the graph has at least one edge, then the sets 𝐴 and 𝐵 are both nonempty
and they form the two parts of a partition of 𝑉.

A complete bipartite graph is a bipartite graph in which every pair of vertices


on different sides are adjacent. This means that, if the two sides are 𝐴 and 𝐵 (as above),
then the edge set is
{{𝑎, 𝑏} ∶ 𝑎 ∈ 𝐴 ∧ 𝑏 ∈ 𝐵}.
If the two sides have 𝑝 and 𝑞 vertices then we denote the complete bipartite graph by
𝐾𝑝,𝑞 . The graph in Figure 11.5(b) is the complete bipartite graph 𝐾2,3 .
Bipartite graphs are widely used, especially to model situations where there are two
kinds of entities and we must consider how to allocate them to each other subject to con-
straints of some kind. For example, if 𝐴 is a set of jobs and 𝐵 is a set of candidates, then
the edges of the graph may represent which candidates are suitable (or have applied) for
which jobs. If 𝐴 is a set of students and 𝐵 is a set of subjects, edges could be used to
represent which student is doing which subject, or which student wants to enrol in which
subject. (The diagram of the employer function in Figure 2.1 shows a graph of this type.)

Another way of viewing bipartite graphs is via 2-colourings.


A 2-colouring of a graph 𝐺 = (𝑉, 𝐸) is a function that assigns one of two colours
to each vertex of 𝐺 such that adjacent vertices are assigned different colours.
Theorem 61. A graph is bipartite if and only if it has a 2-colouring.
Proof.
(⟹)
Suppose 𝐺 is bipartite, and let 𝐴 and 𝐵 be the two sides as above, so all edges
have one endpoint in 𝐴 and the other in 𝐵. Define a function which colours all vertices
in 𝐴 Black and all vertices in 𝐵 White. Then every edge has one endpoint coloured
Black (because it is in 𝐴) and the other endpoint coloured White (because it is in 𝐵).
Therefore adjacent vertices always get different colours. Therefore this function is a
2-colouring of 𝐺.

(⟸)
Suppose 𝐺 = (𝑉, 𝐸) has a 2-colouring 𝑓 ∶ 𝑉 → {Black, White}. Let 𝐴 = 𝑓 −1 (Black)
be the set of all vertices coloured Black, and let 𝐵 = 𝑓 −1 (White) be the set of all vertices
coloured White. Every vertex of 𝑉 belongs to one of these two sets, since every vertex
is mapped to one of those two colours. Furthermore, no vertex can belong to both sets,
because 𝑓 is a function and therefore cannot assign two different values to any element of
its domain. Therefore 𝑉 = 𝐴⊔𝐵. Also, since adjacent vertices get different colours under
𝑓, adjacent vertices must belong to different sets (𝐴 or 𝐵); they cannot both belong to
the same set, for then they would be given the same colour by 𝑓. These properties of
the two parts 𝐴 and 𝐵 show that 𝐺 is bipartite with 𝐴 and 𝐵 as the two sides.

Theorem 62. A graph is bipartite if and only if it has no odd closed walk.
Proof.
(⟹)

If 𝐺 is bipartite, then every walk must alternate between vertices in 𝐴 and vertices
in 𝐵. So, it goes from a vertex in 𝐴, to a vertex in 𝐵, to one in 𝐴, to one in 𝐵, and so
on, possibly with repetition of vertices and edges. At every stage, if we are at a vertex
in 𝐴, then two steps later we are back in 𝐴. This alternation ensures that a closed walk
has even length. Therefore 𝐺 has no odd closed walk.

(⟸)
If 𝐺 has no odd closed walk, then let us colour the vertices of 𝐺 as follows. In each
component of 𝐺, we do the following.

1. Choose one vertex in the component as the “ground vertex” of that component;
this will be the first vertex in the component to be coloured. Call it 𝑔.

2. Colour 𝑔 Black.

3. Then colour every other vertex 𝑣 of the component as follows.

Black, if the distance from 𝑔 to 𝑣 is even;


colour of 𝑣 = 
White, if the distance from 𝑔 to 𝑣 is odd.

In fact, colouring 𝑔 Black is a special case of this, since 0 is even.

Once this is done, for all components of 𝐺, we will have given a colour to every
vertex of 𝐺.
We claim that this is a 2-colouring of 𝐺. We prove this by contradiction.
Assume, by way of contradiction, that our assignment of colours is not a 2-colouring
of 𝐺. Then there must be two adjacent vertices that get the same colour. Let 𝑋 be
this colour, which could be Black or White. Let 𝑣 and 𝑤 be these two adjacent vertices,
each with colour 𝑋 . Since they are adjacent, they must belong to the same component
of 𝐺. Let 𝑔 be the ground vertex for that component.
Key observation: the distances from 𝑔 to 𝑣, and from 𝑔 to 𝑤, have the same parity.
In other words, they are both even or both odd. This is because the colours of 𝑣 and
𝑤 are completely determined by the parity of their distances from 𝑔, so the fact that
they get the same colour implies that the parities of these distances are the same. Since
these two distances have the same parity, their sum is even.
Let 𝑃 be a shortest path in 𝐺 from 𝑔 to 𝑣, and let 𝑄 be a shortest path in 𝐺 from 𝑔
to 𝑤. The lengths of these two paths have the same parity, as explained in the previous
paragraph. So
length of 𝑃 + length of 𝑄
is even. Now consider the walk formed by starting at 𝑔, going along 𝑃 to 𝑣, then
stepping along edge {𝑣, 𝑤} to 𝑤, then going along 𝑄 from 𝑤 back to 𝑔, finishing up at
𝑔. This is a closed walk, since it finishes where it starts. Its length is

length of 𝑃 + length of 𝑄 + 1,

where we now have an extra +1 because of the edge {𝑣, 𝑤}. We have already seen that
the sum of the lengths of 𝑃 and 𝑄 is even. It follows that the length of this closed walk
is odd.
So we have constructed an odd closed walk in 𝐺. But this is a contradiction, since
this part of the proof starts with a graph with no odd closed walk. Therefore our
assumption, that our assignment of colours is not a 2-colouring of 𝐺, is wrong. Therefore
it is, in fact, a 2-colouring. Therefore 𝐺 is 2-colourable. Therefore 𝐺 is bipartite (using
Theorem 61).4
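The colouring procedure in this proof is essentially an algorithm, and can be sketched in Python: in each component, colour a ground vertex Black, then give each newly reached vertex the opposite colour of the neighbour it was reached from (which matches colouring by parity of distance); if some edge ends up with both endpoints the same colour, the graph is not bipartite.

```python
# The colouring procedure from the proof of Theorem 62. Returns a 2-colouring
# as a dictionary, or None if the graph is not bipartite.
from collections import deque

def two_colouring(adj):
    colour = {}
    for g in adj:                            # ground vertex for each component
        if g in colour:
            continue
        colour[g] = "Black"                  # distance 0 is even
        queue = deque([g])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = "White" if colour[u] == "Black" else "Black"
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return None              # two adjacent vertices, same colour
    return colour
```

On the 4-cycle this succeeds; on a triangle (an odd cycle) it fails, as Theorem 62 predicts.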

Corollary 63. A graph is bipartite if and only if it has no odd cycle.

Proof. This follows from Theorem 60 and Theorem 62.

11.11 EULER TOURS

An Euler tour is a closed trail that uses every edge. Since it is a trail, this means each
edge gets used exactly once. Since it is closed, it finishes at its starting vertex.
This concept goes back to what is often considered to be the start of graph theory.5
In the city of Königsberg in Prussia (now Kaliningrad, in Russia), there were seven
bridges linking two islands in the River Pregel to each other and to the two sides of the
river. It was a popular pastime to try to do a walk which crossed each bridge exactly
once and returned to its starting point. No-one had succeeded in doing this, so it was
widely believed to be impossible, but no-one had pinned down why it couldn’t be done.
The great Swiss-German mathematician Leonhard Euler tackled this problem, modelled
it using a graph, developed some theory, and solved it, showing that such a tour around
the seven bridges was impossible. (Euler also developed some of the number theory we
have done: § 7.12 and Theorem 38.) He first presented his work to the St Petersburg
Academy of Sciences in 1735 and it was published in 1736.
A map showing the seven bridges, taken from Euler’s paper, is shown in Figure 11.6.6

From this geographic setting, Euler constructed the graph shown in Figure 11.7.
This is a classic case of abstraction: identifying the essential elements of the problem
at hand (the relationship between the bridges and the land masses) and omitting all

4 We only used 2-colourings in this part of the proof since it seemed easiest and neatest to write about giving
colours to vertices rather than putting them in 𝐴 or 𝐵. But it would not have made much difference. Had
we done everything in terms of 𝐴 and 𝐵, we would not have needed to appeal to Theorem 61 at the end.
5 This may be true for graph theory as a field of study. But there are graph-theoretic definitions and concepts
going back much further in time. For example, connectivity is used in the ancient game of Go, known as
Igo in Japan, Baduk in Korea and Wéiqí in China. This game was invented in China about 2,500 years
ago, and can be played on any graph.
6 Diagram taken from: Leonhard Euler, Solutio problematis ad geometriam situs pertinentis, Commentarii
Academiae Scientiarum Imperialis Petropolitanae, vol. 8 (1736) 128–140, with figures between pp. 158
& 159; this copy is from the copy in the Biodiversity Heritage Library, https://www.biodiversitylibrary.
org/.

Figure 11.6: Diagram of the Seven Bridges of Königsberg, from Leonhard Euler’s paper published
in 1736.

irrelevant information (the sizes and shapes of the land masses, the lengths and widths
of the bridges, the curves of the river, etc.).

Figure 11.7: A graph representing the Seven Bridges of Königsberg (vertices for the land masses; edges 𝑎–𝑔 for the bridges).

The graph is a precise model of the Königsberg Bridges problem, since a walk around
all the bridges of the required type exists if and only if the graph has an Euler tour.
At this point, you can try to construct an Euler tour for this graph and, through
this exploration, try and gain some insight as to why it is not possible.
Euler not only solved the problem for the Königsberg Bridges graph, but provided a
general characterisation of graphs that have Euler tours, giving a simple test that does
not require an exhaustive search.

Theorem 64 (Euler, 1736). A connected graph 𝐺 has an Euler tour if and only if
every vertex has even degree.

Proof.
(⟹)
Let 𝐺 be a connected graph with an Euler tour. An Euler tour is a trail, and when a
trail passes through a vertex, it uses two of the edges incident with that vertex: one on
the way in, and one on the way out. An Euler tour uses each edge of the graph exactly
once. It follows that it must revisit each vertex as many times as necessary in order to
use each of its incident edges exactly once, and it uses two edges for each visit, so the
number of incident edges at the vertex must be a multiple of 2. Therefore each vertex
degree is even.
This reasoning applies to the vertex where the tour starts and ends, too, provided
we consider the tour to be entering the vertex at the end (via the last edge of the tour)
and leaving it at the start (via the first edge of the tour). Alternatively, we can treat
that vertex as a special case, noting that we use one of its incident edges at the start and
another at the very end, making two edges so far. Then, every other visit to that vertex
uses two of its incident edges, so again, the number of incident edges at that vertex must
be a multiple of 2, by the reasoning of the previous paragraph.

(⟸)
Let 𝐺 be a connected graph in which every vertex has even degree.
We give an algorithm for constructing an Euler tour for 𝐺.

1. Initialisation:
a) Choose any vertex 𝑣 of 𝐺.
b) Let 𝑇 be the trivial closed trail consisting just of the vertex 𝑣, with no edges.

2. While there is at least one edge we have not yet used in 𝑇:


a) Find a vertex in 𝑇 that has at least one edge that we have not yet used in 𝑇.
Call it 𝑣.
• There must be at least one such vertex. To see this, let 𝑒 be any edge that
we have not yet used (which might be elsewhere in the graph; it might
not be incident with any vertex of 𝑇). If 𝑒 has an endpoint in 𝑇, then
this endpoint is already a vertex in 𝑇 with at least one incident unused
edge. So suppose neither endpoint of 𝑒 is in 𝑇. Let 𝑢 be one of
the endpoints of 𝑒 and let 𝑤 be any vertex in 𝑇. Since 𝐺 is connected,
there is a path 𝑃 from 𝑢 to 𝑤 in 𝐺. Let 𝑣 be the first vertex of 𝑇 that
we encounter on 𝑃, as we go along it from 𝑢 towards 𝑤. (This might be
𝑤 itself, if the path doesn’t meet 𝑇 until it ends up at 𝑤, or it might be
somewhere else in 𝑇.) The edge of 𝑃 that we go along to arrive at 𝑣 for
the first time cannot be in the closed trail 𝑇, else its other endpoint is in

𝑇 too and we would have met 𝑇 earlier. So this vertex 𝑣 is a vertex in 𝑇


with at least one incident edge not in 𝑇.
b) We now construct a new closed trail 𝑈 that starts and finishes at 𝑣 and which
has no edges in common with 𝑇 (although it can use vertices of 𝑇, including
𝑣). Starting at 𝑣, walk along an edge incident at 𝑣 that is not in 𝑇. (We
established above that this is possible.) Continue walking along edges not in
𝑇. Since 𝑇 is a closed trail, it uses an even number of edges at each vertex,
so there will always be an even number of unused edges at each vertex. So,
if in 𝑈 we enter a vertex via an unused edge, there must be another unused
edge which we can use in order to leave that vertex. Therefore we can always
keep going, unless we return to 𝑣, in which case it’s possible we may have
no more unused edges from 𝑣 to use. This new closed trail can only stop at
𝑣, and because the graph is finite, it must therefore stop at 𝑣 eventually. So
we have constructed a new closed trail, 𝑈, starting and finishing at 𝑣, which
has no edges in common with 𝑇 (because we only ever used edges that were
unused by 𝑇).
c) We now combine the closed trails 𝑇 and 𝑈 to make a new closed trail. Starting
anywhere on 𝑇, we follow 𝑇 until we first reach 𝑣. Then we follow 𝑈 all the
way until we return to 𝑣 for the last time, having stepped along all edges of
𝑈. Then we resume following 𝑇, and we follow it all the rest of the way until
we return to our starting vertex for the last time, having used every edge
of 𝑇.
d) We now redefine 𝑇, so that it is now the new name for this combined closed
trail, formed from the old 𝑇 and the extra closed trail 𝑈.

3. The algorithm finishes when there is no edge of 𝐺 that has not been used by the
closed trail 𝑇. Then, the closed trail 𝑇 necessarily uses each edge of 𝐺 exactly
once, and it is therefore an Euler tour. So we output 𝑇.
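The trail-splicing construction in this proof (often credited to Hierholzer) can be sketched in Python. This sketch is our own, not part of the proof: it assumes the graph is given as a list of edges, is connected, and has all degrees even, and it keeps the growing closed trail 𝑇 as a list of vertices, splicing each new closed trail 𝑈 in at the vertex where it was found.

```python
from collections import defaultdict

def euler_tour(edges):
    """Return an Euler tour, as a list of vertices, of a connected
    graph (given as a list of edge pairs) with all degrees even."""
    adj = defaultdict(list)          # vertex -> indices of incident edges
    for i, (u, v) in enumerate(edges):
        adj[u].append(i)
        adj[v].append(i)
    used = [False] * len(edges)

    tour = [edges[0][0]]             # the trivial closed trail T
    pos = 0
    while pos < len(tour):
        v = tour[pos]
        sub = []                     # a closed trail U starting at v
        w = v
        while True:
            while adj[w] and used[adj[w][-1]]:
                adj[w].pop()         # discard edges already in the trail
            if not adj[w]:
                break                # stuck: by parity, w == v again
            i = adj[w].pop()
            used[i] = True
            a, b = edges[i]
            w = b if w == a else a
            sub.append(w)
        tour[pos + 1:pos + 1] = sub  # splice U into T at v
        pos += 1
    return tour
```

For two triangles sharing a vertex, `euler_tour([(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)])` returns a closed trail of seven vertices that uses each of the six edges exactly once.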

Motivated by this theorem, we call a graph Eulerian if it contains an Euler tour.


With this terminology, Theorem 64 says that a connected graph is Eulerian if and only
if every vertex of it has even degree.
Theorem 64 also works for graphs with multiple edges. In fact it also works
for graphs that may have loops, provided that each loop contributes 2 to the degree of
its vertex. (Why?)
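Theorem 64 gives a test that is simple to code. The sketch below (our own function names, assuming an edge-list representation) checks connectivity with a breadth-first search and then checks degree parity; a loop contributes 2 to its vertex's degree, as required.

```python
from collections import defaultdict, deque

def is_eulerian(vertices, edges):
    """Euler's test: connected, and every vertex has even degree.
    Works for multigraphs; a loop (u == v) contributes 2 to the degree."""
    degree = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adj[u].add(v)
        adj[v].add(u)
    # Connectivity check by breadth-first search from any vertex.
    seen = {next(iter(vertices))}
    queue = deque(seen)
    while queue:
        w = queue.popleft()
        for x in adj[w] - seen:
            seen.add(x)
            queue.append(x)
    return seen == set(vertices) and all(degree[v] % 2 == 0 for v in vertices)

# The Königsberg multigraph, with the usual labelling of the land masses:
konigsberg = [('A', 'B'), ('A', 'B'), ('A', 'C'), ('A', 'C'),
              ('A', 'D'), ('B', 'D'), ('C', 'D')]
```

Here `is_eulerian({'A', 'B', 'C', 'D'}, konigsberg)` is `False`: all four degrees (5, 3, 3, 3) are odd, so the test fails just as Euler's theorem predicts.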

Euler tours can be used in planning routes in other contexts, too.


Suppose you and a colleague are delivering items into mailboxes outside people’s
houses in some area. You walk along one side of each street, and your colleague walks
along the other side in the same direction, keeping roughly alongside each other. You

would prefer to not walk again along any stretch of road where you have already done
your deliveries, because that seems like wasted time and effort.
You can model this situation by a graph, where vertices represent intersections of
streets and edges represent segments of streets between intersections (with these seg-
ments having no other intersections in the middle of them, otherwise they are really
more than one segment). You would like to find an Euler tour in this graph.
The Euler tour problem has various generalisations that also have practical applica-
tions. For example, you could try to find a closed walk which includes every edge and
re-uses the fewest edges. Or the edges might be weighted, and you could try to find the
closed walk of minimum total weight.

If a connected graph 𝐺 does not have an Euler tour, then it must have some vertices
of odd degree. By Corollary 59, the number of vertices of odd degree must be even.
Suppose that the number of vertices of odd degree is 2𝑘, where 𝑘 ∈ ℕ. We can, if we
wish, pair these odd-degree vertices off with each other, however we like. Let us call the
vertices in the 𝑖-th pair 𝑣𝑖 and 𝑤𝑖 . So the pairs are

𝑣1 and 𝑤1 ,
𝑣2 and 𝑤2 ,
⋮ ⋮ ⋮
𝑣𝑘 and 𝑤𝑘 ,

making 2𝑘 odd-degree vertices altogether. Note that the two vertices in a pair may or
may not be adjacent.
Now suppose we add new edges linking each of these pairs of vertices: one between
𝑣1 and 𝑤1 , another between 𝑣2 and 𝑤2 , and so on, making 𝑘 new edges altogether. This
may create some multiple edges, but that’s ok. Once we’ve done this, all degrees are
even. By Theorem 64 (or rather, its extension to multigraphs), this new graph has an
Euler tour. This tour includes each of the edges of 𝐺 exactly once, plus each of the 𝑘
new edges exactly once. If we then delete all those new edges, then the tour is broken
into 𝑘 trails such that each edge of 𝐺 appears exactly once in exactly one of these trails
(and not at all in any of the others). In other words, these trails give a partition of the
edges of 𝐺. Each of these 𝑘 Euler trails must start at one vertex of odd degree and end
at another vertex of odd degree.
We have shown that the edge set of a graph 𝐺 with 2𝑘 vertices of odd degree can
be partitioned into 𝑘 trails, and that the 2𝑘 endpoints of the trails are precisely the 2𝑘
odd-degree vertices of the graph. Furthermore, it is not possible to partition the edges
of 𝐺 into fewer than 𝑘 trails. (Proving this, using the concepts we have been discussing,
is a good exercise.)
We have outlined the proof of the following extension of Theorem 64.

Theorem 65. For any 𝑘 ∈ ℕ, the edges of a graph can be partitioned into 𝑘 trails if
and only if the graph has at most 2𝑘 vertices of odd degree. The open trails in such a
partition start and end at odd-degree vertices. □

11.12 EXERCiSES

1. Let 𝑉 and 𝐸 be two finite sets. We’ll call the members of 𝑉 “vertices” and the
members of 𝐸 “edges”.
Suppose that
• 𝑣, 𝑤 and 𝑧 are variables with domain 𝑉;

• 𝑒 and 𝑓 are variables with domain 𝐸;

• incident is a binary predicate whose first argument has domain 𝑉 and whose second
argument has domain 𝐸.
It is our intention to use these ingredients (i.e., the given sets 𝑉 and 𝐸, the variables
and the predicate) to describe graphs and some of their properties. In particular, we
intend that incident(𝑣, 𝑒) means that the vertex 𝑣 is incident with the edge 𝑒. But we
need to establish logical rules to ensure that these intentions are actually carried out.

Write expressions in predicate logic with the following meanings.


(a) Every edge is incident with at least one vertex.

(b) No edge is incident with exactly one vertex (i.e., there are no loops).

(c) Every edge is incident with exactly two vertices.

(d) There are no multiple edges.

(e) (𝑉, 𝐸) represents a simple graph.

(f) There are no isolated vertices.

2. Consider the four types of graph representation we looked at in § 11.4𝛼 : edge lists,
adjacency matrices, adjacency lists, and incidence matrices. Suppose that
• the graphs we want to represent have 𝑛 vertices and 𝑚 edges,

• their vertices are named 1 to 𝑛,

• we represent vertex numbers in binary, using ⌊log2 𝑛⌋ + 1 bits for each vertex
number. (This is the number of bits required to represent 𝑛 in binary, and we
assume that smaller numbers have extra leading zeros if necessary to ensure that
all vertex numbers have the same number of bits.)

Under these assumptions, determine the total number of bits used to represent a graph
using
(a) an edge list, with the vertex sets and edge sets listed in full;

(b) an adjacency matrix;

(c) an adjacency list;

(d) an incidence matrix.


In each case, also give a simple upper bound in big-O notation, again just in terms of
𝑛 and 𝑚, and without using the floor or ceiling functions.

3. Rewrite the proof of Theorem 56 so that the two cases are based on whether or
not the graph has an isolated vertex.

4. Determine (a) the number of edges, and (b) the average degree of the graph of the
caffeine molecule.
Here, vertices represent atoms and edges represent bonds. The graph has some
multiple edges, in the form of double bonds. Its chemical formula is C8 H10 N4 O2 . In
molecules, the valency of an atom is the number of bonds it has; this is just the degree
of the corresponding vertex of the graph that represents the molecule.7 The valencies of
Carbon, Nitrogen, Oxygen and Hydrogen are 4, 3, 2, 1, respectively.

5. Prove that the shortest walk between two vertices is always a path.

6. Prove that every graph with minimum degree at least 2 has a cycle.

7. Prove that the distance between two vertices in a graph satisfies the triangle inequality.
This means that, for every triple of vertices 𝑢, 𝑣, 𝑤 in any graph 𝐺,

𝑑𝐺 (𝑢, 𝑤) ≤ 𝑑𝐺 (𝑢, 𝑣) + 𝑑𝐺 (𝑣, 𝑤),

where 𝑑𝐺 (𝑥, 𝑦) denotes the distance between vertices 𝑥 and 𝑦 in 𝐺.

8. How many 2-colourings does a bipartite graph with 𝑛 vertices, 𝑚 edges and 𝑘
components have?

9. The following puzzle once circulated at a Victorian high school.

Consider the four-vertex multigraph in Figure 11.8. Determine if it has a trail that
includes every edge, and also if it has an Euler tour.
7 In fact, some graph theorists borrowed the term valency from chemistry and used it instead of “degree”,
but that is uncommon these days.

Figure 11.8: A multigraph.

10. Each of the five platonic solids (tetrahedron, cube, octahedron, dodecahedron,
icosahedron) has a skeleton consisting of its vertices and edges, which is a graph. In fact,
the graph terminology “vertex” and “edge” came from their use for polyhedra.
For each of these five graphs: find a Hamiltonian cycle; determine if it is Eulerian,
and if it is, find an Euler tour.

11. Let 𝑄𝑛 be the graph whose vertices are all strings of 𝑛 bits, with two vertices
being adjacent if they differ in just one bit.

(a) Draw the cube, 𝑄3 .

(b) Find a Hamiltonian cycle in 𝑄3 .

(c) Describe a method for constructing a Hamiltonian cycle in 𝑄𝑛 , for each 𝑛. Think
recursively.

(d) Find an Eulerian cycle in 𝑄4 .

12. Using Theorem 64 for simple graphs, how would you prove it for multigraphs
(where multiple edges are allowed)?
To answer this, don’t re-do the proof of Theorem 64. Treat that theorem just as
a black box. So, your task is to show that a connected multigraph is Eulerian if and
only if every vertex has even degree, using Theorem 64 somehow. Since Theorem 64 is
only about simple graphs, you should try to construct, from any given multigraph 𝐺, a
simple graph 𝐻 such that applying Theorem 64 to 𝐻 helps prove this extension of the
theorem for 𝐺.
12
GRAPH THEORY II

We continue our study of graphs by looking at two of the most important classes of
graphs: trees (and the closely related class of forests), which are ubiquitous in com-
puter science, and planar graphs, which are fundamental in many applications including
network layout and information visualisation.
We finish our exploration of graph theory by focusing on fun, through some games
that can be played on any graph. These games have a rich theory and are the source of
many good puzzles and challenges as well as being interesting to play.

12.1𝛼 TREES

A tree is a connected graph without cycles.


An example of a tree is given in Figure 12.1.

Figure 12.1: A tree.

Some of the contexts in which trees are used as abstract models include:


• communications networks. It can be useful to identify a subset of the links which,


together, still enable any node to communicate with any other node. A minimal
subset of links that does this turns out to be a tree that reaches every node.

• hierarchies. In a hierarchy, each object depends on a unique “higher” object, while


also potentially having other objects that depend on it. Examples include organi-
sational hierarchies1 , taxonomies in biology, and parts of family trees. Computer
science itself is rich in hierarchies:
– In many operating systems (and historically inspired by Unix), the files are
organised into directories which form a hierarchy which is a tree.
– In object-oriented programming languages, the possible data types — repre-
sented by classes — are organised into inheritance hierarchies. This means
that objects that belong to a class inherit the functionality specified in other
classes higher up the hierarchy. This kind of programming has proved very
beneficial for promoting reusability and maintainability of software, and is
especially useful when writing programs for simulating real-world processes
and systems.
For inheritance hierarchies to truly be trees, they must conform to single
inheritance, where each class can only have one superclass. Many object-
oriented languages are restricted to single inheritance. But some languages
allow multiple inheritance, where a class can have more than one superclass.
In such cases, the inheritance hierarchies are not trees.
Python supports object-oriented programming.

• decision-making. Suppose there is a collection of situations, each requiring a choice


to be made between several options. Each choice leads to another situation, re-
quiring another choice, and so on, until some final outcome is reached. If each
situation can be reached by only one sequence of choices, then a tree is a suitable
model.

• text analysis. The classical example of this is the use of trees in grammars, for
human languages or programming languages. A parse tree shows how a string of
text can be generated according to grammatical rules.

• trees, in the botanical sense!


Trees are used extensively as data structures in computer programs, to help sort, store
and retrieve information.

In some of these examples, the trees have a special vertex called the root on which
everything else ultimately depends. For example, in a directory hierarchy in Unix or
Linux, there is a root directory, denoted simply by /, which (unlike all other directories)
1 assuming each employee has only one supervisor

has no parent directory. Try finding it in your Linux environment, in an Ed Workspace,


and explore its contents and subdirectories. In graph theory, a tree with a root is called
a rooted tree.
We will focus on normal (unrooted) trees, because they are important in their own
right, and because they are fundamental to using and understanding rooted trees.

12.2 PROPERTiES OF TREES

When mentioning communications networks (p. 420 in § 12.1𝛼 ), we said that trees arise
as minimal networks that keep everything connected. We now formalise and prove this,
as a general statement about trees.
Suppose a graph 𝐺 = (𝑉, 𝐸) is connected, but deleting any edge of the graph discon-
nects it (but we retain all the vertices). Such a graph can be called a minimal connected
graph on 𝑉, since it’s connected but every proper subgraph with the same vertex set
is disconnected. It turns out that minimal connected graphs on 𝑉 are the same as trees
with vertex set 𝑉.

Theorem 66. A graph 𝐺 = (𝑉, 𝐸) is a tree if and only if it is a minimal connected
graph on 𝑉.

Proof.
(⟹)
Let 𝐺 be a tree. By definition, it is connected. If it is not a minimal connected
graph on 𝑉, then there is some edge 𝑒 of 𝐺 such that the graph remains connected even
if 𝑒 is deleted. Let 𝑣 and 𝑤 be the endpoints of 𝑒, so that 𝑒 = {𝑣, 𝑤}. Since 𝐺 remains
connected after 𝑒 is deleted, there must be a path from 𝑣 to 𝑤 that does not use the
edge 𝑒. But adding 𝑒 to this path creates a cycle in 𝐺. Since 𝐺 contains a cycle, it is
not a tree, which is a contradiction.

(⟸)
Let 𝐺 be a minimal connected graph on 𝑉.
Since 𝐺 is already connected, we only need to show that it has no cycle. Then we
will know it is connected and has no cycle, which means it’s a tree.
Assume, by way of contradiction, that 𝐺 has a cycle 𝐶. Let 𝑒 be any edge in 𝐶.
Now, let 𝑣 and 𝑤 be any vertices in 𝐺. Since 𝐺 is connected, there is a walk from 𝑣 to
𝑤 in 𝐺. If this walk includes 𝑒, then we can construct an alternative walk from 𝑣 to 𝑤
that avoids 𝑒 by going all the way round 𝐶 instead. Since we can do this for any 𝑣 and
𝑤, it follows that every pair of vertices in 𝐺 are connected by a walk that avoids 𝑒.
Therefore, in the proper subgraph of 𝐺 obtained from it by deleting 𝑒, every pair of
vertices are connected by a walk (and hence by a path). Therefore this proper subgraph
of 𝐺 is connected. It also has the same vertex set, 𝑉, as 𝐺. So this subgraph contradicts
the minimality of 𝐺. Therefore our assumption, that 𝐺 has a cycle, was wrong. So 𝐺
has no cycle.
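In code, one usually tests the definition of a tree directly: connected, with no cycles. A sketch (our own, for a graph given as a vertex set and an edge list), using a single depth-first search:

```python
from collections import defaultdict

def is_tree(vertices, edges):
    """Decide whether the graph (vertices, edges) is a tree:
    connected, and containing no cycle."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = set()

    def dfs(v, parent):
        seen.add(v)
        for w in adj[v]:
            if w == parent:
                parent = None        # step back over the tree edge once only
                continue
            if w in seen:
                return False         # revisiting a vertex closes a cycle
            if not dfs(w, v):
                return False
        return True

    # Connected means the search from one vertex reaches all of them.
    return dfs(next(iter(vertices)), None) and seen == set(vertices)
```

Skipping the edge back to the parent only once means that a second, parallel edge to the parent is still reported as a cycle, so the test also behaves correctly on multigraphs.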

So, in the context of a communications network, a tree may be viewed as a minimal


or “bare bones” network that links all nodes. This view is reinforced by the observation
that, in a tree, every pair of vertices is connected by a unique path. In fact, this property
gives another characterisation of trees (see Exercise 5).

Trees are very diverse. The two most “extreme” trees on 𝑛 vertices are
• the path graph 𝑃𝑛−1 , with 𝑛 vertices and 𝑛−1 edges. This tree contains the longest
path of any tree on 𝑛 vertices. No other tree on 𝑛 vertices has a path of length
𝑛 − 1. The maximum vertex degree of 𝑃𝑛−1 is 2; every other tree on 𝑛 vertices has
higher maximum degree.
• the star graph consisting of one central vertex and 𝑛 − 1 leaves, each adjacent to
the central vertex. Using complete bipartite graph notation, this is 𝐾1,𝑛−1 . By
contrast with the path graph, its paths are extremely short. It has no path of
length > 2; no other tree has this property. Its maximum degree is 𝑛 − 1; every
other tree on 𝑛 vertices has lower maximum degree.
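These two extreme trees are easy to generate programmatically. A small sketch (function names and the vertex numbering 0, …, 𝑛 − 1 are our own):

```python
def path_tree(n):
    """Edges of the path graph on vertices 0, 1, ..., n-1."""
    return [(i, i + 1) for i in range(n - 1)]

def star_tree(n):
    """Edges of the star K_{1,n-1}: centre 0 joined to n-1 leaves."""
    return [(0, i) for i in range(1, n)]

def max_degree(n, edges):
    """Maximum vertex degree of a graph on vertices 0, ..., n-1."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree)
```

Both trees on five vertices have four edges, but `max_degree(5, path_tree(5))` is 2 while `max_degree(5, star_tree(5))` is 4, matching the two extremes described above.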
Figure 12.2 shows three trees on five vertices: a path, a star, and another shown between
them. In fact, this is a complete list of the structures that trees on five vertices can have;
every tree on five vertices may be identified as one of these three (possibly after renaming
and redrawing to make the relationship clear).

Figure 12.2: Three trees on five vertices: the path 𝑃4 , another tree, and the star 𝐾1,4 .

See if you can go beyond Figure 12.2 by drawing all possible trees on six vertices.
You will start to see more of the variety of shapes they have. You should also study their

structure and see if you can make some general observations about the structure of trees.
How many edges does a tree on 𝑛 vertices have? What values can their average degree
take? What values can the maximum degree of a vertex in a tree take? What values
can the minimum degree take? What possible values can the length of the longest path
take?
We now answer some of these questions. First, we consider minimum degree.
Theorem 67. Every tree with at least two vertices has a leaf.
Proof. See Exercise 11.6.

We now consider the number of edges in a tree.


Theorem 68. For every 𝑛 ∈ ℕ, every tree on 𝑛 vertices has 𝑛 − 1 edges.
Proof. Let 𝑃(𝑛) be the following statement:
Every tree on 𝑛 vertices has 𝑛 − 1 edges.
We prove, by induction on 𝑛, that 𝑃(𝑛) is true for all 𝑛.

Inductive Basis:
When 𝑛 = 1, we only have one vertex, and there is only one graph with one vertex,
and it has no edge. This is the simplest tree, and its number of edges is 0, which is 1−1,
so the number of edges is indeed 𝑛 − 1 in this case. So 𝑃(1) is true.

Inductive Step:
Let 𝑘 ≥ 1.
Assume that 𝑃(𝑘) holds, i.e., that every tree on 𝑘 vertices has 𝑘 − 1 edges. This is
our Inductive Hypothesis.
We need to show that 𝑃(𝑘 + 1) holds. This is a statement about all trees on 𝑘 + 1
vertices, so it’s a universal statement. So our proof strategy for the Inductive Step is
to consider a general tree on 𝑘 + 1 vertices and prove that it has the required property,
looking out for a chance to apply the Inductive Hypothesis.
Let 𝑇 be any tree on 𝑘 + 1 vertices. By Theorem 67, 𝑇 has a leaf.
We now make a new tree 𝑆 from 𝑇 by deleting a leaf from it (which means we remove
the leaf vertex and also its incident edge). 𝑆 is indeed a tree, because removing a leaf
from a graph never disconnects it and never creates a cycle.
This new tree 𝑆 has one fewer vertex and one fewer edge than 𝑇, because of the
deletion of the leaf (and its incident edge). Since 𝑇 has 𝑘 + 1 vertices, this means 𝑆 has
𝑘 vertices. This in turn means that we can apply the Inductive Hypothesis to 𝑆.
By the Inductive Hypothesis, 𝑆 has 𝑘 − 1 edges.
Now we use the relationship between 𝑆 and 𝑇. Since 𝑆 has one fewer edge than 𝑇
(due to the way it was constructed from 𝑇), the fact that 𝑆 has 𝑘 − 1 edges tells us that
the number of edges of 𝑇 is
(𝑘 − 1) + 1 = 𝑘.

Therefore 𝑇 has 𝑘 edges. So 𝑃(𝑘 + 1) holds.


Summarising, we started with the assumption that 𝑃(𝑘) holds and deduced from
this that 𝑃(𝑘 + 1) holds. In other words, we proved 𝑃(𝑘) ⇒ 𝑃(𝑘 + 1). So we have com-
pleted the Inductive Step.

Conclusion:
Therefore, by the Principle of Mathematical Induction, 𝑃(𝑛) holds for all 𝑛 ∈ ℕ.
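The leaf-deletion argument in this proof can be replayed in code. The sketch below (our own; it assumes the input really is a tree on vertices 0, …, 𝑛 − 1, so Theorem 67 guarantees a leaf exists at every step) deletes leaves one at a time and records the edge count, which drops by exactly one per deletion.

```python
def strip_leaves(n, edges):
    """Replay the inductive step: repeatedly delete a leaf and its
    incident edge from a tree, recording the edge count each time."""
    vertices = set(range(n))
    edges = list(edges)
    counts = [len(edges)]
    while len(vertices) > 1:
        degree = {v: 0 for v in vertices}
        for u, v in edges:
            degree[u] += 1
            degree[v] += 1
        leaf = next(v for v in vertices if degree[v] == 1)  # Theorem 67
        vertices.remove(leaf)
        edges = [e for e in edges if leaf not in e]         # one edge gone
        counts.append(len(edges))
    return counts
```

For the path on four vertices, `strip_leaves(4, [(0, 1), (1, 2), (2, 3)])` returns `[3, 2, 1, 0]`: the count falls by one per deleted leaf, down to the one-vertex tree with 0 = 1 − 1 edges.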

12.3 FORESTS

A forest is just a graph with no cycles.


So a forest is like a tree, except that we don’t require it to be connected.
Even though a forest does not have to be connected, it will (like any graph) have
components, and those components are all connected. The components cannot have
any cycles, since any cycle in a component is also a cycle in the whole graph. So each
component in a forest is connected and has no cycles. Therefore each component of a
forest is a tree.
So a forest is just a collection of trees, and this explains where the name comes from.
The graph in Figure 11.2, early in the previous chapter, is a forest with three com-
ponents. Every tree is a forest, even though it only has one component. So the graphs
in Figure 11.5(a) and Figure 12.1 are forests (because they are trees).2
Theorem 68 can be extended to forests.
Theorem 69. For every 𝑛 ∈ ℕ, every forest with 𝑛 vertices and 𝑘 components has 𝑛 − 𝑘
edges.
Proof. Let 𝐹 be a forest and let 𝐹1 , 𝐹2 , … , 𝐹𝑘 be its components.
Let 𝑛 and 𝑚 be the numbers of vertices and edges, respectively, of 𝐹 . For each 𝑖,
let 𝑛𝑖 and 𝑚𝑖 be the numbers of vertices and edges, respectively, of the 𝑖-th component,
𝐹𝑖 .
We can express the number of vertices of 𝐹 as the sum of the numbers of vertices of
the components,
𝑛 = 𝑛1 + 𝑛2 + ⋯ + 𝑛𝑘 = ∑ᵏᵢ₌₁ 𝑛𝑖 .  (12.1)

We can treat the numbers of edges similarly:


𝑚 = 𝑚1 + 𝑚2 + ⋯ + 𝑚𝑘 = ∑ᵏᵢ₌₁ 𝑚𝑖 .  (12.2)

By Theorem 68 we have, for each 𝑖,

𝑚𝑖 = 𝑛𝑖 − 1.
2 But the graph in Figure 11.5(b) is certainly not a tree or a forest.

Let us use these observations to work out the number of edges of 𝐹 .


𝑚 = ∑ᵏᵢ₌₁ 𝑚𝑖  (by (12.2))
  = ∑ᵏᵢ₌₁ (𝑛𝑖 − 1)  (by applying Theorem 68 to each component)
  = (∑ᵏᵢ₌₁ 𝑛𝑖 ) − (∑ᵏᵢ₌₁ 1)
  = 𝑛 − 𝑘  (by applying (12.1) to the first sum).

12.4 S PA N N i N G T R E E S

Let 𝐺 be a connected graph. A spanning tree of 𝐺 is a subgraph of 𝐺 which

• is a tree, and

• includes every vertex of 𝐺.

In Figure 12.3, we show a graph at the top, followed by two further drawings of it
with two different spanning trees of the graph. (This graph is the same as the one in
Figure 11.4.) See if you can find some other spanning trees for this graph.
Since a tree is a minimal connected graph on its vertex set (Theorem 66), a spanning
tree of 𝐺 is a minimal connected subgraph of 𝐺 that includes every vertex of 𝐺. In other
words, it’s a subgraph of 𝐺 in which every vertex is connected to every other vertex but
deleting any edge from the subgraph disconnects the subgraph.
We can find a spanning tree of a connected graph 𝐺 by deleting edges (but not ver-
tices) until we can’t delete any more edges without disconnecting the subgraph. We can
usually do this in many different ways. Often, this process will give different spanning
trees, but sometimes we will get the same spanning tree by deleting the same edges but
doing so in a different order.
For example, in the graph of Figure 12.3, suppose we delete edge {𝑎, 𝑏}, then {𝑐, 𝑑},
then {𝑏, 𝑒}. You can check that, at each stage in this process, the remaining graph is
still connected (and includes all the vertices, since we are not deleting vertices). After
we have deleted all these three edges, we have the spanning tree in the middle diagram.
Order of deletion does not matter here; we could have instead deleted {𝑐, 𝑑}, then {𝑏, 𝑒},
then {𝑎, 𝑏}, and we would still end up with the same spanning tree.
If, instead, we delete {𝑑, 𝑒}, then {𝑎, 𝑐}, then {𝑐, 𝑑}, then we get the spanning tree
shown at the bottom of the figure.
This always works, for any connected graph. It follows that every connected graph
has a spanning tree.

Figure 12.3: A graph on vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓 (top), and two of its spanning trees, shown with thick edges (middle and bottom).

This method of finding a spanning tree starts with the entire graph and deletes edges
until we are left with a spanning tree. Another method of finding a spanning tree in a
graph works in the opposite way: we start with nothing and build up. We now describe
this.
At the start, we pick any edge of the graph. (If the graph has no edge, then it has
just one vertex and nothing else, otherwise it is not connected. A one-vertex graph is
its own spanning tree, so in that case there is no need to do anything.) Then we keep
trying to add edges, to the graph we have built up so far, provided that this does not
create a cycle. While we are doing this, the subgraph we are building must be a forest
(as it has no cycle), but it might not be a connected subgraph, so it might not be a tree.
Eventually, the process stops, when it is no longer possible to add another edge without
creating a cycle. When that happens, it turns out that we will have a spanning tree of
the original graph.
Let us now set this out as an algorithm.

1. Input: a connected graph 𝐺.



2. 𝑋 ∶= ∅

3. while ∃𝑒 ∈ 𝐸 ∖ 𝑋 such that 𝑋 ∪ {𝑒} does not contain a cycle:


3.1. 𝑒 ∶= any edge 𝑒 ∈ 𝐸 ∖ 𝑋 such that 𝑋 ∪ {𝑒} does not contain a cycle
3.2. 𝑋 ∶= 𝑋 ∪ {𝑒}

4. Let 𝐹 be the graph (𝑉, 𝑋 ), which is a subgraph of 𝐺.


Output: 𝐹 .

This algorithm starts with a set of edges containing no cycle, and it only ever adds
edges that do not create a cycle. Therefore the output subgraph 𝐹 , with vertex set 𝑉
and edge set 𝑋 , has no cycles. Therefore it is a forest.
Since 𝐺 is connected, the output subgraph 𝐹 is connected too. (It is a good exercise
to try to prove this.) Since 𝐹 is a connected forest, it is a tree, and since it is also a
subgraph of 𝐺 that includes all vertices of 𝐺, it is a spanning tree of 𝐺.
If 𝐺 is disconnected, then the algorithm can be applied to each component of 𝐺 to
construct a spanning tree in each component.
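In practice, the cycle test in step 3 is implemented with a disjoint-set (union-find) structure: adding 𝑒 to 𝑋 creates a cycle exactly when both endpoints of 𝑒 already lie in the same component of (𝑉, 𝑋 ). A Python sketch of the algorithm (the implementation choices are our own):

```python
def spanning_tree(vertices, edges):
    """Grow a spanning tree of a connected graph by the algorithm
    above, using a union-find structure for the cycle test."""
    parent = {v: v for v in vertices}

    def find(v):
        """Root of the component currently containing v."""
        while parent[v] != v:
            parent[v] = parent[parent[v]]    # path halving
            v = parent[v]
        return v

    X = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                 # adding {u, v} creates no cycle
            parent[ru] = rv          # merge the two components
            X.append((u, v))
    return X
```

On a connected graph with 𝑛 vertices this returns 𝑛 − 1 edges, as Theorem 68 requires of the resulting spanning tree.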

Suppose you have a set 𝑉 of nodes that need to be joined up somehow into a
network. Applications include planning for transport or communications. There is a set
𝐸 of possible links that could be built between pairs of nodes, and each possible link
carries a cost. How do you find the minimum cost network that connects all the nodes?
We model this by a graph 𝐺 = (𝑉, 𝐸), where the vertices represent nodes in the
network and the edges represent pairs of nodes which could be linked. We assume 𝐺
is connected, otherwise it’s impossible to find a subgraph that connects all the vertices.
For each edge 𝑒 there is a cost 𝑤(𝑒) ∈ ℝ+ which represents the cost of building that link.
We want to find a subset 𝑋 of the edges such that

• (𝑉, 𝑋 ) is connected, so that every pair of vertices has a path using only edges in
𝑋 ; and

• the total cost 𝑤(𝑋 ) is minimum, where the total cost of 𝑋 is just the sum of all
the costs of the edges in 𝑋 :

𝑤(𝑋 ) = ∑𝑒∈𝑋 𝑤(𝑒).

In order to minimise the total cost, we do not want to include any unnecessary edges
in 𝑋 . Suppose 𝑋 contains an edge 𝑒 such that 𝑋 ∖ {𝑒} still connects all vertices of 𝐺.
Then we prefer to omit 𝑒, since that reduces the total cost by 𝑤(𝑒), the cost of 𝑒. So,
any minimum-cost subgraph (𝑉, 𝑋 ) will not only be a connected subgraph of 𝐺, but will
also be a minimal connected subgraph that includes all vertices of 𝐺. As we remarked

early in this section, a minimal connected subgraph that includes all vertices is just a
spanning tree of 𝐺. So what we want is a minimum-cost spanning tree of 𝐺.
Our earlier algorithm for finding a spanning tree takes no notice of any costs on the
edges. So, in general, it won’t find a minimum-cost spanning tree.
We can easily modify the algorithm to make use of the edge costs. In the key step
where the next edge is chosen (step 2.1), we could choose the edge of minimum cost,
among all those not in 𝑋 which would not create a cycle if we chose them. This algorithm
is known as Kruskal’s Greedy Algorithm, after Joseph Kruskal, who introduced it in
1956. It is greedy because it always makes the choice that gives the greatest immediate
benefit, rather than looking further ahead to see what gives greatest benefit overall.
Here is the Greedy Algorithm in full, which is the same as the previous algorithm
except for the part shown in blue, where we choose the next edge greedily instead of
arbitrarily.

Kruskal’s Greedy Algorithm


1. Input: a connected graph 𝐺, with a cost 𝑤(𝑒) ∈ ℝ+ for each edge 𝑒 ∈ 𝐸.

2. 𝑋 ∶= ∅

3. while ∃𝑒 ∈ 𝐸 ∖ 𝑋 such that 𝑋 ∪ {𝑒} does not contain a cycle:

   3.1. 𝑒 ∶= an edge 𝑒 ∈ 𝐸 ∖ 𝑋 such that 𝑋 ∪ {𝑒} does not contain a cycle, and 𝑤(𝑒)
        is minimum among all edges with this property
   3.2. 𝑋 ∶= 𝑋 ∪ {𝑒}

4. Let 𝐹 be the graph (𝑉, 𝑋 ), which is a subgraph of 𝐺.

Output: 𝐹 .
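As an illustration, here is one possible Python rendering of this algorithm. The union-find (disjoint-set) structure used to test whether an edge would create a cycle is our implementation choice, not part of the algorithm as stated above.

```python
def kruskal(n, edges):
    """Return a minimum-cost spanning tree of a connected graph.

    n     -- number of vertices, labelled 0 .. n-1
    edges -- list of (cost, u, v) triples
    """
    parent = list(range(n))  # union-find structure over the vertices

    def find(x):
        # Find the root of x's component, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    # Consider the edges in order of increasing cost (the greedy choice).
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:               # adding {u, v} creates no cycle
            parent[ru] = rv
            tree.append((cost, u, v))
        if len(tree) == n - 1:     # a spanning tree has n - 1 edges
            break
    return tree

# On a 4-cycle with costs 1, 2, 3, 4, the three cheapest edges are kept:
assert kruskal(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0)]) \
       == [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```

Sorting the edges once up front, rather than scanning for the minimum at every iteration, is simply the usual efficient way to realise the greedy choice; it produces the same trees.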

Remarkably, this algorithm is actually guaranteed to find a minimum cost spanning
tree.
Theorem 70. For every connected weighted graph 𝐺, Kruskal’s Greedy Algorithm finds
a minimum cost spanning tree in 𝐺. □
This is not at all obvious, and very unexpected! Usually, in designing algorithms, a
greedy approach does not give optimum solutions. Repeatedly maximising short-term
benefit seldom gives maximum long-term benefit. For example, if you want to find the
shortest path between two vertices in a weighted graph, with edge weights representing
lengths or travel times, then a greedy approach will not work, in general, and may even
fail to find any path. (Try it on some examples.)
We will not prove Theorem 70, but finding a proof is a good challenge for a strong,
keen student.
The Greedy Algorithm’s combination of simplicity, efficiency and optimality makes
it a very powerful tool for programmers and problem-solvers. Although it is rare for

greedy algorithms to find optimal solutions, the benefits when that does happen can be
enormous. So it is important to be able to recognise those situations. A whole branch of
mathematics, matroid theory, is based on the study of structures for which the greedy
algorithm always finds optimum solutions.

12.5 PLANARiTY

Graphs often need to be presented on a flat, two-dimensional surface.

• To help people study graphs visually, they are usually displayed on a screen or
drawn/printed on paper.

• Transport networks involve laying out routes between locations in a geographical
area.

• Electrical circuits are laid out on flat surfaces such as circuit boards or wafers, with
wires routed along these surfaces.

If possible, it is desirable to avoid, or at least minimise, edge-crossings. This is because
they introduce costs. In diagrams, the relationships among components may be obscured
by edge-crossings, so these crossings impose a cognitive cost on human readers. In
transport networks, edge-crossings may need to be implemented as bridges, tunnels or
intersections, which may be expensive to build or operate. In electrical circuits, edge-
crossings introduce extra manufacturing costs in order to prevent short-circuits between
wires.
It is therefore important to understand graphs that can be represented on a flat,
two-dimensional surface with no crossings at all. Geometrically, this means drawing
them on the plane such that

• the vertices are represented as distinct points on the plane;

• each edge {𝑣, 𝑤} is drawn as a curve between 𝑣 and 𝑤 which does not intersect
itself anywhere, meets 𝑣 and 𝑤 only at the ends of the curve, and meets no other
vertices at all;

• the curves representing different edges cannot meet each other at all, except if the
two edges are incident at a common vertex, in which case they only meet at that
vertex.

A graph that can be drawn in this way is called planar. Such a drawing in the plane is
called a planar drawing of the graph, or a plane graph. These definitions extend to
multigraphs too.
If you review the graphs illustrated in this and earlier chapters, you will find that
most of them are planar, and are drawn without crossings, so that the illustrations give
planar drawings of them. A few particular cases are worth comment.

• The drawing of 𝐾4 on the left of Figure 11.3 is not a planar drawing, since it has
an edge-crossing. But the graph 𝐾4 itself is planar. Can you find a planar drawing of
it?

• The drawings of two bipartite graphs in Figure 11.5 each have edge-crossings, so
they are not planar drawings. But both graphs are planar. Can you show this,
using an appropriate drawing?

• The multigraph in Figure 11.8 is planar, although the drawing given there is not a
planar drawing since there is an edge-crossing in the middle. How can you redraw
it so that there are no crossings?
Null graphs are of course planar, since they have no edges to form crossings with. Paths,
cycles and trees are always planar. What about two other special families: complete
graphs, and complete bipartite graphs?
• complete graphs: 𝐾1 , 𝐾2 , 𝐾3 and 𝐾4 are all planar. For 𝐾1 and 𝐾2 , this is because
they do not have enough edges to form edge-crossings. For 𝐾3 , it is a special case
of the fact that all cycles are planar. Hopefully you have just shown that 𝐾4 is
planar, using a different drawing to the one given on the left in Figure 11.3. But
what about 𝐾5 ?

• complete bipartite graphs: 𝐾1,𝑘 , the star graph, is a tree, and therefore planar.
𝐾2,2 is really the same as the 4-cycle, 𝐶4 , and is therefore planar. Hopefully you
have now redrawn 𝐾2,3 (see Figure 11.5(b)) to show that it is planar. But what
about the next two cases, 𝐾2,4 and 𝐾3,3 ?
Figure 12.4 shows 𝐾5 and 𝐾3,3 .


Figure 12.4: 𝐾5 (left) and 𝐾3,3 (right).

The planarity, or otherwise, of 𝐾5 and 𝐾3,3 has been the basis of numerous puzzles.

• Five towns are planning a rail network in which each pair of towns has a direct
route between them. Can this be done without building any crossings?

• An old puzzle based on 𝐾3,3 has the vertices on the left side representing houses
and those on the right side representing utility services, typically water, electricity
and gas. The puzzle is to connect each house to each utility so that none of the
pipes or wires carrying the services to the houses cross each other.

We will soon be able to determine whether or not these two specific graphs are planar
without doing an exhaustive search of the many different ways of drawing them.

Every planar drawing of a graph 𝐺 divides the plane up into regions. Informally,
these are the areas of the plane that are surrounded by vertices and edges but have no
vertices and edges within them. You can imagine cutting the plane along all the edges
(including their endpoints), leaving the plane to fall apart into separate “pieces”. These
pieces are the regions of the plane graph. Each region is referred to as a face of the
plane graph.
Each face has the property that any two points that lie inside a face (and not on any
of the vertices or edges that surround the face) can be joined by a curve that also lies
entirely within the face. So the curve does not meet any vertex or edge of 𝐺. In fact, a
face may be defined formally as a maximal subset of points in the plane such that any
two points in the subset can be joined by a curve that does not meet any vertices or
edges of 𝐺.
The region of the plane “outside” the graph is considered to be a face just as much as
the other regions. It is called the outer face, and must not be forgotten when counting
the faces in the graph.
Consider the plane graph in Figure 12.5. This graph divides the rest of the plane
into four faces, shown in the figure as 𝐹1 , 𝐹2 , 𝐹3 and 𝐹4 . Here, 𝐹4 is the outer face.


Figure 12.5: A plane graph with four faces: 𝐹1 , 𝐹2 , 𝐹3 and the outer face 𝐹4 .

Each face has a boundary consisting of those vertices and edges that are next to it.
You can walk along this boundary, all around the face, much as you might walk around
the boundary of a park or field. You can do so in either of two directions: one direction
keeps the face on your left as you walk around it, while the other keeps the face on
your right as you walk around it. Either way, and wherever you start, this walk will
eventually return to your starting point, and it actually constitutes a closed walk in the
graph.
The boundaries of the faces in Figure 12.5 are as follows.
face boundary comment
𝐹1 𝑎, 𝑐, 𝑒, 𝑑, 𝑏, 𝑑, 𝑎 uses edge {𝑏, 𝑑} twice
𝐹2 𝑐, 𝑔, 𝑓, 𝑒, 𝑐 this boundary is actually a cycle
𝐹3 𝑓, 𝑔, ℎ, 𝑓 so is this
𝐹4 𝑎, 𝑑, 𝑒, 𝑓, ℎ, 𝑔, 𝑐, 𝑎 outer face; also a cycle in this case.

Consider any edge 𝑒 in a plane graph, and imagine walking along it. As you do so,
you can look to your left or to your right.
• On your left is a face that has that edge in its boundary. We imagine a small
portion of that face that lies alongside the edge on your left, and call it a side of
that edge.
• On your right there is also a face with that same edge in its boundary. Again,
imagine a small portion of that face lying alongside the edge on your right, and
call it a side of that edge.
Often, the faces we see on either side of an edge are two different faces, as in Fig-
ure 12.6(a). But they can also be the same face, as in Figure 12.6(b). In any case, each
edge has two sides, and these sides may belong to different faces or to the same face.
Observe that each edge belongs to at most two faces. If we construct a boundary
walk around every face, then each edge appears exactly twice in the full set of boundary
walks. This could happen in either of two ways:
• The edge might appear once in the boundary walk around one face, and once in
the boundary walk around a different face.
• The edge might appear twice in the boundary walk around a single face.
In each case, the edge appears in no other boundary walk around any other face.
You can, if you wish, check that each edge of the plane graph in Figure 12.5 appears
twice in the set of boundaries of the four faces, using the list of boundary walks we gave
above. For example, edge {𝑒, 𝑓} appears once in the boundary of 𝐹2 and once in the
boundary of 𝐹4 , while edge {𝑏, 𝑑} appears twice in the boundary of 𝐹1 .
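This bookkeeping is easy to verify mechanically. The following Python snippet takes the four boundary walks listed above for Figure 12.5 and checks that every edge of the graph occurs exactly twice across them; the counts also confirm the formula 𝑛 − 𝑚 + 𝑓 = 2, anticipating the theorem proved below.

```python
from collections import Counter

# Boundary walks of the four faces of the plane graph in Figure 12.5,
# each written as the cyclic sequence of vertices visited.
walks = {
    "F1": ["a", "c", "e", "d", "b", "d", "a"],
    "F2": ["c", "g", "f", "e", "c"],
    "F3": ["f", "g", "h", "f"],
    "F4": ["a", "d", "e", "f", "h", "g", "c", "a"],
}

# Count how often each edge occurs across all boundary walks.
occurrences = Counter()
for walk in walks.values():
    for u, v in zip(walk, walk[1:]):
        occurrences[frozenset((u, v))] += 1

# Every edge of the graph appears exactly twice in the full set of walks.
assert all(count == 2 for count in occurrences.values())

# The counts give 8 vertices, 10 edges and 4 faces, and 8 - 10 + 4 = 2.
n = len({v for walk in walks.values() for v in walk})
m = len(occurrences)
f = len(walks)
assert n - m + f == 2
```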

The numbers of vertices, edges and faces in a plane graph satisfy an equation which,
again, is due to Euler.


Figure 12.6: Two plane graphs, each showing the two sides of an edge {𝑐, 𝑑}. (a) The two sides
are in different faces. (b) The two sides are in the same face.

Theorem 71 (Euler’s Theorem). Let 𝐺 be a connected plane graph, and let 𝑛, 𝑚
and 𝑓 be its numbers of vertices, edges and faces, respectively. Then

𝑛 − 𝑚 + 𝑓 = 2.

Proof. We prove by induction on 𝑚 that, for every connected plane graph,

𝑛 − 𝑚 + 𝑓 = 2, (12.3)

with 𝑛, 𝑚, 𝑓 defined as in the statement of the theorem.

Inductive Basis:
We don’t need to consider any 𝑚 < 𝑛 −1, since no graph with fewer than 𝑛 −1 edges
can be connected. This is because trees are minimal connected graphs on a given vertex
set, and they have 𝑛 − 1 edges by Theorem 68.
If 𝑚 = 𝑛 − 1, then 𝐺 must be a tree, since it is connected. When a tree is drawn in
the plane, the only face is the outer face, consisting of the entire plane except for the
points representing the vertices and the edges of the tree. So, in this case, 𝐺 has one
face, so 𝑓 = 1. Then, using 𝑚 = 𝑛 − 1 and 𝑓 = 1, we have

𝑛 − 𝑚 + 𝑓 = 𝑛 − (𝑛 − 1) + 1 = 𝑛 − 𝑛 + 1 + 1 = 2.

So (12.3) holds in this case.

Inductive Step:
Let 𝑚 ≥ 𝑛 − 1.

Assume that, for every connected plane graph with 𝑚 edges,

(# vertices) − (# edges) + (# faces) = 2.

This is our Inductive Hypothesis.


Let 𝐺 be any connected graph with 𝑛 vertices, 𝑚 + 1 edges and 𝑓 faces. We need to
prove that 𝑛 − (𝑚 + 1) + 𝑓 = 2.
Since 𝐺 is connected and has more than 𝑛−1 edges, it cannot be a tree and therefore
must contain a cycle. Let 𝐶 be a cycle in 𝐺 and let 𝑒 be any edge of 𝐶. Now, 𝐶 divides
the plane into two parts, the “inside” of 𝐶 and the “outside” of 𝐶.3 This means that the
two sides of the edge 𝑒 are in two separate faces of 𝐺 (otherwise it would be possible
to move from one side of 𝑒 to the other side of 𝑒 without crossing 𝐶, which would be a
contradiction).
Consider now the graph 𝐻 formed from 𝐺 by deleting the edge 𝑒 but keeping its
endpoints and all other vertices and edges of 𝐺. Its drawing in the plane is unchanged
except that the edge 𝑒 has disappeared. The absence of 𝑒 means that the two separate
faces that were on either side of 𝑒 are now merged into a single face. So

# faces of 𝐻 = (# faces of 𝐺) − 1 = 𝑓 − 1.

Since we deleted one edge from 𝐺 to get 𝐻 , the number of edges of 𝐻 is also one less
than the number of edges of 𝐺, so

# edges of 𝐻 = (# edges of 𝐺) − 1 = (𝑚 + 1) − 1 = 𝑚.

Although the deletion of 𝑒 has reduced the numbers of edges and faces, it has not changed
the number of vertices:

# vertices of 𝐻 = # vertices of 𝐺 = 𝑛.

Now, since 𝐻 has only 𝑚 edges, we can apply the Inductive Hypothesis to it. So we
have
(# vertices of 𝐻 ) − (# edges of 𝐻 ) + (# faces of 𝐻 ) = 2.
Substituting the expressions we have derived for these three quantities, we have

(# vertices of 𝐺) − ((# edges of 𝐺) − 1) + ((# faces of 𝐺) − 1) = 2.

Therefore
(# vertices of 𝐺) − (# edges of 𝐺) + (# faces of 𝐺) = 2,
so (12.3) holds for 𝐺 as well. This completes the Inductive Step.

3 We are treating this fact as intuitively obvious and not justifying it further. But it is actually surprisingly
deep, and is not trivial to prove rigorously. It is called the Jordan Curve Theorem.

Conclusion:
Therefore, by Mathematical Induction, (12.3) holds for all 𝑚.
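It is easy to sanity-check the formula against some standard plane graphs. The vertex, edge and face counts below are the usual ones (for a cycle, the two faces are the inside and the outside); assuming them, the formula can be verified mechanically:

```python
# (name, n, m, f) for some well-known plane graphs: any plane drawing of
# the cycle C_5 has two faces, and the counts for K_4, the cube and the
# icosahedron are the familiar Platonic-solid ones.
examples = [
    ("C_5", 5, 5, 2),
    ("K_4", 4, 6, 4),
    ("cube Q_3", 8, 12, 6),
    ("icosahedron", 12, 30, 20),
]

for name, n, m, f in examples:
    assert n - m + f == 2, name
```

Exercise 10 at the end of this chapter asks for the same checks, by hand, for several of these graphs.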

Corollary 72. For any planar graph 𝐺 with 𝑛 ≥ 3 vertices and 𝑚 edges,

𝑚 ≤ 3𝑛 − 6.

Proof. Since 𝐺 is planar, it has a planar drawing. We may assume 𝐺 is connected:
if not, extra edges can be added to join up its components while preserving planarity,
and this only increases 𝑚. Let 𝑓 be the number of faces in some planar drawing of 𝐺.
We are going to use the faces of 𝐺, together with (i) what we can say about the sizes
(i.e., numbers of edges) of each of those faces, and (ii) Euler’s Theorem, to derive an
inequality between 𝑚 and 𝑛.
Since 𝐺 is simple, every face has at least three sides. So, for each face, we can pick
any three of its sides and mark them. A side can only be marked from the face it belongs
to, so each side gets marked at most once. Each edge has exactly two sides, and the
number of sides is twice the number of edges. So the number of sides that get marked
by this process is ≤ 2𝑚. Since the number of marks is 3𝑓, it follows that

3𝑓 ≤ 2𝑚,

which gives
𝑓 ≤ 2𝑚/3.
But Theorem 71 tells us that 𝑛 − 𝑚 + 𝑓 = 2, which after rearranging gives

−𝑛 + 𝑚 + 2 = 𝑓.

Combining these, we have

−𝑛 + 𝑚 + 2 ≤ 2𝑚/3,
which is equivalent to
−3𝑛 + 3𝑚 + 6 ≤ 2𝑚 .
From this we obtain
𝑚 ≤ 3𝑛 − 6.

This fact is already strong enough to settle our earlier question about the planarity
or otherwise of 𝐾5 .
Corollary 73. 𝐾5 is nonplanar.
Proof. 𝐾5 has five vertices and ten edges: 𝑛 = 5 and 𝑚 = 10. We have 3𝑛−6 = 3⋅5−6 =
15 − 6 = 9 < 10, so
𝑚 > 3𝑛 − 6.

Therefore, by Corollary 72, 𝐾5 is not planar.

But Corollary 72 is not strong enough to answer our question about 𝐾3,3 .
We can get a stronger bound on the number of edges in a planar graph when the
graph has no triangles.

Corollary 74. For any triangle-free planar graph 𝐺 with 𝑛 ≥ 3 vertices and 𝑚 edges,

𝑚 ≤ 2𝑛 − 4.

Proof. The proof is very similar to the proof of Corollary 72. The key difference is that,
when we mark sides around each face, we know that each face has at least four sides,
because 𝐺 has no triangles, so we mark four of them. Instead of the inequality 3𝑓 ≤ 2𝑚,
we have
4𝑓 ≤ 2𝑚,
which simplifies to
2𝑓 ≤ 𝑚.
Combining this with Euler’s Theorem, 𝑛 − 𝑚 + 𝑓 = 2, gives

𝑚 ≤ 2𝑛 − 4.

We are now in a position to settle our question about 𝐾3,3 .

Corollary 75. 𝐾3,3 is nonplanar.

Proof. 𝐾3,3 has six vertices and nine edges: 𝑛 = 6 and 𝑚 = 9. We have 2𝑛−4 = 2⋅6−4 =
12 − 4 = 8 < 9, so
𝑚 > 2𝑛 − 4.
Note also that, since 𝐾3,3 is bipartite, it has no odd cycles, by Corollary 63. In particular,
it has no cycles of length 3, i.e., it has no triangles. Therefore, by Corollary 74, 𝐾3,3 is
not planar.
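Corollaries 72 and 74 give quick necessary conditions for planarity that are easy to automate. The helper below is our own illustration (the function name is not standard). Note the asymmetry: failing the test proves nonplanarity, while passing it proves nothing.

```python
def passes_planarity_bound(n, m, triangle_free=False):
    """Necessary condition for planarity, from Corollaries 72 and 74.

    Returns False only when a simple graph with n vertices and m edges
    is certainly nonplanar; True means the edge-count test is inconclusive.
    """
    if n < 3:
        return True  # the bounds only apply for n >= 3
    bound = 2 * n - 4 if triangle_free else 3 * n - 6
    return m <= bound

# K_5: n = 5, m = 10, and 10 > 3*5 - 6 = 9, so it is nonplanar.
assert not passes_planarity_bound(5, 10)

# K_{3,3}: n = 6, m = 9. The general bound 3n - 6 = 12 is inconclusive,
# but K_{3,3} is bipartite, hence triangle-free, and 9 > 2*6 - 4 = 8.
assert passes_planarity_bound(6, 9)
assert not passes_planarity_bound(6, 9, triangle_free=True)
```

For 𝐾3,3 the general bound is inconclusive, and only the triangle-free bound settles the question, exactly as in the proof of Corollary 75.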

12.6𝜔 GAMES ON GRAPHS

We finish our exploration of graph theory with some games you can play on graphs.

12.6.1𝜔 Shannon’s Switching Game

The Shannon Switching Game is named after Claude Shannon, one of the most important
and influential scientists of the twentieth century. His first major contribution was in
his Master’s thesis, where he introduced the algebra of switching. He went on to lay

the foundations of information theory, including the theory of how to communicate
information clearly and efficiently in the presence of noise. Then, in related work, he
laid the mathematical foundations of cryptography.
Throughout his work, he was driven by curiosity and maintained a great sense of
fun. This game is one of his many inventions.
We start with any connected graph in which there are two special vertices designated
𝑠 and 𝑡. There are two players, Cut and Join, who take turns. They do different things
on their turns, and have different aims. On their turn:

• Cut crosses out an edge, with the aim of ensuring that there is no path between
𝑠 and 𝑡 in 𝐺. Once an edge is crossed out, it cannot be used in such a path.

• Join thickens an edge, with the aim of ensuring that there is a path of thickened
edges between 𝑠 and 𝑡 in 𝐺.

Once an edge is crossed out, it can never be thickened, and once an edge is thickened,
it can never be crossed out.
The game ends in one of two ways.

• As soon as there is a path of thickened edges between 𝑠 and 𝑡, Join wins. (It is not
required that all the thickened edges form such a path, but only that the thickened
edges include a thickened path between 𝑠 and 𝑡.)

• As soon as there is no possibility of such a path — because 𝑠 and 𝑡 are in different
components of the graph of uncrossed-out edges — Cut wins.

As soon as either player wins, the game stops.


We have not stipulated who goes first. This can be agreed between the players in
advance.
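For small graphs, who has a winning strategy can be determined by brute force. The following Python sketch (our own illustration, not part of the text) searches the full game tree; it is exponential, so it is only suitable for very small examples. Edges are given as a tuple of (u, v) pairs, so parallel edges in a multigraph are simply repeated entries.

```python
from functools import lru_cache

def shannon_winner(edges, s, t, to_move):
    """Winner ("Join" or "Cut") of the Shannon Switching Game, with optimal play.

    edges   -- tuple of (u, v) pairs; repeated pairs model a multigraph
    to_move -- "Join" or "Cut", whoever moves first
    """
    def connects(edge_ids):
        # Is t reachable from s using only the edges with these indices?
        reached, frontier = {s}, [s]
        while frontier:
            u = frontier.pop()
            for i in edge_ids:
                for x, y in (edges[i], edges[i][::-1]):
                    if x == u and y not in reached:
                        reached.add(y)
                        frontier.append(y)
        return t in reached

    @lru_cache(maxsize=None)
    def solve(thick, cut, player):
        if connects(thick):               # a thickened s-t path exists
            return "Join"
        alive = frozenset(range(len(edges))) - cut
        if not connects(alive):           # s and t are already separated
            return "Cut"
        other = "Cut" if player == "Join" else "Join"
        for e in alive - thick:           # every legal move for `player`
            result = (solve(thick | {e}, cut, other) if player == "Join"
                      else solve(thick, cut | {e}, other))
            if result == player:          # a winning move exists
                return player
        return other

    return solve(frozenset(), frozenset(), to_move)
```

On the 4-cycle through 𝑠, 𝑎, 𝑡, 𝑏, for instance, this search reports that Cut wins no matter who starts, while with two parallel 𝑠–𝑡 edges Join always wins.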
Let’s illustrate one possible play of the game, on the following graph, with Join going
first.

[Figure: the starting graph, with terminals 𝑠 and 𝑡 and two further vertices 𝑎 and 𝑏.]

Join’s first move: thicken {𝑠, 𝑎}. Cut’s first move: cross out {𝑎, 𝑡}.


Join’s second move: thicken {𝑎, 𝑏}. Cut’s second move: cross out {𝑏, 𝑡}.


At this point, Cut wins the game, because once the edges {𝑎, 𝑡} and {𝑏, 𝑡} are crossed
out, the terminals 𝑠 and 𝑡 are in different components of the remaining graph.
But Join played poorly in this game. In particular, their second move was a serious
blunder! What should they have done? Could they have won? Do they have a winning
strategy in this game?

The theory of connectivity (§ 11.9) tells us that there is always a winner in this
game. There is no possibility of a draw. Suppose the players keep playing until all edges
are either thickened or crossed out, and consider the graph consisting of all the original
vertices but only the thickened edges. Either there is a path between 𝑠 and 𝑡 in this graph,
or there isn’t. If there is such a path, then since this path only has thickened edges,
Join wins. If there is no such path, then Cut wins. As soon as a thickened-edge path
is created or becomes impossible, the game is over, so the game could finish before all
edges are used, but in any case, there is always a winner.
At any stage while playing the game, there is at least one move for the player whose
turn it is that is best possible for them. It may be that one of the players has a winning
strategy. What would it mean for the player who starts the game to have a winning
strategy on a particular graph? This means that:

they have a 1st move such that:


for each of their opponent’s 1st moves,
they have a 2nd move such that,
for each of their opponent’s 2nd moves,
they have a 3rd move …
…and so on, …

eventually leading to a win for the starting player no matter what their opponent does.
This is not saying that they can force a win from every conceivable configuration in the
game, but rather that they can force a win, eventually, from the starting configuration
in the game (in which no edges are thickened or crossed out). It’s also not saying that
they will win even if they play badly. A player with a winning strategy may well lose if
they depart from that strategy.
On the other hand, it’s possible that the second player to move (i.e., the one who
does not start the game) has a winning strategy. This would mean that:

for each of their opponent’s possible 1st moves,


they have a 1st move in reply, such that
for each of their opponent’s possible 2nd moves,
they have a 2nd move in reply, such that
for each of their opponent’s possible 3rd moves, …
…and so on, …

eventually leading to a win for the second player no matter what their opponent (the
first player) does.
Since every play of the game always results in a win for one of the two players, it
follows that if one of the two players does not have a winning strategy, then the other
does. This may, or may not, depend on who goes first. So, for any given graph, there
may be four possibilities:
• Join has a winning strategy, whether they go first or second.

• Cut has a winning strategy, whether they go first or second.

• The first player has a winning strategy, whether that’s Join or Cut.

• The second player has a winning strategy, whether that’s Join or Cut.
For each of the following graphs (or multigraphs), see if you can determine which of
those four cases apply.

[Figures: five small graphs (or multigraphs), each with terminals 𝑠 and 𝑡.]

Try making some other graphs, and playing the Shannon Switching Game on them,
and try to determine who has a winning strategy in each.
After you have done this for the above graphs and maybe some others, you will
be able to classify each graph into one of our four categories, according to who has a
winning strategy: Join, Cut, first player, or second player. You should notice that there
is one category with no graphs: there is actually no graph for which the second player
always has a winning strategy regardless of whether Join or Cut starts the game.
This is not just an empirical observation from the graphs you have tried. (Such
observations are important in formulating conjectures, but of course they are not proofs.)

It is, in fact, a general property of this game. To see this, imagine that the second player
always has a winning strategy, regardless of who starts. Then what can the first player
do? Consider the following strategy for the first player: make any initial move, and
from then on, play as if they are the second player, which should result in them winning
if they use one of the winning strategies for the second player. This gives a contradiction:
we started by assuming that the second player can force a win, and then showed how
this same strategy could be exploited by the first player to force a win. So, in fact, there
can be no graph for which the second player always has a winning strategy.
The reason this argument works is that, in this game, it is never a disadvantage
to move. For Join, it can never be a disadvantage to thicken an edge (compared with
doing nothing at all), and for Cut, it can never be a disadvantage to cross out an edge.
In other words, Join thickening an edge can never be better, for Cut, than Join doing
nothing, and Cut crossing out an edge can never be better, for Join, than Cut doing
nothing. (In this game, you are not allowed to do nothing on your move, but it wouldn’t
matter if you were allowed to do nothing, because if you play sensibly, you would always
choose to make an actual move — joining or cutting, depending on your role — rather
than doing nothing.)
This argument won’t work in all two-player games. For example, in Chess, Draughts
(Checkers), Reversi (Othello) and Backgammon, situations can arise where any move
by a player is disadvantageous for them. Other games where moving is never a disad-
vantage include Noughts-and-Crosses (Tic-tac-toe). In that particular case, draws are
possible, so we cannot argue that there is always a winning strategy for one of the two
players.

You can just enjoy playing the game for fun. You can also try to develop some theory
for it. If you’d like a challenge, try to characterise those graphs that belong to each of our
three categories: Join wins, Cut wins, or first player wins. This is not easy, but the char-
acterisation uses some concepts discussed in this chapter. Such a characterisation should
be based purely on the structure of the graph, without examining all possible plays of
the game on the graph (of which there is, in general, a huge number). But, of course,
it’s ok to play the game as many times as you like, to help explore how it works and to
help discover what it is, about the structure of graphs, that helps a particular player win.

Here is a much more complex graph you can play the game on if you wish.
[Figure: a large graph with terminals 𝑠 and 𝑡, used as the board of the commercial game Bridg-It.]

A commercial version of this game, using this last graph, has been produced. This
is the game Bridg-It, invented by David Gale and manufactured by Hasenfeld in 1960.

http://abstractstrategy.com/bridg-it.html

12.6.2𝜔 Slither

The game of Slither was invented by David L. Silverman and described by Martin
Gardner in 1972.4
The game can be played on any graph, as follows. Two players take turns choosing
edges so that the set of chosen edges always forms a path in the graph. Initially, no
edges are chosen. The first player chooses any edge they like, which is a path of length 1.
Then the second player must choose an edge which is incident with the first edge and
4 Martin Gardner, Mathematical games, Scientific American 226 (no. 6) (June 1972) 114–118.

creates a path of length 2. Then the first player chooses any edge that extends the path
created so far. And so on, for as long as possible. In order to extend the path chosen so
far, each new edge chosen must satisfy the following conditions:

• One of its endpoints must be one of the two endpoints of the path chosen so far.

– Either of those endpoints can be used, at each turn. Players are not required
to keep extending the path at the same end.

• The other endpoint of the new edge must be a vertex that does not yet belong to
the path. That new vertex then becomes an end of the path.

The game ends when no legal move is possible. When that happens, the last player to
have made a legal move wins. So, the first player who cannot make a legal move loses.
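Because the game always ends with a loss for the first player who cannot move, an exhaustive search can again decide small cases. In the sketch below (our own illustration, not from the text), a position is just the tuple of vertices on the chosen path, and vertex labels are assumed comparable so that each opening edge is generated once.

```python
def first_player_wins(adj, path=()):
    """True iff the player to move wins Slither, assuming optimal play.

    adj  -- adjacency dict: vertex -> collection of neighbouring vertices
    path -- tuple of the vertices on the chosen path so far, in order
    """
    if not path:
        # Opening move: choosing any edge gives a path of length 1.
        moves = [(u, v) for u in adj for v in adj[u] if u < v]
    else:
        # Extend the path at either end, to a vertex not already on it.
        moves = [(v,) + path for v in adj[path[0]] if v not in path]
        moves += [path + (v,) for v in adj[path[-1]] if v not in path]
    # The player to move wins iff some move leaves the opponent in a losing
    # position; with no legal move at all, the player to move loses.
    return any(not first_player_wins(adj, m) for m in moves)

# A single edge: the first player takes it and wins immediately.
assert first_player_wins({"a": ["b"], "b": ["a"]})
# A path on three vertices, and a triangle: the second player wins.
assert not first_player_wins({"a": ["b"], "b": ["a", "c"], "c": ["b"]})
assert not first_player_wins({v: [u for u in "abc" if u != v] for v in "abc"})
```

On 𝐾4 the search reports a first-player win; indeed every play on 𝐾4 necessarily lasts exactly four moves, so the second player is always the one left without a move.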

We now give a possible play of the game, on the following graph.

[Figure: a graph on the six vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓.]


First player’s first move: {𝑎, 𝑒}. Second player’s first move: {𝑒, 𝑓}.


First player’s second move: {𝑎, 𝑐}. Second player’s second move: {𝑐, 𝑑}.


In this play of the game, the game stops at this point because no further legal
move is possible. The path cannot be extended at either end without destroying the
path property. The player to move (the first player, in this case) cannot choose {𝑑, 𝑓},
because that creates a cycle of chosen edges: 𝑎, 𝑒, 𝑓, 𝑑, 𝑐, 𝑎. Similarly, neither {𝑎, 𝑓} nor
{𝑎, 𝑑} can be chosen, because they also create cycles of chosen edges (triangles, in fact).
The edges {𝑏, 𝑐}, {𝑎, 𝑏} and {𝑏, 𝑒} cannot be chosen because they are not incident with
an endpoint (𝑑 or 𝑓) of the chosen path.
Could the first player have done better? Does the first player have a winning strat-
egy when the game is played on this graph?

This game must end in a win for one or the other of the two players. For each graph,
we have one of two possibilities:

• The first player has a winning strategy.

• The second player has a winning strategy.

Try the game on a few small graphs, and classify them as to whether you think the
first or second player has a winning strategy. This time, you should find that there are
graphs of each type: some for which the first player has a winning strategy, and others
for which the second player has a winning strategy.
Then try and investigate how to determine, just from the graph itself and without
doing an exhaustive search of all possible plays of the game, who has a winning strategy
on a given graph.

One important difference between this game and the Shannon Switching Game is
that, in this game, the two players have the same role, since their effect on an edge,
when they choose it, is the same. (This contrasts with the Shannon Switching Game,
where one player thickens edges and the other player crosses them out.) Of course, the
details of the games are very different, too. But this point about the different roles of
the players is a very fundamental one in the theory of games. A game is said to be

impartial if, in any given configuration, the options available to each player would be
the same if it were their turn to move. A game is partisan if it is not impartial. So,
with this terminology, the Shannon Switching Game is a partisan game, while Slither is
an impartial game.

12.7 EXERCiSES

1. Draw all trees that have six vertices. Your set of drawings should be comprehensive,
so that every tree on six vertices can be seen to be identical to one of those in your set
(potentially after some relabelling and redrawing).

2. The diameter of a graph is the maximum, over all pairs of vertices 𝑣, 𝑤, of the
distance between 𝑣 and 𝑤, that is, of the length of a shortest path between 𝑣 and 𝑤.
(In a tree, this is just the length of the longest path, since the path between any pair
of vertices is unique.)
Prove that every tree of diameter ≥ 3 has a vertex 𝑣 such that

2 ≤ deg(𝑣) ≤ 𝑛 − 2,

where 𝑛 is the number of vertices of the graph.

3. If a tree has 𝑘 vertices of degree 3 and all other vertices are leaves, how many
leaves does it have?

4.
(a) Prove that every tree with at least three vertices has a vertex of degree at least 2.
(b) Is there a number 𝑁 such that every tree with at least 𝑁 vertices has a vertex of
degree at least 3?

5. Prove that a graph 𝐺 is a tree if and only if for every pair 𝑢, 𝑣 of vertices, there is
a unique path between them in 𝐺.

6. Give an upper bound on the average degree of a tree.

7. How many spanning trees does the following graph have?

[Figure: a graph on the six vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓.]

8. Apply Kruskal’s Greedy Algorithm to find a minimum weight spanning tree in
the following weighted graph.

[Figure: a weighted graph on the six vertices 𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, with edge costs 1 to 9.]

9. Prove that 𝐾2,𝑛 is planar for all 𝑛.

10. Check that Euler’s Theorem holds for the following graphs:
(a) the cycle 𝐶𝑛 ;

(b) the wheel graph 𝑊𝑛 , which is obtained from 𝐶𝑛 by adding one new vertex (the
“hub”) that is adjacent (via “spokes”) to every vertex in 𝐶𝑛 (the “rim”);

(c) 𝐾4 ;

(d) the cube 𝑄3 (Exercise 11.11);

(e) the octahedron;

(f) the dodecahedron;

(g) the icosahedron.

11.
(a) Give an upper bound on the average degree of a planar graph.

(b) Give an upper bound on the average degree of a planar graph with no triangles.

(c) Give an upper bound on the average degree of a bipartite planar graph.

12. Prove that every planar graph has a vertex of degree ≤ 5.

13.
Let 𝑑 ∈ ℕ. A graph is 𝑑-regular if every vertex has degree 𝑑.

(a) Find the 3-regular planar graph with the fewest vertices.

(b) Find the 3-regular bipartite planar graph with the fewest vertices.

(c) Find the 3-regular nonplanar graph with the fewest vertices.

(d) Describe an infinite family of connected 4-regular planar graphs.

(e) What is the minimum number of vertices that a 5-regular planar graph can have?
Can you find such a graph?

(f) Does a 6-regular planar graph exist? Justify your answer.
