
The λ-calculus

Peter Hoffman
This is quite a rich subject, but rather under-appreciated in
many mathematical circles, apparently. It has ‘ancient’
connections to the foundations of both mathematics and also of
computability, and more recent connections to functional
programming languages and denotational semantics in computer
science. This write-up is largely independent of the lengthier
Computability for the Mathematical—[CM], within which it will
also appear.
Do not misconstrue it as negative criticism, but I find some
of the literature on the λ-calculus to be simultaneously quite
stimulating and somewhat hard to read. Reasons for the latter
might be:
(1) neuron overload caused by encyclopaediae of closely
related, but subtly different, definitions (needed for deeper
knowledge of the subject perhaps, or maybe a result of the experts
having difficulty agreeing on which concepts are central and
which peripheral—or of me reading only out-of-date literature!);
(2) authors whose styles might be affected by a formalist
philosophy of mathematics (‘combinatory logicians’), while I
am often trying to ‘picture actual existing abstract objects’, in
my state of Platonist original sin; and
(3) writing for readers who are already thoroughly imbued
(and so, familiar with the jargon and the various ‘goes without
saying’s) either as professional universal algebraists/model theorists
or as graduate students/professionals in theoretical computer
science.
We’ll continue to write for an audience assumed to be fairly
talented upper-year undergraduates specializing in
mathematics. And so we expect a typical (perhaps unexamined)
Platonist attitude, and comfortable knowledge of functions,
equivalence relations, abstract symbols, etc., as well as, for
example, using the ‘=’ symbol between the names of objects only
when wishing to assert that they are (names for) the same
object.
The text [HS] is one of the definite exceptions to the ‘hard for
me to read’ comment above. It is very clear everywhere. But it
requires more familiarity with logic and model theory (at least
the notation) than we need below. It uses model theoretic
language in doing the material we cover, and tends to be more
encyclopaedic, though nothing like [Ba]. So the following 111
or so pages will hopefully serve a useful purpose, including
re-expressing things such as combinatorial completeness and the
‘equivalence’ between combinators and λ-calculus in an
alternative, more familiar language for some of us beginners.
Also it’s nice not to need to be continually reminding readers
that “=” is used for identicalness of objects, with the major
exception of the objects most discussed (the terms in the
λ-calculus), where a different symbol must be used, because “=”
has been appropriated to denote every equivalence relation in
sight! At risk of boring the experts who might stumble upon this
paper, it will include all but the most straightforward of
details, and plenty of the latter as well (and stick largely to
the extensional case).
Contents.
1. The Formal System Λ.
2. Examples and Calculations in Λ.
3. So—what’s going on?
4. Non-examples and Non-calculability in Λ—undecidability.
5. Solving equations, and proving RC ⇐⇒ λ-definable.
6. Combinatorial Completeness; the Invasion of the Combinators.
7. λ-Calculus Models and Denotational Semantics.
8. Scott’s Original Models.
9. Two (not entirely typical) Examples of Denotational Semantics.

VII-1 The Formal System Λ .


To get us off on the correct path, this section will introduce the
λ-calculus strictly syntactically as a formal system. Motivation will
be left for somewhat later than is customary.
Definition of Λ . This is defined inductively below to be a
set of non-empty finite strings of symbols, these strings called
terms. The allowable symbols are

λ , • , ) , ( , x0 , x1 , x2 , x3 , · · ·

All but the first four are called variables. The inductive definition
gives Λ as the smallest set of strings of symbols for which (i)
and (ii) below hold:
(i) Each xi ∈ Λ (atomic terms).

(ii) If A and B are in Λ, and x is a variable, then

(AB) ∈ Λ and (λx • A) ∈ Λ .
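To fix ideas, here is a minimal sketch of mine (not from the text) of this grammar as a Haskell datatype, with variables indexed by the natural numbers, mirroring x0, x1, x2, . . . :

    -- Terms of Λ: variables x_i, applications (AB), abstractions (λx • A).
    data Term
      = Var Int          -- atomic terms x_i
      | App Term Term    -- (AB)
      | Lam Int Term     -- (λx • A)
      deriving (Eq, Show) -- Eq here is identity of strings, the '=' of the text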

Definition of free and bound occurrences of variables.
Notice that the definition below is exactly parallel to the
corresponding one in 1st order logic, [LM], Section 5.1. That
is, ‘λx•’ here behaves the same as the quantifiers ‘∀x’ and ‘∃x’ in
logic.
(i) The occurrence of xi in the term xi is free.
(ii) The free occurrences of a variable x in (AB) are its free
occurrences in A plus its free occurrences in B.
(iii) There are no free occurrences of x in (λx • A) .
(iv) If y is a variable different than x, then the free
occurrences of y in (λx • A) are its free occurrences in A .
(v) A bound occurrence is a non-free occurrence.
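Clauses (i)–(v) translate directly into the Haskell sketch (again mine, assuming the Term type above):

    import Data.List (delete, union)

    -- Free variables of a term, clause by clause as in the text.
    freeVars :: Term -> [Int]
    freeVars (Var i)   = [i]                            -- (i)
    freeVars (App a b) = freeVars a `union` freeVars b  -- (ii)
    freeVars (Lam x a) = delete x (freeVars a)          -- (iii), (iv)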

Definition of A[x→C], the substitution of the term C for the variable x in the term A.
In effect, the string A[x→C] is obtained from the string A by
replacing each free occurrence of x in A by the string C. To be
able to work with this mathematically, it is best to have an
analytic, inductive definition, as below. This has the incidental
effect of making it manifest that, as long as A and C are terms,
such a replacement produces a string which is a term.
(i) If x = xi , then xi[x→C] := C .
(ii) If x /= xi , then xi[x→C] := xi .
(iii) (AB)[x→C] := (A[x→C]B[x→C]) .
(iv) (λx • A)[x→C] := (λx • A) .
(v) If y /= x, then (λy • A)[x→C] := (λy • A[x→C]) .
But we really only bother with substitution when it is ‘okay’, as follows:

Definition of “A[x→C] is okay”.
This is exactly parallel to the definition of substitutability in
1st order logic. You can give a perfectly tight inductive
definition of this, consulting, if necessary, the proof in VI-3 that
SUB is recursive. But somewhat more informally, it’s just that,
when treated as an occurrence in A[x→C], no free
occurrence of a variable in C becomes a bound occurrence in any of the
copies of C substituted for x.
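One such perfectly tight inductive definition, transcribed into the Haskell sketch (my formulation of the standard substitutability clauses, not a quotation of VI-3):

    -- okay x c a  means  "a[x→c] is okay": no free variable of c is captured.
    okay :: Int -> Term -> Term -> Bool
    okay _ _ (Var _)   = True
    okay x c (App a b) = okay x c a && okay x c b
    okay x c (Lam y a)
      | y == x                 = True   -- no free x below this binder
      | x `notElem` freeVars a = True   -- nothing gets substituted here
      | otherwise              = y `notElem` freeVars c && okay x c a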
(In CS texts, it is often convenient to force every
substitution to be okay. So they have a somewhat more
complicated definition than (i) to (v) above,
from which ‘okayness’ [Or should it be ‘okayity’?] follows automatically.)

Bracket Removal Abbreviations.


From now on, we usually do the following, to produce
strings which are not in Λ, but which are taken to be names for
strings which are in fact terms in the λ-calculus.
(i) Omit outside brackets on a non-atomic term.
(ii) A1A2 · · · An := (A1A2 · · · An−1)An (inductively), i.e.

ABC = (AB)C ; ABCD = ((AB)C)D ; etc.

(iii) λx • A1A2 · · · An := λx • (A1A2 · · · An) , so, for example,


λx • AB /= (λx • A)B .

(iv) For any variables yi,

λy1y2 · · · yn • A := λy1 • λy2 • · · · λyn • A := λy1 • (λy2 • (· · · (λyn • A) · · ·))

The second equality in (iv) is the only possibility, so there’s
nothing to memorize there. The first equality isn’t a bracket
removal. Except for it (and so in the basic λ-calculus without
abbreviations), it is evident that there is no need at all for the
symbol ‘•’. After all, except for some software engineers, who
happily invent jargon very useful to them (but which confuses
syntax with semantics), no logician sees the need to use ∀x • F in
place of ∀xF . But the abbreviation given by the first equality in
(iv) does force its use, since, for example,

λxy • x /= λx • yx .

If we had dispensed with ‘•’ right from the beginning, the left-hand
side in this display would be λxλyx, and would not be
confused with its right-hand side, namely λxyx .
There is another statement needed about abbreviations, which
‘goes without saying’ in most of the sources I’ve read. But I’ll
say it:
(v) If A is a term, and S, T are (possibly empty) strings of
symbols such that SAT is a term, and if R is an abbreviated
string for A as given by some of (ii) to (iv) just above, then SRT
:= SAT ; that is, the abbreviations in (ii) to (iv) apply to
subterms, not just to entire terms. But (i) doesn’t of course;
one must restore the outside brackets on a term when
it is used as a subterm of a larger term.
We should comment about the definition of the occurrences of
subterms. The implicit definition above is perfectly good to
begin with: a subterm occurrence is simply a connected
substring inside a term, as long as the substring is also a term
when considered on its own.
However, there is a much more useful inductive definition
just below. Some very dry stuff, as in Appendix B of [LM], shows
that the two definitions agree. The inductive definition is:
(0) Every term is a subterm occurrence in itself.
(i) The atomic term xi has no proper subterm occurrences.
(ii) The proper subterm occurrences in (AB) are all the
subterm occurrences in A together with all the subterm
occurrences in B .
(iii) The proper subterm occurrences in (λx • A) are all the
subterm occurrences in A .
Finally we come to an interesting definition! But why this is so
will remain a secret for a few more pages.
Definition of the equivalence relation ≈ .
This is defined to be the smallest equivalence relation on Λ for
which the following four conditions hold for all A, B, A', B', x and y:
(α) : λx • A ≈ λy • A[x→y] if A[x→y] is okay and y has no free occurrence in A.
(β) : (λx • A)B ≈ A[x→B] if A[x→B] is okay.
(η) : λx • (Ax) ≈ A if A has no free occurrence of x .
(Congruence Condition) :
A ≈ A' and B ≈ B' =⇒ AB ≈ A'B' and λx • A ≈ λx • A' .
Exercise. Show that
λx • A ≈ λx • A' =⇒ A ≈ A' .
Remarks. The names (α), (β) and (η) have no significance as
far as I know, but are traditional. The brackets on the left-hand
side of (η) are not necessary. Most authors in this subject seem
to use A ≡ B for our A = B, and then use A = B for our A ≈ B .
I strongly prefer to reserve ‘=’ to mean ‘is the same string as’,
or more generally, ‘is the same thing as’, for things from Plato’s
world. Think about the philosophy, not the notation, but do get
the notation clear.
And most importantly, this definition more specifically says
that two terms are related by ‘≈’ if and only if there is a finite
sequence of terms, starting with one and ending with the
other, such that each step in the sequence (that is, successive
terms) comes from (α), (β) or (η) being applied to a subterm of
some term, where those three ‘rules’ may be applied in either
direction.
The last remark contains a principle, which again seems to
go without saying in most expositions of the λ-calculus:
Replacement Theorem VII-1.1. If A, B are terms, and S, T
are (possibly empty) strings of symbols such that SAT is a
term, then
(i) SBT is also a term;
(ii) if A ≈ B then SAT ≈ SBT .
This is not completely trivial, but is very easy by induction on the term
SAT . See Theorem 2.1/2.1∗ in [LM] for an analogue.
Here are a few elementary results.
Proposition VII-1.2. For all terms E and Ei for 1 ≤ i ≤ n
such that no variable occurs in two of E, E1, · · · , En, we have

(λy1y2 · · · yn • E)E1E2 · · · En ≈ E[y1→E1][y2→E2] · · · [yn→En] .

Proof. Proceed by induction on n, using (β) liberally. (The
condition on non-common variable occurrences is stronger than
really needed.)
Proposition VII-1.3. If A and B have no free occurrences of x, and if
Ax ≈ Bx, then A ≈ B .
Proof. Using the ‘rule’ (η) for the first and last steps, and
the replacement theorem (or one of the congruence
conditions) for the middle step,

A ≈ λx • Ax ≈ λx • Bx ≈ B .

Exercise. Show that, dropping the assumption (η), but
assuming instead the statement of VII-1.3, the resulting
equivalence relation agrees with ≈ .

Now we want to consider the process of moving from a left-hand
side for one of (α), (β) or (η) to the corresponding right-hand
side, but applied to a subterm, often proper. (However,
for rule (α), the distinction between left and right is irrelevant.)
Such a step is called a reduction.
If one such step gets from term C to term D, say that C reduces to D.
Now, for terms A and B, say that A ≥ B if and only if there is a
finite sequence, starting with A and ending with B, such that
each term in the sequence (except the last) reduces to the next
term in the sequence.
Of course, A ≥ B implies that A ≈ B, but the converse is false.
To show that not all terms are equivalent in the ≈–sense, one
seems to need a rather non-trivial result, namely the following
theorem.
Theorem VII-1.4 (Church-Rosser). For all terms A, B and C, if A ≥ B
and A ≥ C, then there is a term D such that B ≥ D and C ≥ D.
We shall postpone the proof (maybe forever); readable ones are given in
[Kl], and in [HS], Appendix 1.
Say that a term B is normal or in normal form if and only if no
sequence of reductions starting from B has any step which is an
application of (β) or (η) (possibly to a proper subterm). An
equivalent statement is that B contains no subterm of the form
of the left-hand sides of (β) or (η), i.e.

(λx • A)D ,

or

λx • (Ax) if A has no free occurrence of x .

Note that we don’t require “A[x→D] is okay” in the first one.
Effectively, we’re saying that neither reduction (β) nor (η) can
ever be applied to the result of making changes of bound
variables in B.
The claims just above and below do constitute little
propositions, needed in a few places below, particularly
establishing the following important corollary of the
Church-Rosser theorem :
For all A, if A ≥ B and A ≥ C where B and C are both normal,
then B and C can be obtained from each other by applications
of rule (α), that is, by a change of bound variables.
So a given term has at most one normal form, up to renaming bound variables.
In particular, no two individual variables, regarded as terms,
are related by ≈, so there are many distinct equivalence classes.
As terms, variables are normal, and are not related by ‘≈’ to
any other normal term, since there aren’t any bound variables
to rename ! But also, of course, you can
find infinitely many closed terms (ones without any free variables)
which are normal, no two of which are related by ‘≈’.
Note also that if B is normal and B ≥ C, then C ≥ B .
But a term B with this property is not necessarily normal, as
the example just below shows.

VII-2 Examples and Calculations in Λ .


First, here is an example of a term which has no normal form:

(λx • xx)(λx • xx) .


Note that reduction (β) applies to it, but it just reproduces
itself. To prove rigorously that this has no normal form, one
first argues that, in a sequence of single applications of the
three types of reduction, starting with the term above, every
term has the form

(λy • yy)(λz • zz) ,


for some variables y and z. Then use the fact (not proved here)
that, if A has a normal form, then there is a finite sequence of
reductions, starting from A and ending with the normal form.
Though trying to avoid for the moment any motivational
remarks which might take away from the mechanical
formalistic attitude towards the elements of Λ (not Λ itself)
which I am temporarily attempting to cultivate, it is impossible
to resist remarking that the existence of terms with no normal
form will later be seen to be an analogue of the existence of pairs,
(algorithm, input), which do an infinite loop!
Another instructive example is, where x and y are different
variables,

(λx • y)((λx • xx)(λx • xx)) .

For this one, there is an infinite sequence of reductions, leading nowhere,
so to speak. Just keep working inside the brackets ending at
the far right over-and-over, as with the previous example. On
the other hand, applying rule (β) once to the leftmost λ, we just
get y, the normal form. So not every sequence of reductions
leads to the normal form, despite one existing. This example
also illustrates the fact, refining the one mentioned above, that
by always reducing with respect to the leftmost λ for which a (β) or
(η)-reduction exists (possibly after an (α)-‘reduction’ is applied to
change bound variables, and doing the (β)-reduction when there is
a choice between (β) and (η) for the leftmost λ), we get an
algorithm which produces the normal form, if one exists for the
start-term. It turns out that having a normal form is undecidable,
though it is semi-decidable, as the ‘leftmost algorithm’ just
above shows.
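As a toy illustration of that leftmost strategy, here is a sketch of mine in Haskell, assuming the Term, freeVars, subst and okay functions from Subsection VII-1; it handles only (β)-redexes whose substitution is already okay, omitting (α) and (η):

    -- One leftmost (normal-order) β-step, if any; Nothing means no β-redex.
    step :: Term -> Maybe Term
    step (App (Lam x a) b)
      | okay x b a = Just (subst x b a)   -- contract the leftmost β-redex
    step (App a b) = case step a of
      Just a' -> Just (App a' b)
      Nothing -> App a <$> step b
    step (Lam x a) = Lam x <$> step a
    step (Var _)   = Nothing

    -- Iterating finds the normal form when one exists; on terms such as
    -- (λx • xx)(λx • xx) it runs forever, matching semi-decidability.
    normalize :: Term -> Term
    normalize t = maybe t normalize (step t)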
Now comes a long list of specific elements of Λ . Actually they
are mostly only well defined up to reductions using only (α),
that is, up to change of bound variables. It is important only that
they be well defined as equivalence classes under ≈ . We give
them as closed terms, that is, terms with no free variables, but
leave somewhat vague which particular bound variables are to be
used. Also we write them as normal forms, all except Y . For
example,
Definitions of T and F .

T := λxy • x := (λx • (λy • x)) ; F := λxy • y := (λx • (λy • y)) ,

where x and y are a pair of distinct variables. The second
equality each time is just to remind you of the abbreviations.
Since terms obtained by altering x and y are evidently
equivalent to those above, using (α), the classes of T and F under
≈ are independent of choice of x and y. Since all the propositions
below concerning T and F are ‘equations’ using ‘≈’, not ‘=’, the
choices above for x and y are irrelevant.
We shall continue for this one section to work in an
unmotivated way, just grinding away in the formal system, by
which I mean roughly the processing of strings by making
reductions, as defined a few paragraphs above. But the
notation for some of the elements defined below is quite
suggestive as to what is going on. As Penrose [Pe], p. XXXX
has said, some of this, as we get towards the end of this section,
is “magical”. . . and . . . “astonishing”! See also the exercise in
the next subsection.

VII-2.1 For all terms B and C, we have


(a) T BC ≈ B ; and (b) F BC ≈ C .

Proofs. For the latter, note that

F B = (λx • (λy • y))B ≈ (λy • y)[x→B] = λy • y .

Thus

F BC = (F B)C ≈ (λy • y)C ≈ y[y→C] = C .

For VII-2.1(a), choose some variable z not occurring in B,
and also different from x. Then

T B = (λx • (λy • x))B ≈ (λx • (λz • x))B ≈ (λz • x)[x→B] = λz • B .

Thus

T BC = (T B)C ≈ (λz • B)C ≈ B[z→C] = B .
Definition of ¬ . Now define
¬ := λz • z F T = λz • ((z F ) T ) /= (λz • z) F T ,
for some variable z. Once again, using rule (α), this is independent, up to
≈, of the choice of z.
VII-2.2 We have ¬ T ≈ F and ¬ F ≈ T .
Proof. Using VII-2.1(a) for the last step,

¬ T = (λz • z F T )T ≈ (z F T )[z→T ] = T F T ≈ F .

Using VII-2.1(b) for the last step,

¬ F = (λz • z F T )F ≈ (z F T )[z→F ] = F F T ≈ T .
Definitions of ∧ and ∨ . Let

∧ := λxy • (x y F ) and ∨ := λxy • (x T y) ,

for a pair of distinct variables x and y.

VII-2.3 We have

∧ T T ≈ T ; ∧ T F ≈ ∧ F T ≈ ∧ F F ≈ F ,

and

∨ T T ≈ ∨ T F ≈ ∨ F T ≈ T ; ∨ F F ≈ F .

Proof. This is a good exercise for the reader to start becoming a λ-phile.
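These Boolean terms transcribe directly into Haskell. A sketch of mine (type signatures are left to inference, since the untyped terms get used more polymorphically than simple Haskell types suggest):

    true  x _ = x             -- T := λxy • x
    false _ y = y             -- F := λxy • y
    notC z    = z false true  -- ¬ := λz • z F T
    andC x y  = x y false     -- ∧ := λxy • x y F
    orC  x y  = x true y      -- ∨ := λxy • x T y

For instance, andC true false "yes" "no" evaluates to "no", matching ∧ T F ≈ F in VII-2.3.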

Definitions of 1st, rst and [A, B] .

1st := λx • x T ; rst := λx • x F ; [A, B] := λx • xAB ,

where x is any variable for the first two definitions, but, for the
latter definition, must not occur in A or B, which are any terms.
Note that [A, B] is very different than (AB) .
VII-2.4 We have

1st[A, B] ≈ A and rst[A, B] ≈ B .

Proof. Choosing y not occurring in A or B, and different
than x,

1st[A, B] = (λx • x T )(λy • yAB) ≈ (x T )[x→(λy•yAB)] = (λy • yAB)T ≈ T AB ≈ A ,

using VII-2.1(a). The other one is exactly parallel.
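In the same Haskell sketch (mine, reusing true and false from above):

    pairC a b z = z a b   -- [A, B] := λx • xAB
    fstC p = p true       -- 1st := λx • x T
    sndC p = p false      -- rst := λx • x F

so that, as in VII-2.4, fstC (pairC 3 4) evaluates to 3 and sndC (pairC 3 4) to 4.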

Definitions of ith and [A1, A2, · · · , An] .
Inductively, for n ≥ 3, define

[A1, A2, · · · , An] := [A1, [A2, · · · , An]] ,

so [A, B, C] = [A, [B, C]] and [A, B, C, D] = [A, [B, [C, D]]], etc.
Thus, quoting VII-2.4,

1st[A1, A2, · · · , An] ≈ A1 and rst[A1, A2, · · · , An] ≈ [A2, A3, · · · , An] .

Here [A] means A, when n = 2. We wish to have, for 1 ≤ i < n ,

(i + 1)th[A1, A2, · · · , An] ≈ Ai+1 ≈ ith[A2, · · · , An] ≈ ith(rst[A1, A2, · · · , An]) .

So it would be desirable to define ith inductively so that

(i + 1)th E ≈ ith (rst E)

for all terms E. But we can’t just drop those brackets and
‘cancel’ the E for this !! However, define

(i + 1)th := B ith rst ,

where

S := λxyz • (xz)(yz) and B := S (T S) T .

(Of course 1th is the same as 1st, and maybe we should have
alternative names 2nd and 3rd for 2th and 3th !) This is all we
need, by the second ‘equation’ below:

VII-2.5 We have, for all terms A, B and C,

SABC ≈ (AC)(BC) and BABC ≈ A(BC) .

Proof. The first equation is a mechanical exercise, using (β)
three times, after writing S using bound variables that don’t
occur in A, B or C. Then the second one is a good practice in
being careful with brackets, as follows:

BABC = S (T S) T ABC = (S (T S) T A)BC ≈ (T S)A(T A)BC

= (T S A)(T A)BC ≈ S(T A)BC ≈ (T A)C(BC) = (T AC)(BC) ≈ A(BC) .

We have twice used both the first part and VII-2.1(a).

Don’t try to guess what’s behind the notation S and B—they
are just underlined versions of the traditional notations for
these ‘operators’. So I used them, despite continuing to use B
(not underlined) for one of the ‘general’ terms in stating results.
For those who have already read something about combinatory
algebra, later we shall also denote T alternatively as K, so that S
and K are the usual notation (underlined) for the usual two
generators for the combinators. The proof above is a foretaste of
combinatory logic from three subsections ahead. But we could
have just set

B := λxyz • x(yz) ,

for our purposes here, and proved that BABC ≈ A(BC) directly.
Definitions of I and An . Define, for any term A,

A0 := I := S T T ,

and inductively An+1 := B A An .
VII-2.6 For all terms A and B, and all natural numbers i and j, we have

A0B = I B ≈ B ; I ≈ λx • x ; and Ai(AjB) ≈ Ai+jB .

Also A1 ≈ A.
Proof. This is a good exercise, using induction on i for the 3rd
displayed identity.

Exercise. (i) Sometimes A2 /≈ AA . For example, try A = T or F .
(ii) Do the following sometimes give four distinct equivalence
classes of terms :

A3 ; AA2 ; AAA = (AA)A ; A(AA) ≈ A2A ?
Definitions of s, isz and n . Define, for natural numbers n,

n := λuv • unv ; s := λxyz • (xy)(yz) ; isz := λx • x(λy • F )T .

Here, u and v are variables, different from each other, as are x, y and z.
VII-2.7 We have, for all natural numbers n,

s n ≈ n + 1 ; isz 0 ≈ T ; isz n ≈ F if n > 0 .


Proof. For any n, using four distinct variables,

isz n = (λx • x(λy • F )T )(λuv • unv) ≈ (λuv • unv)(λy • F )T

≈ (λv • (λy • F )nv)T ≈ (λy • F )nT . (∗)

So when n = 0, we get

isz 0 ≈ (λy • F )0T ≈ T ,

since A0B ≈ B quite generally.
Note that (λy • F )B ≈ F for any B, merely because F is a closed term.
So when n > 0, by (∗) and VII-2.6 with i = 1 and j = n − 1,

isz n ≈ (λy • F )((λy • F )n−1T ) ≈ F .

The latter ‘≈’ is from the remark just before the display.
Finally, for any closed term B and many others, the
definition of s and rule (β) immediately give

s B ≈ λyz • By(yz) .

So

s n ≈ λyz • (λuv • unv)y(yz) ≈ λyz • (λv • ynv)(yz)

≈ λyz • yn(yz) ≈ λyz • yn+1z ≈ n + 1 .

The penultimate step uses VII-2.6 with i = n and j = 1, and
depends on knowing that A1 ≈ A.
Exercise.
(a) Define + := λuvxy • (ux)(vxy) . Show that + k l ≈ k + l .
(b) Define × := λuvy • u(vy) . Show that × k l ≈ kl .
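Transcribed into the running Haskell sketch (mine; churchN merely builds the numeral from an Int, and toInt reads one back for display):

    churchN n u v = iterate u v !! n   -- n := λuv • u^n v
    sucC x y z = x y (y z)             -- s := λxyz • (xy)(yz)
    iszC x = x (\_ -> false) true      -- isz := λx • x(λy • F)T
    plusC u v x y = u x (v x y)        -- + := λuvxy • (ux)(vxy)
    timesC u v y = u (v y)             -- × := λuvy • u(vy)

    toInt n = n (+ 1) (0 :: Int)

For example, toInt (plusC (churchN 2) (churchN 3)) gives 5, and iszC (churchN 0) "zero" "nonzero" gives "zero".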
Definitions of pp and p . For distinct variables x, y, z, u and w, define

pp := λuw • [ F , 1stw(rstw)(u(rstw)) ] ,

and

p := λxyz • rst(x(pp y)[ T , z ]) .
VII-2.8 (Kleene) For all natural numbers n > 0, we have p n ≈ n − 1 .
Proof. First we show

pp x [ T , y ] ≈ [ F , y ] and pp x [ F , y ] ≈ [ F , xy ] .

Calculate :

pp x [ T , y ] ≈ (λuw • [ F , 1stw(rstw)(u(rstw)) ]) x [ T , y ]
≈ (λw • [ F , 1stw(rstw)(x(rstw)) ]) [ T , y ]
≈ [ F , 1st[ T , y ](rst[ T , y ])(x(rst[ T , y ])) ]
≈ [ F , T y(xy) ] ≈ [ F , y ] .

The last step uses VII-2.1(a).
Skipping the almost identical middle steps in the other one, we get

pp x [ F , y ] ≈ · · · ≈ [ F , F y(xy) ] ≈ [ F , xy ] .
Next we deduce

(pp x)n [ F , y ] ≈ [ F , xny ] and (pp x)n [ T , y ] ≈ [ F , xn−1y ] ,

the latter only for n > 0. The left-hand identity is proved by
induction on n, the right-hand one deduced from it, using
VII-2.6 with i = n − 1 and j = 1 twice and once, respectively.
For the left-hand identity, when n = 0, this is trivial, in view
of the fact that A0B ≈ B. When n = 1, this is just the right-hand
identity of the two proved above, in view of the fact that A1 ≈ A .
Inductively, with n ≥ 2, we get

(pp x)n [ F , y ] ≈ (pp x)n−1(pp x [ F , y ]) ≈ (pp x)n−1[ F , xy ]

≈ [ F , xn−1(xy) ] ≈ [ F , xny ] .

(Since xy isn’t a variable, the penultimate step looks fishy, but
actually, all these identities hold with x and y as names for
arbitrary terms, not just variables.)
For the right-hand identity,

(pp x)n [ T , y ] ≈ (pp x)n−1(pp x [ T , y ]) ≈ (pp x)n−1[ F , y ] ≈ [ F , xn−1y ] .

Now to prove Kleene’s striking (see the next subsection)
discovery, using the (β)-rule with the definition of p, we see that,
for all terms A,

p A ≈ λyz • rst(A(pp y)[ T , z ]) .

Thus

p n ≈ λyz • rst((λuw • unw)(pp y)[ T , z ]) ≈ λyz • rst((λw • (pp y)nw)[ T , z ])

≈ λyz • rst((pp y)n[ T , z ]) ≈ λyz • rst[ F , yn−1z ]

≈ λyz • yn−1z ≈ n − 1 .

Exercise. Show that p 0 ≈ 0 .
Exercise. Do we have p s ≈ I or s p ≈ I ?
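Kleene’s pair-walking idea runs happily in Haskell if native pairs stand in for [ flag , value ] (the literal Church-pair transcription is not typeable in plain Hindley-Milner Haskell); a sketch of mine:

    -- Start at (True, 0); each step clears the flag, and increments the value
    -- only once the flag is clear; so n steps leave n-1, and 0 stays 0.
    predInt :: Int -> Int
    predInt n = snd (churchN n step (True, 0 :: Int))
      where step (flag, v) = (False, if flag then v else v + 1)

For example, predInt 4 gives 3, mirroring p 4 ≈ 3.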
Definition of Y .

Y := λx • (λy • x(yy))(λy • x(yy)) .

Of course, x and y are different from each other.

It is a very powerful concept, one that in a sense has inspired many
mathematical developments, such as recursion theory, indeed also some
literary productions, such as Hofstadter’s popular book.
Erwin Engeler [En]

VII-2.9 For all terms A, there is a term B such that AB ≈ B . In
fact, the term B = Y A will do nicely for that.
Proof. Using the (β)-rule twice, we see that

Y A = ( λx • (λy • x(yy))(λy • x(yy)) )A ≈ (λy • A(yy))(λy • A(yy))

≈ A((λy • A(yy))(λy • A(yy))) ≈ A (Y A) ,

as required. The last step used the ‘≈’ relation between the
first and third terms in the display.
It is interesting that Curry’s Y apparently fails to give either
A(Y A) ≥ Y A, or Y A ≥ A(Y A). It can be useful to replace it by
Turing’s
Z := (λxy • y(xxy))(λxy • y(xxy)) ,
another so-called fixed point operator for which at least one does have
ZA ≥ A(ZA) for all A in Λ.
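In Haskell the self-application inside Y is not typeable, but laziness gives the same fixed-point behaviour directly; a sketch (a function by this name also ships in Data.Function):

    fix :: (a -> a) -> a
    fix f = f (fix f)   -- so fix f = f (fix f) = f (f (fix f)) = ...

which is exactly the property A (Y A) ≈ Y A of VII-2.9; e.g. take 5 (fix (1:)) gives [1,1,1,1,1].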
Recall that a closed term is one with no free variable occurrences.
VII-2.10 Let x and y be distinct variables, and let A be a
term with no free variables other than possibly x and y. Let
F := Y (λyx • A) . Then, for any closed term B, we have

F B ≈ A[y→F ][x→B] .

The ‘okayness’ of the substitutions below follows easily from the
restrictions on A and B, which are stronger than strictly needed,
but are satisfied in all the applications.
Proof. Using the basic property of Y from VII-2.9 for the first step,

F B ≈ (λyx • A)F B ≈ (λx • A)[y→F ]B ≈ (λx • A[y→F ])B ≈ A[y→F ][x→B] .

VII-2.11 For all terms G and H, there is a term F such that

F 0 ≈ G and F n + 1 ≈ H[ F n , n ] for all natural numbers n .

(My goodness, this looks a lot like primitive recursion!)
Proof. Let

F := Y (λyx • A) ,

as in the previous result, where x and y are distinct variables
which do not occur in G or H, and where

A := (isz x) G (H[ y(px) , px ]) .

Then, for any term B, using VII-2.10 for the first step,

F B ≈ A[y→F ][x→B] ≈ (isz B) G (H[ F (pB) , pB ] ) .

First take B = 0. But isz 0 ≈ T by VII-2.7, and T GJ ≈ G by
VII-2.1(a), so this time we get F 0 ≈ G , as required.
Then take B = n + 1. But isz n + 1 ≈ F by VII-2.7, and
F GJ ≈ J by VII-2.1(b), so here we get

F n + 1 ≈ H[ F (p n + 1) , p n + 1 ] ≈ H[ F n , n ] ,

as required, using VII-2.8 for the second step.
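VII-2.11’s recipe is easy to mimic with fix from the sketch above; here, with G = 1 and H[ r , m ] = r · (m + 1), the resulting F is the factorial (my illustration, not the text’s):

    fact :: Int -> Int
    fact = fix (\f x -> if x == 0 then 1 else f (x - 1) * x)
    -- fact 5 == 120; the triple product (isz x) G (...) is the if-then-else.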

VII-3 So—what’s going on?


Actually, I’m not completely certain myself. Let’s review
the various examples just presented.
At first we saw some terms which seemed to be modelling
objects in propositional logic. Here are two slightly curious
aspects of this. Firstly, T and F are presumably truth values in a
sense, so come from the semantic side, whereas ¬, ∧ and ∨ are
more like syntactic objects. Having them all on the same footing
does alter one’s perceptions slightly, at least for non-CSers.
Secondly, we’re not surprised to see the latter versions of the
connectives acting like functions which take truth values as
input, and produce truth values as output. But everything’s on
a symmetric footing, so writing down a term like F ∧ now seems
like having truth values as functions which can take
connectives as input, not a standard thing to consider. And F F
seems even odder, if interpreted as the truth value ‘substituted
into itself’!
But later we had terms n which seemed to represent
numbers, not functions. However, two quick applications of
the β-rule yield

n A B ≈ AnB ≈ A(A(· · · (AB) · · ·)) .

So if A were thought of as representing a function, as explained
a bit below, the term n may be thought of as representing that
function which maps A to its n-fold iterate An.
Now the n , along with s and p as successor and
predecessor functions, or at least terms representing them,
give us the beginnings of a numeral system sitting inside Λ. The
other part of that numeral system is isz , the “Is it 0?”–
predicate. Of course predicates may be regarded as functions.
And the final result goes a long way towards showing that
terms exist to represent all primitive recursive functions. To
complete the job, we just need to amplify VII-2.11 to functions
of more than one variable, and make some observations about
composition. More generally, in the subsection after next, we
show that all (possibly partial) recursive functions are definable in
the λ-calculus. It is hardly surprising that no others are,
thinking about Church’s thesis.
Exercise. Show that k l ≈ lk . Deduce that 1 is alternatively
interpretable as a combinator which defines the exponential
function. (See the exercise before VII-2.8 for the addition and
multiplication functions.)
Show that 9 9 9 9 ≈ n , where n > 2m, where m is greater than
the mass of the Milky Way galaxy, measured in milligrams !
It is likely clear to the reader that [A , B ] was supposed to
be a term which represents an ordered pair. And we then
produced ordered n-tuple terms in Λ, as well as terms representing
the projection functions.
So we can maybe settle on regarding each term in Λ as
representing a function in some sense, and the construction
(AB) as having B fed in as input to the function represented by
A, producing an ‘answer’ which again represents a function. But
there is a sense of unease (for some of us) in seeing what appears
to be a completely self-contained system of functions, every
one of which has the exact same set of all these functions
apparently as both its domain and its codomain. (A caveat here
—as mentioned below, the existence of terms without normal
forms perhaps is interpretable as some of these functions being
partial.) That begins to be worrisome, since except for the
1-element set, the set of all functions from X to itself has strictly
larger cardinality than X. But nobody said it had to be all
functions. However it still seems a bit offputting, if not
inconsistent, to have a bunch of functions, every one of which
is a member of its own domain! (However, the formula for Y
reflects exactly that, containing ‘a function which is substituted
into itself’—but Y was very useful in the last result, and that’s only
the beginning of its usefulness, as explained in the subsection
after next. The example of a term with no normal form at the
beginning of Subsection VII-2 was precisely ‘a function which is
substituted into itself’.) The “offputting” aspect above arises
perhaps from the fact that, to re-interpret this stuff in a consistent
way within axiomatic 1st order set theory, some modification
(such as abandonment) of the axiom of foundation seems to be
necessary. See however Scott’s constructions in Subsection VII-8
ahead.
In fact, this subject was carried on from about 1930 to 1970 in a very
syntactic way, with little concern about whether there might exist
mathematical models for the way Λ behaved, other than Λ
itself, or better, the set of equivalence classes Λ/≈ . But one at
least has the Church-Rosser theorem, showing that, after
factoring out by the equivalence relation ≈, the whole thing
doesn’t reduce to the triviality of a 1-element set. Then around
1970 Dana Scott produced some interesting such models, and not
just because of pure mathematical interest. His results, and later
similar ones by others, are now regarded as being very
fundamental in parts of computer science. See Subsections VII-7,
VII-8 and VII-9 ahead.
Several things remain to be considered. It’s really not the
terms themselves, but rather their corresponding equivalence
classes under ≈ which might be thought of as functions. The terms
themselves are more like recipes for calculating functions. Of
what does the calculation consist? Presumably
its steps are the reductions, using the three rules, but
particularly (β) and (η). When is a calculation finished? What is
the output? Presumably the answer is that it terminates when
the normal form of the start term is reached, and that’s the
output. Note that the numerals n themselves are in normal
form (with one exception). But what if the start term has no
normal form? Aha, that’s your infinite loop, which of course must
rear its ugly head, if this scheme is supposed to produce some
kind of general theory of algorithms. So perhaps one should
consider the start term as a combination of both input data and
procedure. It is a fact that a normal form will eventually be
produced, if the start term actually has a normal form, by always
concentrating on the leftmost occurrence of λ for which an
application of (β) or (η) can do a reduction (possibly preceded
by an application of (α) to change bound variables and make the
substitution okay). This algorithm no doubt qualifies intuitively as
one, involving discrete mechanical steps and a clear ‘signal’, if
and when it’s time to stop computing. The existence of more
than one universal Turing machine, more than one ATEN-command
for computing the universal function, etc. . . .
presumably correspond here to the possibility of other definite
reduction procedures, besides the one above, each of which
will produce the normal form of the start term (input), if the
start term has one. As mentioned earlier, such reduction
schemes are algorithms showing the semidecidability of the
existence of a normal form. Later we’ll see that this question is
undecidable.
We have probably left far too late a discussion of what the
rules (β) and (η) are doing. And indeed, what is λ itself? It is
often called the abstraction-operator. The symbol string ‘λx•’ is to
alert the computer (human or otherwise) that a function of
the variable x is about to be defined. And of course it’s the free
occurrences of x in A which give the ‘formula’ for the function
which λx • A is supposed to be defining.
So now the explanations of (β) and (η) are fairly clear :
The rule (β) just says that to evaluate the function above
on B, i.e. (λx • A)B , you substitute B for the free occurrences of x
in A (of course!). And rule (η) just says that, if x doesn’t occur
freely in A, then the function defined by λx • (Ax) is just A itself
(of course—‘the function obtained by evaluating A on its
argument’!).
[As an aside, it seems to me to be not unreasonable to ask
why one shouldn’t change the symbol ‘λ’ to a symbol ‘↦’. After
all, that’s what we’re talking about, and it has already been
noted that the second basic symbol ‘•’ is quite unneeded, except
for one of the abbreviations. So the string (λx • A) would be
replaced by (x ↦ A). Some things would initially be more
readable for ordinary λ-inexperienced mathematicians. I didn’t
want to do that in the last section, because that would have
given the game away too early. And I won’t do it now, out of
respect for tradition. Also we can think of ‘λ’ as taking place in
a formal language, whereas ‘↦’ is a concept from the
metalanguage, so maybe that distinction is a good one to
maintain in the notation. Actually, λ-philes will often use a sort
of ‘Bourbaki-λ’ in their metalanguages !]
This has become rather long-winded, but it seems a good place
to preview Subsection VII-6, and then talk about the history of
the subject, and its successful (as well as aborted) applications.
There is a small but infinite subset of Λ called the set of
combinators (up to “≈”, just the set of closed terms), some of
whose significance was discovered by Schonfinkel around
1920, without explicitly dealing with the λ-calculus. We shall
treat it in some detail in Subsection VII-6, and hopefully explain
more clearly than above about just which functions in some
abstract sense are being dealt with by the subject, and what it
was that Schonfinkel discovered.
Independently reinventing Schonfinkel’s work a few years
later, Curry attempted to base an entire foundations of
mathematics on the combinators, as did Church soon after, using
the λ-calculus, which he invented. But the system(s) proposed
(motivated by studying very closely the processes of
substitution occurring in Russell & Whitehead’s Principia
Mathematica) were soon discovered to be inconsistent, by
Kleene and Rosser, students of Church. There will be nothing
on this (largely abortive) application to foundations here
(beyond the present paragraph). It is the case that, for at least
45 years after the mid-1930’s when the above inconsistency
was discovered (and possibly to the present day), there
continued to be further attempts in the same direction by a
small group, a subject known as illative combinatory logic. [To
get some feeling for this, take a look at [Fi], but start reading
at Ch.1. Skip the Introduction, at least to begin, and also the
Preface, or at least don’t take too seriously the claims there,
before having read the entire book and about 77 others,
including papers of Peter Aczel, Solomon Feferman and Dana
Scott.] It’s not totally unreasonable to imagine
a foundational system in which “Everything is a function!” might
be attractive. After all, our 1st order set theory version of
foundations is a system in which “Everything is a set!” (and
Gödel seems to tell us that, if it is consistent, we can never be
very sure of that fact). On the other hand, it is also hardly
surprising that there might be something akin to Russell’s
paradox, which brought down Frege’s earlier higher order
foundations, but in an attempted system with the combinators.
The famous fixed point combinator Y (see VII-2.9), or an analogue,
played a major role in the Kleene-Rosser construction showing
inconsistency.

In current mathematics . . . the notion of set is more fundamental than


that of function, and the domain of a function is given before the function
itself. In combinatory logic, on the other hand, the notion of function is
fundamental; a set is a function whose application to an argument may
sometimes be an assertion, or have some other property; its members are
those arguments for which the application has that property. The
function is primary; its domain, which is a set, is another function. Thus it
is simpler to define a set in terms of a function than vice versa; but the idea
is repugnant to many mathematicians, . . .
H.B. Curry [BKK-ed]

What I always found disturbing about combinatory logic was what seemed
to me to be a complete lack of conceptual continuity. There were no functions
known to anyone else that had the extensive properties of the combinators
and allowed self-application. I agree that people might wish to have such
functions, but very early on the contradiction found by Kleene and Rosser
showed there was trouble. What I cannot understand is why there was not
more discussion of the question of how the notion of function that was
behind the theory was to be made even mildly harmonious with the “classical”
notion of function. The literature on combinatorial logic seems to me to
be somehow silent on this point. Perhaps the reason was that the hope of
“solving” the paradoxes remained alive for a long time—and may still be
alive.
D. Scott [BKK-ed]

And that is a good lead-in for a brief discussion of the
history of the λ-calculus as a foundation for computability,
indeed the very first foundation, beating out Turing machines
by a hair. Around 1934, Kleene made great progress in showing
many known computable functions to be definable
in the λ-calculus. A real breakthrough occurred when he saw
how to do it for the predecessor function, our VII-2.8. It was
based on Kleene’s results that his supervisor, Church, first
made the proposal that the intuitively computable functions be
identified with the mathematically defined set of λ-definable
functions. This is of course Church’s Thesis, later also called
the Church-Turing Thesis, since Turing independently proposed
the Turing computable functions as the appropriate set within a
few months, and gave a strong argument for this being a
sensible proposal. Then he proved the two sets to be the same,
as soon as he saw Church’s paper. The latter paper proved
the famous Church’s Theorem providing a negative solution to
Hilbert’s Entscheidungsproblem, as had Turing also done
independently. See the subsection after next for the definition
of λ-definable and proof that all recursive functions are
λ-definable.
Finally, I understand that McCarthy, in the late 1950’s,
when putting forth several fundamental ideas and proposing what
have come to be known as functional programming languages (a bit of
which is the McSELF procedures from earlier), was directly
inspired by the λ-calculus [Br-ed]. This led to his invention of
LISP, the first such language (though it’s not purely functional,
containing, as it does, imperative features). It is said that all
such languages include, in some sense, the λ-calculus, or even
that they are all equivalent to the λ-calculus. As a miniature
example, look at McSELF in [CM]. A main feature
differentiating functional from imperative programming
languages (of which ATEN is a miniature example) is that each
program, when implemented, produces steps which (instead of
altering a ‘store’, or a sequence of ‘bins’ as we called it) are
stages in the calculation of a function, rather like the reduction
steps in reducing a λ-term to normal form, or at least
attempting to do so. Clearly we should expect there to be a
theorem saying that there can be no algorithm for deciding
whether a λ-term has a normal form. See the next subsection
for a version of this theorem entirely within the λ-calculus.

Another feature differentiating the two types of languages
seems to be a far greater use of the so-called recursive
(self-referential) programs in the functional languages. In the case
of our earlier miniature languages, we see that leaving that
feature out of McSELF would destroy it (certainly many
computable functions could not be programmed), whereas one
of the main points of Subsection IV-9 was to see that any
recursive command could be replaced by an ordinary
ATEN-command.

We shall spend much time in the subsection after next with
explaining in detail how to use the Y -operator to easily produce
terms which satisfy equations. This is a basic self-referential
aspect of computing with the λ-calculus. In particular, it
explains the formula which was just ‘pulled out of a hat’ in the
proof of VII-2.11.
We shall begin the subsection after next with another, much
simpler, technical idea, which implicitly pervades the last
subsection. This is how the triple product ABC can be regarded
as containing the analogue of the if-then-else-construction which
is so fundamental in McSELF.
First we present some crisp and edifying versions within the
λ-calculus of familiar material from earlier in this work.
VII-4 Non-examples and Non-calculability in Λ—undecidability.
Thinking of terms in Λ as being, in some sense, algorithms,
here are a few analogues of previous results related to the
non-existence of algorithms/commands/Turing machines/recursive
functions. All this clearly depends on the Church-Rosser
theorem, since the results below would be manifestly false if,
for example, all elements of Λ had turned out to be related
under “≈” . Nothing later depends on this subsection.
VII-4.1 (Undecidability of ≈ .) There is no term E ∈ Λ
such that, for all A and B in Λ,

E AB ≈ T if A ≈ B ; E AB ≈ F if A /≈ B .

First Proof. Suppose, for a contradiction, that E did exist, and define
u := λy • E(yy)0 1 0 .
By the (β)-rule, we get uu ≈ E(uu)0 1 0 . Now either uu ≈ 0 or uu /≈ 0 .
But in the latter case, we get uu ≈ F 1 0 ≈ 0 , a contradiction.
And in the former case, we get uu ≈ T 1 0 ≈ 1,
contradicting uniqueness of normal form, which tells us that 1
/≈ 0.
VII-4.2 (Analogue of Rice’s Theorem.) If R ∈ Λ is such that

∀A ∈ Λ either R A ≈ T or R A ≈ F ,

then either ∀A ∈ Λ , R A ≈ T or ∀A ∈ Λ , R A ≈ F .

Proof. For a contradiction, suppose we can choose terms B and
C such that RB ≈ T and RC ≈ F . Then define

M := λy • R y C B and N := Y M .

By VII-2.9, we get N ≈ MN . Using the (β)-rule for the second ≈ ,

RN ≈ R(MN ) ≈ R(RNCB) .

Since RN ≈ either T or F , we get either

T ≈ R (T CB) ≈ R C ≈ F ,

or

F ≈ R (F CB) ≈ R B ≈ T ,

both of which are rather resounding contradictions to uniqueness
of normal form.
Second Proof of VII-4.1. Again assuming E exists, fix any B
in Λ and define R := EB. This immediately contradicts VII-4.2,
by choosing any C with C /≈ B , and calculating RB and RC .
Second Corollary to VII-4.2. (Undecidability of the existence
of normal form.) There is no N ∈ Λ such that

N A ≈ T if A has a normal form ; N A ≈ F otherwise .

This is immediate from knowing that some, but not all, terms
do have a normal form.

Third Proof of VII-4.1. As an exercise, show that there is
a ‘2-variable’ analogue of Rice’s Theorem : that is, prove the
non-existence of a nontrivial R2 such that R2AB always has
normal form either T or F . Then VII-4.1 is immediate.

VII-5 Solving equations, and proving RC ⇐⇒ λ-definable.
Looking back at the first two results in Subsection VII-2, namely
T AB ≈ A and F A B ≈ B ,
we are immediately reminded of if-then-else, roughly : ‘a true
condition says to go to A, but if false, go to B.’
Now turn to the displayed formula defining A in the proof of VII-2.11,
i.e.
A := (isz x) G (H[ y(px) , px ]) .
This is a triple product, which is more-or-less saying
‘if x is zero, use G, but if it is non-zero, use H[ y(px) , px ]’ .
This perhaps begins to explain why that proof works, since we are
trying to get a term F which, in a sense, reduces to G when x is
zero, and to something involving H otherwise. What still needs
more explanation is the use of Y , and the form of the term H[
y(px) , px ] . That explanation in general follows below; both it
and the remark above are illustrated several times in the
constructions further down.
The following is an obvious extension of VII-2.10.
Theorem VII-5.1 Let z, y1, · · · , yk be mutually distinct
variables and let A be any term in which no other variables
occur freely. Define F to be the closed term Y (λzy1 · · · yk • A) .
Then, for all closed terms V1, · · · , Vk , we have

F V1 · · · Vk ≈ A[z→F ][y1→V1] · · · [yk→Vk] .

Proof. (This is a theorem, not a proposition, because of its
importance, not the difficulty of its proof, which is negligible!)
Using the basic property of Y from VII-2.9 for the first step,
and essentially VII-1.2 for the other steps,

F V1 · · · Vk ≈ (λzy1 · · · yk • A)F V1 · · · Vk

≈ (λy1 · · · yk • A)[z→F ]V1 · · · Vk ≈ (λy1 · · · yk • A[z→F ])V1 · · · Vk

≈ · · · ≈ A[z→F ][y1→V1] · · · [yk→Vk] .

Why is this interesting? Imagine given an ‘equation’

F n1 · · · nk ≈ − − − F − − − − − − F − − − F − − − ,

where the right-hand side is something constructed using the
λ-calculus, except that F is an unknown term which we wish to
find, and it may occur more than once on the right-hand side.
We need to construct a term F which satisfies the equation.
The right-hand side will probably also have the numerals n1, · · · , nk
appearing. Well, the above theorem tells us how to solve it.
Just produce the term A by inventing “k + 1” distinct variables
not occurring bound in the right-hand side above, and use
them to replace the occurrences of F and of the numerals in
that right-hand side. That will produce a term, since that’s
what we meant by saying that the right-hand side was
“constructed using the λ-calculus”. Now the theorem tells us
the formula for F in terms of A.
Of course the theorem is more general, in that we can use
any closed terms Vi in place of numerals. But very often it’s
something related to a function of k-tuples of numbers where
the technique is used. In the specific example of the proof of
VII-2.11, the function f to be represented can be written in the
if-then-else-form as

“f (x), if x = 0, is g, but otherwise is h(f (pred(x)), pred(x)) .”

Here “pred” is the predecessor function. Also g is a number
represented by the term G (which is presumably a numeral),
and h is a function of two variables represented by H . This
immediately motivates the formula for A in the proof of
VII-2.11 .
What we called the “basic property of Y from VII-2.9”
says that it is a so-called fixed point operator. There are
many other possibilities for such an operator. But the detailed
form of Y is always irrelevant to these constructions in
existence proofs. However, when getting right down to the
details of compiling a functional programming language, those
details are undoubtedly essential, and there, presumably, some
Y ’s are better than others. We gave another one, due to
Turing, soon after the introduction of Curry’s Y in Subsection
VII-2.

Definition of the numeral equality predicate term, eq .
Here is a relatively simple example, producing a useful little
term for testing equality of numerals (but only numerals—testing
for ≈ in general is undecidable, as we saw in the previous
subsection!); that is, we require

eq n k ≈ T if n = k ; eq n k ≈ F if n /= k .

An informal McSELF-like procedure for the actual predicate would be

EQ(n, k) : if n = 0
           then if k = 0 then true else false
           else if k = 0 then false else EQ(n − 1, k − 1)

(This is not really a McSELF procedure for two reasons—
the equality predicate is a primitive in McSELF, and the values
should be 1 and 0, not true and false. But it does show that we
could get ‘more primitive’ by beginning McSELF only with
‘equality to zero’ as a unary predicate in place of the binary
equality predicate.)
What the last display does is to tell us a defining equation
which eq must satisfy:

eq n k ≈ (isz n)((isz k)T F )((isz k)F (eq(p n)(p k))) .

Thus we merely need to take

eq := Y (λzyx • A) ,

where

A := (isz y)((isz x)T F )((isz x)F (z(p y)(p x))) .

[Purely coincidentally, the middle term in that triple
product, namely (isz x)T F , could be replaced by simply isz x ,
since T T F ≈ T and F T F ≈ F .]
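The same recursion, run through fix in the Haskell sketch (mine, with Ints standing in for the numerals):

    eqNat :: Int -> Int -> Bool
    eqNat = fix (\z y x ->
              if y == 0 then x == 0
                        else x /= 0 && z (y - 1) (x - 1))
    -- eqNat 3 3 == True ; eqNat 3 2 == False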

λ-definability
First here is the definition. In the spirit of this work, we go
directly to the case of (possibly) partial functions, rather than
fooling around with totals first.
Definition. Let D ⊂ Nk and let f : D → N . Say that the
function f is λ-definable if and only if there is a term F in Λ such
that the term F n1 · · · nk has a normal form for exactly those
(n1, · · · , nk) which are in D, the domain of f , and, in this case,
we have

F n1 · · · nk ≈ f (n1, · · · , nk) .

We wish to prove that all (partial) recursive functions are
λ-definable. This definitely appears to be a somewhat involved
process. In particular, despite the sentiment expressed just
before the definition, it seems most efficient to deal with the
λ-definability of total recursive functions first. So the wording of
the following theorem is meant to indicate we are ignoring all
the tuples not in the domains. That is, no claim is made about
the non-existence of normal forms of Hn1 · · · nk , for (n1, · · · , nk)
not in the domain of h. This theorem is a main step for dealing
with minimization (in proving all recursive functions to be
λ-definable).
Theorem VII-5.2 If g : Dg → N has a Λ-term which computes
it on its domain Dg ⊂ Nk, then so has h, where h(n1, · · · , nk) is
defined to be

min{ l | l ≥ n1 ; (m, n2, · · · , nk) ∈ Dg for n1 ≤ m ≤ l and g(l, n2, · · · , nk) = 0 } .

The function h has domain

Dh = { (n1, n2, · · · , nk) | ∃ l with l ≥ n1 , g(l, n2, · · · , nk) = 0
and (m, n2, · · · , nk) ∈ Dg for n1 ≤ m ≤ l } .

In particular, if g is total and h happens to be total, λ-definability
of g implies λ-definability of h .
Proof. This is another example of using the fixed point
operator, i.e. the term Y . Here is an informal McSELF
procedure for h :

h(n1, · · · , nk) ⇐: if g(n1, n2, · · · , nk) = 0
                   then n1
                   else h(n1 + 1, n2, · · · , nk)

So, if G is a Λ-term defining g on its domain, then a defining
equation for a term H for h is

Hn1 · · · nk ≈ (isz (Gn1 · · · nk))n1(H(s n1)n2 · · · nk) .

Thus we merely need to take

H := Y (λzy1 · · · yk • A) ,

where

A := (isz (Gy1 · · · yk))y1(z(s y1)y2 · · · yk) .

It’s surprising how easy this is, but ‘that’s the power of Y ’ !
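In the Haskell sketch the same one-liner does the job (mine, for k = 1 and a total g):

    minimize :: (Int -> Int) -> Int -> Int
    minimize g = fix (\h n -> if g n == 0 then n else h (n + 1))
    -- minimize (\l -> l - 3) 0 == 3 ; it loops forever when g has no zero,
    -- matching the possible lack of a normal form away from the domain.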

But to repeat: the non-existence of a normal form for Hn1 · · · nk
away from the domain of h is quite likely false! For the next
theorem, we can actually give a counter-example to establish
that the proof really only gives closure of the set of total
λ-definable functions under composition, despite the fact that, as
we’ll see later, the result is true in the general case of (possibly
strictly) partial functions.

Theorem VII-5.3 The set of total λ-definable functions is
closed under composition. That is, given λ-definable total
functions as follows :
g, of “a” variables ; and h1, · · · , ha, each of “b” variables ; their composition

(n1, · · · , nb) ↦ g(h1(n1, · · · , nb), − − −, ha(n1, · · · , nb))

is also λ-definable. More generally, we can again assume the
functions are partial and produce a Λ-term for the composition,
using ones for the ingredients; but no claim can be made about
lack of normal form away from the domain.
Example. We have that the identity function, g : N → N, is λ-
defined by G = λx • x . And H = (λx • xx)(λx • xx) defines the
empty function h : ∅ → N. The composite g ◦ h is also the
empty function. But the term D1,1GH constructed in the
following proof does not λ-define it; in fact, it
defines the identity function.
Proof. Let terms G and H1, · · · , Ha represent the given
functions, so that

G m1 · · · ma ≈ g(m1, · · · , ma)

and

Hi n1 · · · nb ≈ hi(n1, · · · , nb) ,

each holding for those tuples for which the right-hand side is defined.

Now there is a term Da,b such that

Da,bGH1 · · · HaN1 · · · Nb ≈ G(H1N1 · · · Nb)(H2N1 · · · Nb) − − − (HaN1 · · · Nb)   (∗)

holds for any terms G, H1, · · · , Ha, N1, · · · , Nb .
Assuming this for the moment, and returning to the G, H1, · · · , Ha earlier,
the term Da,bGH1 · · · Ha will represent the composition. Just calculate :

Da,bGH1 · · · Ha n1 · · · nb ≈ G(H1 n1 · · · nb)(H2 n1 · · · nb) − − − (Ha n1 · · · nb)

≈ G h1(n1, · · · , nb) h2(n1, · · · , nb) − − − ha(n1, · · · , nb)

≈ g(h1(n1, · · · , nb), − − −, ha(n1, · · · , nb)) .

To prove (∗), we give two arguments, the first being facetious:
Firstly, it’s just another one of those equations to be solved
using the fixed point operator, except that the number of times
the unknown term Da,b appears on the right-hand side of its
defining equation is zero. So this is a bit of a phoney
application, as we see from the second proof below. In any
case, by this argument, we’ll set

Da,b := Y (λzuy1 · · · yax1 · · · xb • A) ,

for a suitable set of “2 + a + b” variables, where

A := u(y1x1 · · · xb)(y2x1 · · · xb) − − − (yax1 · · · xb) .

The other argument is a straightforward calculation, directly defining

Da,b := λuy1 · · · yax1 · · · xb • u(y1x1 · · · xb)(y2x1 · · · xb) − − − (yax1 · · · xb) .

This more than completes the proof.


The assertion in ( ) is one instance of a general phenomenon

called com- binatorial completeness, which we study in the next
subsection. There will be a choice between two proofs there as
well, exactly analogous to what we did above. The identities in
VII-2.5 are other examples of this phenomenon.

53
Theorem VII-5.4 Any total recursive function is λ-definable.
Proof. We use the fact that any such function can be
obtained from the starters discussed below, by a sequence of
compositions and minimizations which use as ingredients (and
produce) only total functions. See [CM]. The starters are shown
to be λ-definable below, and VII-5.3 deals with composition.
So it remains only to use VII-5.2 to deal with minimization.
Suppose that p : Nn+1 → N is a total function which is λ-defined
by H. And assume that for all (k1, · · · , kn), there is a k with p(k, k1,
· · · , kn) = 0.
Define the total function mp : Nn → N by
mp(k1, · · · , kn) := min{k | p(k, k1, · · · , kn) = 0} .
Then, if h(g) is the h produced from g in the statement of VII-5.2,
it is straightforward to see that mp = h(p) (zeron, π1, , πn),
◦ ·
where the last tuple of “n”-variable functions consists of the
zero constant function and the projections, all λ-definable. But
h(p) is λ-definable by VII-5.2. So, by closure under composition,
the proof is complete.

Let us now have a discussion of the λ-definability of various


simple func- tions, including the starters referred to in this last
proof.
It is a total triviality to check that the constant function,
mapping all numbers to zero, is defined by the term λx 0 . We

already know that the successor function is defined by s.
The ith projection function is not λ-defined by ith, because we
are not bothering to convert to genuine multi-valued functions,
defined on tuples, but rather treating them as successively
defined using adjointness. (See the initial paragraph of the
Digression in Subsection VII-7 for more explanation, if desired.)
There is a λ-calculus formalism for going back-and-forth between
the two viewpoints, called “currying” and “uncurrying”, very
handy for func- tional programming,
··· • I believe, but we don’t
need it here. The ith projection function is defined by λx1x2 xn
xi , as may be easily checked.

54
At this point, if we wish to use the starter functions just
above, the proof that every total recursive function is λ-definable
would be completed by showing that the set of λ-definable
functions is closed under primitive recursion. The latter is
simply the many-variable case of the curried version of VII-2.11,
and can be safely left to the reader as yet another (by now
mechanical) exercise with Y .

55
On the other hand, we don’t need to bother with primitive
recursion if we go back to our original definition of the recursive
functions, and also accept the theorem that a total recursive
function can be built from starters using only total functions at
all intermediate steps (as we did in this last proof). But we do
need ≥to check that the addition and multiplication functions, and
the ‘ ’-predicate, all of two variables, are λ-definable, since they
were the original starters along with the projections. These are
rather basic functions which everybody should see λ-defined. So
we’ll now do these examples of using the fixed point operator,
plus one showing how we could also have gotten a different
predecessor function this way. In each case we’ll write down a
suitable (informal) McSELF-procedure for the function
(copying from earlier), then use it to write down the λ-calculus
equation, convert that into a formula for what we’ve been
calling A, and the job is then done as usual with an application
of the fixed-point operator Y .
Here is that list of terms for basic functions mentioned just
above, where we’ve just stolen the informal McSELF procedures
from an earlier section.
The addition function. (See also the exercise before VII-2.8)
ADD(m, n) : if n = 0

then m
else ADD(m + 1, n − 1)
So the term add must satisfy

add m n ≈ (isz n)m(add(s m)(p n)) .


And so we take add := Y (λzxy • A) where A := (isz y)x(z(s x)(p y)) .

The multiplication function. (See also the exercise before VII-2.8)


MULT(m, n) : if n = 0

then 0
else ADD(m, MULT(m, n − 1))
So the term mult must satisfy
56
mult m n ≈ (isz n)0(add m(mult m(p n)) .
And so we take mult := Y (λzxy•A) where A := (isz y)0(add x(z x(p y)) .

57
The ‘greater than or equal to’ predicate.
GEQ(m, n) : if n = 0

then 1
else if m = 0
then 0
else GEQ(m − 1, n − 1)
So the term geq must satisfy

geq m n ≈ (isz n)1((isz m)0(geq(p m)(p n))) .


And so we take
geq := Y (λzxy • A)
wher
e A := (isz y)1((isz x)0(z(p x)(p
y))) .

This predicate takes values 0 and 1, rather than F and T . To


make that alteration will be left as an exercise.
The new, non-total, predecessor function.
We get the new predecessor function PRED (undefined at 0 and
otherwise mapping n to n − 1) by PRED(n) = PREPRED(0, n) ,
where
(
n−1 if m < n ;
PREPRED(m, n) =
err if m ≥ n .
The latter has McSELF-procedure
PREPRED(m, n) : if m + 1 = n

then m
else PREPRED(m + 1, n)
To distinguish them from the earlier, total, functions p and pp,
we’ll name the terms which represent these functions as q and
qq. Then the term qq must satisfy
take
And so we
58
q := Y (λzxy • A)
q

(
e
q
(
s

m
)
n
)
m
(
q
q
(
s

m
)

n
)

q
q

59
where
A := (eq(s x)y)x(z(s x) y) .

And now for the predecessor itself :


q n ≈ qq 0 n , so define q := λx • qq 0 x .
These are simpler than p and pp, aren’t they?

Now let’s get serious about partial recursive functions. The


theorem works for any numeral system with a few basic
properties, as may be seen by checking through the proofs above
and below. So that’s how it will be stated:

Theorem VII-5.5. For any choice of numeral system, that is, of terms
s, p, isz, and n for each n ≥ 0, which satisfy
s n ≈ n + 1 ; isz 0 ≈ T ; if n > 0 , then isz n ≈ F and p n ≈ n − 1 ,
every (partial) recursive function is λ-definable.
But we’ll just prove it for the earlier Church numeral
system, reviewed below. Were it not for needing a term which
‘does a loop’, i.e. has no normal form, in ‘all the right places’, we
would at this point have done what was necessary to prove this
theorem. For obtaining the needed result about the non-
existence of normal forms, it is surprising how many technicalities
(just below), and particularly what difficult syntactic theorems
(see the proof of the theorem a few pages ahead), seem to be
needed .
We begin with a list of new and old definitions, and then
four lemmas, before completing the proof.
Confession. Just below there is a slight change from an earlier definition, and
also a ‘mistake’—a small fib, if you like. We come clean on this
right at the end of this subsection, using sf print. This seems the
\
best way, and it doubles the profit, as we explain there. So most
readers, if they discover one or both of these two minor anomalies,
should just press on in good humour. But if that is a psychological
impossibility, go to the end of the subsection to get relief.
Define inductively, for Λ-terms X and Y :
60
X Y :=
(
m Y if m = 0 ;
X (XY )
m−1
if m > 0 .

61
Thus XmY = X(X(· · · · · · (X(XY )) · · ·)) = XC for C = Xm−1Y
. Recall the definitionm := λxy • xmy .
Note that xmy, despite appearances, doesn’t have the form Ay
for any Λ- term A, so we cannot apply (η)-reduction to

reduce this to λx xm . In fact, m is in normal form. (In both
cases, m = 1 is an exception.) We have

m A B ≥ A mB ,
as is easily checked for any terms A and B, though you do have
to check that (xmy)[x→A][y→B] = AmB , which is not entirely, though
almost, a no-brainer.
Recall that s := λxyz • (xy)(yz) . One can easily directly show that
s m ≥ m + 1 , though it follows from the earlier fact that sm ≈ m+1
,
since the right-hand side is in normal form. We have also

IX ≥ X and KXY ≥ X,
where now K := λxy • x is denoted just K .
Define D := λxyz • z(Ky)x . Then we have
DAB 0 ≥ 0(KB)A ≥ (KB)0A = A ,
and, for i > 0 ,

DAB i ≥ i(KB)A ≥ (KB)iA = KBC ≥ B , with C = (KB)i−1A .

Define
T := λx • D0(λuv • u(x(sv))u(sv)) ,
and then define, for any Λ-terms X and Y ,

PXY := TX(XY )(TX)Y .


If we only needed “ ” and not “ ” below, we could take P as an

actual term λxy Tx(xy)(Tx)y . Note that P will never occur on its

own, only in the form PXY .

62
Lemma A. For X and Y in Λ with XY ≥ i , we have

( Y if i = 0 ;
PXY PX(sY ) if i > 0 .

Also, in each case, the “ ” may be realized by a sequence of (β)-

reductions, the first of which obliterates the leftmost
occurrence of λ .
Proof. Using, in the first step, the definitions of PXY and of
T , and a leftmost (β)-reduction,

PXY ≥ D0(λuv•u(X(sv))u(sv))(XY )(TX)Y


≥ D0(λuv • u(X(sv))u(sv))i(TX)Y ··· (to be
continued) Now when i = 0 this continues

≥ 0(TX)Y [since DAB 0 ≥ A]


≥ (TX) 0Y = Y .
When i > 0 it continues

≥ (λuv • u(X(sv))u(sv))(TX)Y [since DAB i ≥ B]


≥ TX(X(s Y ))(TX)(sY ) = PX(s Y ) ,
as required, completing the proof.

Now suppose that g : N N and h : Nn+1 N



are total functions, and that G, H in Λ are such that

Gj ≥ g(j) and Hk1k2 · · · knk ≥ h(k1, k2, · · · kn, k)


for all j, k1, k2, · · · kn, k in N. Define f : dom(f ) → N by
f (k1, k2, · · · kn) := g(min{ k | h(k1, k2, · · · kn, k) = 0 }) ,

so

dom(f ) = { (k1, k2, · · · kn) | Ik with h(k1, k2, · · · , kn, k) = 0 } .

63
Lemma B. Given m such that h(k1, k2,kn, l) 0 for 0 ≤ l < m , we
have, for these l, ·
(i) P (Hk1k2 k )l P (Hk1k2 k )l + 1 ,
··· n ≥ · n
where each such “” may be realized by a sequence of (β)-

reductions at least one of which obliterates the leftmost
occurrence of λ ; and
(ii) P (Hk1k2 · · · kn)0 ≥ P (Hk1k2 · · · kn)m .
Proof. (ii) is clearly immediate from (i). As for the latter, with
X = Hk1k2 · · · kn and Y = l, we have
XY = Hk1k2 · · · knl ≥ h(k1, k2, · · · kn, l) = (say) i ,
where i > 0 . And so

P (Hk1k2 · · · kn)l ≥ P (Hk1k2 · · · kn)(sl) by Lemma A (second part)


≥ P (Hk1k2 · · · kn)l + 1 since sl ≥ l + 1 .

Lemma C. If (k1, k2, · · · kn) is such that there is a k with


h(k1, k2, · · · kn, k) = 0 , and we define
m := min{ k | h(k1, k2, · · · kn, k) = 0 } ,
then P (Hk1k2 · · · kn)0 ≥ m .
Proof. Using Lemma B(ii), and then using Lemma A (first part) with
X = Hk1k2 · · · kn and Y = m, which can be done since
XY = Hk1k2 · · · knm ≥ h(k1, k2, · · · kn, m) = 0,

we
get P (Hk1k2 · · · kn)0 ≥ P (Hk1k2 · · · kn)m ≥ m .

64
Main Lemma D. (a) Let

L := λx1 · · · xny • P (Hx1 · · · xn)y and F := λx1 · · · xn • G(Lx1 · · · xn0) .

Then

∀(k1, · · · , kn) ∈ dom(f ) we have Fk 1 k 2 · · · kn ≥ f (k1, k2, · · · kn) (∗)


(b) For any F satisfying (∗), define
E := λx1 · · · xn • P (Hx1 · · · xn)0 I(Fx 1 · · · xn) .
Then (i) the reduction (∗) also holds with E in place of F ; and
(ii) for all (k1, · · · , kn) /∈ dom(f ), there is an infinite sequence
of (β)- reductions, starting with Ek1k2 kn , and such that,
infinitely often, it is ·
the leftmost λ which is obliterated.
Proof. (a) We have

Fk 1 k 2 · · · kn ≥ G(Lk1k2 · · · kn0) [definition of F ]


≥ G(P (Hk1k2 · · · kn)0) [definition of L]
≥ Gm [where m = min{ k | h(k1, k2, · · · kn, k) = 0 } by Lemma C]
≥ g(m) = g(min{ k | h(k1, k2, · · · kn, k) = 0 }) = f (k1, k2, · · · kn) .
(b)(i) We have

Ek1k2 · · · kn ≥ P (Hk1k2 · · · kn)0I(Fk1k2 · · · kn) [definition of E]


≥ mI(Fk 1 k 2 · · · kn) [m as above, by Lemma C]
m
≥ I (Fk
m
1 k 2 · · · kn) [since mXY ≥ X Y ]
≥ Fk 1 k 2 · · · kn [since IX ≥ X]
≥ f (k1, k2, · · · kn) [by (∗) for F ] .

6
(b)(ii) We have

Ek1k2 · · · kn ≥ P (Hk1k2 · · · kn) 0 I(Fk 1 k 2 · · · kn) [definition of E]


≥ P (Hk1k2 · · · kn) 1 I(Fk 1 k 2 · · · kn) [repeatedly applying
≥ P (Hk1k2 · · · kn) 2 I(Fk 1 k 2 · · · kn) Lemma B(i), and
≥ P (Hk1k2 · · · kn) 3 I(Fk 1 k 2 · · · kn) noting that each step
≥ P (Hk1k2 · · · kn) 4 I(Fk 1 k 2 · · · kn) here is doable with
≥ P (Hk1k2 · · · kn) 5 I(Fk 1 k 2 · · · kn) at least one leftmost
≥ P (Hk1k2 · · · kn) 6 I(Fk 1 k 2 · · · kn) (β)−reduction, giving
≥ P (Hk1k2 · · · kn) 7 I(Fk 1 k 2 · · · kn) infinitely many,
≥ P (Hk1k2 · · · kn) 8 I(Fk 1 k 2 · · · kn) • • • as
required.] Proof of Theorem VII-5.5—Any (partial) recursive
function is λ-definable. Let f be such a function, and use Kleene’s
theorem (see [CM], Section IV)
to write

f (k1, k2, · · · kn) = g(min{ k | h(k1, k2, · · · kn, k) = 0 }) ,

for primitive recursive functions g and h. These two are, in particular, total
recursive functions. So we can, by VII-5.4, find G and H in Λ such that

Gj ≈ g(j) and Hk1k2 · · · knk ≈ h(k1, k2, · · · kn, k)

for all j, k1, k2, kn, k in N. But since the right-hand sides are
·
already in normal form, by the normal form theorem we can

change “ ” to “ ” in both places above. But now, using parts
of Lemma D, first get an F as in
part (a), and then use it to get an E as in part (b). Part (b)(i)
gives the “≈” equation required for (k1, · · · , kn) ∈ dom(f ) (and
more, since it even gives a specific way to reduce Ek1k2 kn to f
(k1, k2, kn), dependent on having · · ·similar reductions
· for G and H
—see the exercises below as well). For (k1, · · · , kn) /∈ dom(f ), part
(b)(ii) gives an infinite reduction sequence
starting with Ek1k2 kn , and such that, infinitely often, it is the leftmost
·
λ which is obliterated. Thus it is a “quasi-leftmost reduction” in the sense of
[Kl], because of leftmost (β)-reductions occurring arbitrarily far out. Thus,

6
by Cor 5.13, p.293 of that thesis, Ek1k2 kn has no normal form (in the
·
extensional λ-calculus), as required.

It seems interesting how much is involved in this proof. I’ve not


seen any essentially different proof; in fact I’ve not seen any
other detailed proof at all in the extensional case. This one is
essentially that in [HS], except they deal with the intensional λ-
calculus. And they are acknowledged experts. One can well
imagine manoeuvering to avoid Kleene’s theorem. But Klop’s
quasi-leftmost normalization theorem seems to be essential,
and its proof apparently dates from only around 1980. It is stated
as a problem by Hindley [Hi] in 1978. Furthermore, it seems
always to be the extensional λ-calculus with which the CSers
work in hard-core denotational semantics. I would assume that
a definition of “computable” phrased in terms of the extensional λ-
calculus was one which was long accepted before 1980. This
may be a situation somewhat analogous to that with the
Littlewood-Richardson rule from algebraic
combinatorics/representation theory. Those two only stated the
rule in the 1930’s as an empirically observed phenomenon. A
proof or proofs with large gaps appeared long before 1970, but
only around then did acceptable proofs begin to appear. A joke
I heard from Ian Macdonald has astronauts landing on the
moon and (fortunately) returning, by use of technology
dependent on the Littlewood-Richardson rule, apparently well
before a genuine proof was known!
In any case, let me own up again—we have, in this write-up, depended on
three basic, difficult syntactic properties of the extensional λ-calculus
(unproved here) :
the Church-Rosser theorem and its consequence that
normal forms are unique;
the normalization theorem, that when M has a normal form
N , the se- quence of leftmost reductions starting with M
terminates with N ;
and Klop’s theorem above, implying that, if an infinite quasi-
leftmost reduction starting from M exists, then M has no normal
form (which theorem also says that we can weaken the word

6
“leftmost” to “quasi-leftmost” in the preceding statement about
those M ’s which do have normal forms).

Coming clean. The minor change, from earlier definitions in VII-2,


made just after the “Confession”, is that here in VII-4 we defined
XmY a bit differently. The new definition agrees with the old up to “≈”,
so everything stated earlier still

6
holds using the new definition. Since they are defined using xmy, the
numerals m have now been changed slightly, so there are actually
quite a few earlier results to which that comment applies. It just
seemed to the author that in VII-2 it would avoid fuss if Xm
actually had a meaning on its own as a term. From the
“Confession” onwards, that definition should be ignored—Xm never
occurs without being followed by a term Y .
The mistake referred to in the “Confession” is that, in the
extensional λ- calculus, though all the other numerals are, the
numeral 1 is NOT in normal form, despite the claim. It can be (η)-
reduced to the term we are denoting as I. There is only one place
where this affects the proofs above, as indicated in the paragraph
after next.
But first let’s double our money. The reader may have
noticed that (η)- reduction never comes in anywhere in these
proofs—it’s always (β)-reduction. Furthermore, the numeral 1, in
the intensional λ-calculus, IS in normal form. So, using the
corresponding three syntactic theorems from just a few paragraphs
above (whose proofs are actually somewhat easier in the
intensional case), we have a perfectly good and standard proof that
every recursive function is λ- definable in the intensional λ-calculus.
The only place where the non-normality of the numeral 1 is
problematic, in the proofs above for the extensional λ-calculus, is in
the proof of VII-5.5, where we go from asserting “ ” to asserting “
”, when it happens ≈ that the right- hand side is 1. But just use the
above result for the intensional λ-calculus to see that we can get
defining terms for which these reductions exist. So that takes care
of the small anomaly (and it did seem simplest to ignore it and to
be a bit intellectually dishonest until now).

Sketch of the proof that λ-definable =⇒ RC.


This can be done following the pattern which we used much
earlier in [CM] to show BC =⇒ RC (i.e. Babbage computable
implies recursive). Let A ∈ Λ . Define functions fn,A : Dn,A → N as
follows:
Dn,A := { →ν | Aν1 · · · νn ≈ l for some l ∈ N } ⊂
Nn ; and since, by Church-Rosser, such an l is unique
(given →ν ) , define
6
fn,A (→ν) := l .

7
Every λ-definable function has the form fn,A for some (n, A)
[though the function fn,A itself might not actually be definable
using the term A, if A is not chosen so that Aν1 · · · νn “loops” for
all →ν /∈ Dn,A ] . However, we prove that all fn,A are recursive.
In fact, as in the case of -computable functions (but using
B
lower-case names so as not to confuse things), one is able to
write

f n,A (→ν) = print(min{h | kln(gA, →ν , h) = 1 }) .


The functions kln and print will be defined below in such a way
that it is manifestly believable that they are primitive recursive.
This will use G¨odel numbering for terms in Λ ; and gA above is
the G¨odel number of A . To change “believable” to true is a tedious
but unsubtle process, analogous to much earlier material,
especially the proof of primitive recursiveness for KLN and PRINT .
The proof here for kln and print will be omitted. Via the
primitive recursiveness of kln and print, the recursiveness of fn,A
follows immediately from the above display. [Note that this will
also give a new proof of Kleene’s normal form theorem for
recursive functions, using the λ-calculus in place of ATEN.]
So here at least are the needed definitions. The argument
using the definitions below works in the intensional λ-calculus,
just as well as in our case, the extensional λ-calculus.
As for G¨odel numbering of terms, define, somewhat
arbitrarily, using the coding of finite strings of natural numbers
from the Appendix before IV-3 in [CM] :

Go¨d(x i ) =< 1, i > ; Go¨d(AB) =< 2, Go¨dA, Go¨dB > ; Go¨d(λxi •A) =< 3, i,
Go¨dA > .
Now define kln to be the relation for which

kln(g, →ν , h) = 1 ⇐⇒ IA ∈ Λ satisfying the following :


(i) g = Go¨d(A) ;
(ii) Aν1 · · · νn has a finite leftmost reduction leading to a term of the form l;

7
(iii) h is the code of the history of that leftmost
reduction : that is, if the reduction is

Aν1 · · · νn = B1 ≥ B2 ≥ B3 ≥ · · · ≥ Bk = l ,
then h =< b1, , bk >, where bi = Go¨d(B i ) .
·
Finally print : N N may be suitably defined (see the next

exercise) so that
print(< b1, · · · , bk >) = l if bk = Go¨d(l) .

Exercises to define print.


Take l explicitly to be λx2 • (λx1 • x lx1).
2
(i) Show that

Go¨d(3) = < 3, 2, < 3, 1, < 2, < 1, 2 >, < 2, < 1, 2 >, < 2, < 1, 2 >, < 1, 1
>>>>>> .

(ii) Find a primitive recursive function Φ so that Φ(< 1, 1 >)


= 0 and Φ(< a, b, x >) = 1 + Φ(x) for all a, b and x.
(iii) Define Ψ(k) := Φ(CLV (CLV (k, 3), 3)) . Prove : ∀l, Ψ(Go¨d(l)) = l .
(iv) Defining
print(x) := Ψ(CLV (x, CLV (x, 0))) ,
check that you have a primitive recursive function with the
property displayed just before the exercise.

7
Both the λ-calculus and the theory of combinators were originally devel-
oped as foundations for mathematics before digital computers were
invented. They languished as obscure branches of mathematical logic until
rediscov- ered by computer scientists. It is remarkable that a theory
developed by logicians has inspired the design of both the hardware and
software for a new generation of computers. There is an important lesson
here for peo- ple who advocate reducing support for ‘pure’ research: the
pure research of today defines the applied research of tomorrow.



David Turner proposed that Sch¨onfinkel and Curry’s combinators
·
could be used as machine code for computers for executing functional pro-
gramming languages. Such computers could exploit mathematical properties
of the λ-calculus · · ·



We thus see that an obscure branch of mathematical logic underlies im-
portant developments in programming language theory, such as:
(i) The study of fundamental questions of computation.
(ii) The design of programming languages.
(iii) The semantics of programming languages.
(iv) The architecture of computers.

Michael Gordon [Go1]

7
VII-6 Combinatorial Completeness
and the Invasion of the Combinators.
Let Ω be a set with a binary operation, written as
juxtaposition. We shall be discussing such objects a lot, and
that’s what we’ll call them, rather than some subname of hemi-
demi-semi-quasiloopoid/groupoid/algebraoid, or indeed
applicative set, though the latter is suggestive of what we are
n
trying to understand here. The set ΩΩ consists of all functions

of “n” variables from Ω, taking values in Ω, i.e. functions Ω n Ω .
It has a rather small subset consisting of those functions
‘algebraically definable just using the operation’—for example,
(ω1, ω2, ω3) '→ (ω2((ω3ω3)(νω1)))ω2 ,
where ν is some fixed element of Ω. [We’ll express this precisely
below.] The binary operation Ω is said to be combinatorially
complete if and only if every such “algebraically definable”
function can be given as left multiplication by at least one
element of Ω (i.e. ‘is representable’). For example, there would
have to be an element ζ such that, for all (ω1, ω2, ω3) we have

ζω1ω2ω3 = (ω2((ω3ω3)(νω1)))ω2 .

We are using the usual convention here on the left, that is,

ζω1ω2ω3 := ((ζω1)ω2)ω3 .
n
Note that the cardinality of ΩΩ is strictly bigger than that of Ω
except in the trivial case that Ω has only one element, so we
cannot expect this definition to lead anywhere without some
restriction on which functions are supposed to be realizable by
elements ζ .
We shall show quite easily in the next section that Λ/ is

combinatori- ally complete, where the binary operation is the one
inherited from Λ. Then we introduce an ostensibly simpler set Γ,
basically the combinators extended with variables. This produces
the original combinatorially
∼ complete binary operation, namely
Γ/ , due to Schonfinkel, who invented the idea. He didn’t prove
it this way, since the λ-calculus hadn’t been invented yet, but we
≈ 7
do it in the second section below by showing that the two
binary operations Λ/ and Γ/ are isomorphic. So we don’t
really have two different examples here, but the new description
using Γ is simpler in many respects.

7
Reduction algorithms for “ ” in Γ have actually been used in

the design of computers, when the desire is for a machine well
adapted to functional programming languages [Go1].
Combinatorial Completeness of Λ/ ≈ .
First let’s give a proper treatment of which functions we are
talking about, that is, which are algebraically definable. The
definition below is rather for- mal. It makes proving facts about
these functions easy by induction on that inductive definition.
However, for those not hidebound by some construc- tivist
philosophy, there are simpler ways of giving the definition—“. .
. the smallest set closed under pointwise multiplication of
functions, and contain- ing all the projections and all constant
functions. . . ”. See the axiomatic construction of combinatorially
complete binary operations near the begin- ning of Subsection
VII-8 for more on this.
If Θ is any set, then the free binary operation generated by
Θ, where we use to denote the operation, is the

smallest set, FREE(Θ), of ∪non-empty
{ ∗ finite strings of symbols
}
from Θ (, , ) (a disjoint union of course!)
such that each
'→ ∗element of Θ is such a (length 1)∗string, ∗ ∗ and ∗
the
set is closed under (g, h) (g h) . So it consists of strings
such as ((θ3 ((θ1 θ2) θ4)) θ3). Now suppose given a binary
operation on Ω as in the introduction, so
here the operation is denoted by juxtaposing. Let {ν1, ν2, · · ·} be a
sequence of ‘variables’, all distinct, and disjoint from Ω, and,
for each n ≥ 1, take Θ to be the set Ω ν1, ν2, , νn .
Now define a ∪ function
{ ··· }
n
FUN = FUN n : FREE(Ω ∪ {ν1, ν2, · · · , νn}) → ΩΩ

by mapping each element of Ω to the corresponding constant


function; map- ping each νi to the ith projection function; and
finally requiring that

FUN (A ∗ B) = FUN (A)FUN (B) .


On the right-hand side, we’re using the operation in Ω . (So
actually FUN is the unique morphism of binary operations
7
which acts as specified on the length 1 strings, the generators
of the free binary operation, and where the binary operation in
n
ΩΩ is pointwise multiplication using the given opera- tion in Ω.)
Then the functions algebraically definable using that operation are

7
defined to be those which are in the image of the FUN n for

some n 1 . In the example beginning this section, the function
given is
FUN 3 [ (ν2 ∗ ((ν3 ∗ ν3) ∗ (ν ∗ ν1))) ∗ ν2 ] .
So let’s formalize the earlier definition:
Definition. The binary operation Ω is combinatorially complete if
and only if, for all n and for all f ∈ FREE(Ω ∪ { ν1 , ν2 , · · · , νn
}) , there is a ζ ∈ Ω such that, for all (ω1, ω2, · · · , ωn) we have
ζω1ω2 · · · ωn = FUN n (f )(ω1, ω2, · · · , ωn) .

Theorem VII-6.1. The binary operation on Λ/ , induced by



the juxtaposition operation on Λ, is combinatorially complete.
Proof. Fix n, and proceed by induction on f in the definition
above. The initial cases require us to find ζ’s which work for the
constant functions and for the projections, something which we
did in the last section.∗For the inductive step, let f = (g h) ,
where we assume that we have ζg and ζh which work for g and h
respectively. But now the right-hand side in the definition
above is just

FUN N (g ∗ h)(ω1, ω2, · · · , ωn) =


FUN N (g)(ω1, ω2, · · · , ω n )FUN N (h)(ω1, ω2, · · · , ωn) =
(ζgω1ω2 · · · ωn)(ζhω1ω2 · · · ωn) .
So we must be able to solve the following equation for the unknown ζ, where
ζg and ζh are fixed, and the equation should hold for all (ω1, ω2, · · · , ωn) :
ζω1ω2 · · · ωn = (ζgω1ω2 · · · ωn)(ζhω1ω2 · · · ωn) .
Perhaps your initial reaction is to do this as just another
application of the fixed-point operator; and that will work
perfectly well, but is overkill, since the unknown occurs exactly
zero times on the right-hand side. A straight- forward solution
is simply
ζ = λx1x2 · · · xn • (ζgx1x2 · · · xn)(ζhx1x2 · · · xn) .
7
This is a bit sloppy, confusing work in Λ with work in Λ/ ≈ . We
should have said that ζ is the equivalence class of Z, where, if Zg
and Zh are elements in the equivalence classes ζg and ζh ,
respectively, we define
Z = λx1x2 · · · xn • (Zgx1x2 · · · xn)(Zhx1x2 · · · xn) .
Verification is a straightforward λ-calculation, which is a bit
quicker if you apply VII-1.2 .

Combinators.
Let S and K be two distinct symbols, disjoint from our
original sequence x1, x2, · · · of distinct variables (which, though it
won’t come up directly, are quite distinct from the variables νi of
the last section).
Definition. The binary operation Γ is written as
juxtaposition, and is defined to be the free binary operation
{ ··
generated by the set S, K, x1, x2, . Its elements are called by
various names : combinatory terms [S¨o][Ko], com- binations [S¨o],
CL-terms [Ba], combinatorial expressions [Go]. We won’t need a
name for them as individuals. The combinators are the elements
{ }
of the subset of Γ which is the free binary operation generated
by S, K . Thus
combinators are just suitable strings of S’s and K’s and lots of
brackets. We shall again use the convention that in Γ, the
string P1P2P3 · · · Pn really means ((· · · ((P1P2)P3) )Pn) .
Finally, let ∼ be the equivalence relation on Γ generated by the following
conditions, where we are requiring them to hold for all elements A, B, A1,
etc. in Γ :
SABC ∼ (AC)(BC) ; KAB ∼ A ;
A1 ∼ B1 and A2 ∼ B2 =⇒ A1A2 ∼ B1B2 ;
an
d
Ax ∼ Bx =⇒ A ∼ B , if the variable x appears in neither A nor B .
Remarks. (i) See the proof of VII-6.12 below for a more
explicit de- scription of .

7
(ii) Don’t be fooled by the ‘argument’, that the last
(extensionality) con- dition is redundant, which tells you to just
multiply on the left by K and apply the second and third
conditions : the point is that K(Ax) is a lot different than KAx.

8
(iii) The requirement that x isn’t in A or B cannot be removed
from the last condition defining . For we have K(xx)x xx,
~ ∼
but K(xx) x. If the latter was false, it would follow from the
results below ∼ in several ways that K(BB) B for any B.
Taking
~ B = I = SKK, it would follow that KI I, and then
multiplying on the ∼ right by A, that I A for any A. This would
show that relates any ~ two elements. That contradicts the main
result below that the -classes are in 1-1 correspondence ≈ with
the -classes from Λ : The Church-Rosser theorem tells us that
there are tons of -classes.
(iv) Apologies to the λ-experts for avoiding a number of arcane
subtleties, both here and previously. In particular, whether to
assume that last condi- tion, or something weaker such as
nothing at all, occupies a good deal of the literature. This is the
great intensionality versus extensionality debate.
To make the ascent from mathematics to logic is to pass from the
object language to the metalanguage, or, as it might be said without
jargon, to stop toying around and start believing in something . . . A
function could be a scheme for a type of process which would become
definite when presented with an argument . . . Two functions that are
extensionally the same might be ‘computed’, however, by quite different
processes . . . The mixture of abstract objects needed would obviously have
to be very rich, and I worry that it is a quicksand for foundational studies
. . . Maybe after sufficient trial and error we can come to agree that
intensions have to be believed in, not just reconstructed, but I have not
yet been able to reach that higher state of mind.
Dana Scott ([SAJM-ed], pp. 157-162)

But, for the intended audience here, that debate is simply an unnecessary
complication on first acquaintance with the subject, I believe.
Theorem VII-6.2. The binary operation on Γ/ , induced by

the juxtaposition operation on Γ, is combinatorially complete.
More examples arising from combinatorial completeness.
The top half of the following table is already familiar. On all
lines, that the λ-version gives the effect is immediate. The
reader might enjoy the following exercises :
(1) Show that the definition gives the effect; that is, work
directly with combinator identities rather than the λ-calculus.

8
(2) Use the λ-versions of the ingredients in the definition
column, and reduce that λ-term to normal form, which should
be the entry in the 3rd column, up to re-naming bound
variables.

8
It should be understood that the variables in each entry of
the third column are distinct from each other.

combinator definitio normal λ-version effec


n t
K λxy • Kαβ ≥ α
x
S λxyz•(xz)(yz) Sαβγ ≥ (αγ)(βγ)
I SKK λx • x Iα ≥ α
B S(KS)K λxyz•x(yz) Bαβγ ≥ α(βγ)
W SS(KI) λxy•xyy Wαβ ≥ αββ
C S(BBS)(KK) λxyz•xzy Cαβγ ≥ αγβ
G B(BS)B λxyzw•x(yw)(zw) Gαβγδ ≥ α(βδ)(γδ)
E B(BW (BC))(BB(BB)) λxyzw•x(yz)(yw) Eαβγδ ≥ α(βγ)
(βδ)

Exercises. (i) Find a combinator whose normal λ-version is λxy yx .



(i) Try to find a combinator whose normal λ-version is the term

ADD = λuνxy • (ux)(νxy) .

Recall that ADD k l = k + l , from the exercise before VII-2.8,


where it had a different name. Referring to the second part of
that exercise, find also a combinator for multiplication.

As mentioned earlier, the first proof of VII-6.2 will not be


direct. Rather we’ll show that Γ/ is isomorphic to Λ/ . That is,
~
there is a bijective function between them which preserves the
binary operations. Since combi- natorial completeness is clearly
invariant up to isomorphism, by VII-6.1 this is all we need (and of
course it gives us the extra information that we haven’t, strictly
speaking, produced a new example of combinatorial completeness).
To do this we need only define functions

Ψ : Γ → Λ and Φ:Λ→
Γ such that the following five results hold :

8
VII-6.3. If P ∼ Q then Ψ(P ) ≈ Ψ(Q).
VII-6.4. If A ≈ B then Φ(A) ∼ Φ(B).
VII-6.5. For all P ∈ Γ, we have ΦΨ(P ) ∼ P .
VII-6.6. For all A ∈ Λ, we have ΨΦ(A) ≈ A.
VII-6.7. For all P and Q in Γ, we have Ψ(PQ) ≈ Ψ(P )Ψ(Q).
That this suffices is elementary general mathematics which
the reader can work out if necessary. The first two give maps
back and forth between the sets of equivalence classes, and the
second two show that those maps are inverse to each other. The
last one assures us that the maps are morphisms.
Definition of Ψ. Define it to be the unique morphism of
binary opera- tions which maps generators as follows : all the
variables go to themselves; K goes to K := T := λxy • x; and S
goes to S := λxyz • (xz)(yz).
Remarks. (i) Since Ψ, by definition, preserves the
operations, there is nothing more needed to prove VII-6.7
(which is understated—it holds with
= , not just .

(ii) The sub-binary operation of Λ generated by S , K and all
the variables is in fact freely generated by them. This is
equivalent to the fact that Ψ is injective. But we won’t dwell on
this or prove it, since it seems not to be useful in establishing
that the map induced by Ψ on equivalence classes is injective.
But for concreteness, the reader may prefer to identify Γ with
that subset of Λ, namely the image of Ψ . So you can think of
the combinators as certain kinds of closed λ-expressions,
closed in the sense of having no free variables.
The main job is to figure out how to simulate, in Γ, the
abstraction operator in Λ.
Definition of µx • in Γ . For each variable x and each P ∈ Γ, define
µx • P as follows, by induction on the structure of P :
If P is an atom other than x, define µx • P := KP .
Define µx • x := SKK := I .
Define µx • (QR) := µx • QR := S(µx • Q)(µx • R) .

8
Definition of Φ . This is again inductive, beginning with the atoms,
i.e. variables x :
Φ(x) := x ; Φ(AB) := Φ(A)Φ(B) ; Φ(λx • A) := µx • Φ(A) .
It should be (but seldom is) pointed out that the correctness of
this definition depends on unique readability of the strings
which make up the set Λ .
The first result is the mirror image of the last part of that definition.

VII-6.8. For all P ∈ Γ, we have Ψ(µx • P ) ≈ λx • Ψ(P ) .


Proof. Proceed by induction on P .
When P is the variable x, the left-hand side is
Ψ(µx • x) = Ψ(SKK) = S K K ;
whereas the right-hand side is λx x . When applied to an

arbitrary B Λ these two ‘agree’ (as suffices even with B just a
variable):
S K K B ≈ K B (K B) ≈ B ≈ (λx • x)B .
The middle is just VII-2.1(a), since K = T , and the first one

is VII-2.5. [Notice the unimpressive fact that we could have
defined I to be SKZ for any Z Γ .]

When P is an atom other than x, the left-hand

side is Ψ(KP ) = Ψ(K)Ψ(P ) =

K Ψ(P ) .

But if x is not free in A [and it certainly isn’t when we take A =


Ψ(P ) here], for any B we have
K A B ≈ A ≈ (λx • A)B .
Finally, when P is QR, where the result is assumed for Q
and R, the left-hand side is
Ψ(S(µx•Q)(µx•R)) = S Ψ(µx•Q)Ψ(µx•R) ≈ S (λx•Ψ(Q))(λx•Ψ(R)) .
The right-hand side is λx Ψ(Q)Ψ(R) . Applying these to suitable

B yields respectively (use VII-2.5 again) :
8
Ψ(Q)[x→B]Ψ(R)[x→B] and (Ψ(Q)Ψ(R))[x→B] ,

8
so they agree, completing the proof.

Any λ-expert reading the previous proof and next result will
possibly find them irritating. We have made extensive use of
extensionality in the last proof. But the results actually hold
(and are very important in more encyclopaedic
≈ versions of this
subject) with ‘ ’ replaced by ‘ in’, where the latter is defined
using only rules (α) and (β), and congruence [that is, drop rule
(η)]. So an exercise for the reader is to find proofs of these slightly
more delicate facts.

Proof of VII-6.6. Proceed by induction on the structure


of A : When A is a variable x, we have ΨΦ(x) = Ψ(x) = x .
When A = BC, where the result holds for B and C,
ΨΦ(BC) = Ψ(Φ(B)Φ(C)) = ΨΦ(B)ΨΦ(C) ≈ BC .
Finally, when A = λx • B, where the result holds for B, using
VII-6.8, ΨΦ(λx • B) = Ψ(µx • Φ(B)) ≈ λx • ΨΦ(B) ≈ λx •
B ,
as required.

Proof of (most of) VII-6.5. Proceed by induction on the structure of


P : The inductive step is trivial, as in the second case
above. When A is a variable, this is also trivial, as in the
first case above. When A = K, it is a straightforward
calculation : For any B,
ΦΨ(K)B = Φ(K)B = Φ(λx • (λy • x))B = (µx • (µy • Φx))B
= (µx • Kx)B = S(µx • K)(µx • x)B = S(KK)IB ∼
KKB(IB) ∼ K(SKKB) ∼ K(KB(KB)) ∼ KB .
Taking B to be a variable, we get ΦΨ(K) K , as required.

When A = S, there is undoubtedly a similar, but very messy,
calculation. But the author doubts whether he will ever have
the energy to write it out, or, having done so, much confidence
that it is free of error. So we postpone the remainder of the
8
proof until just after that of VII-6.13 below. But we have the
good luck that nothing between here and there depends on
the

8
present result, VII-6.5. [The method there also gives an
alternative to the above calculation for showing ΦΨ(K) ∼ K .]

For λ-experts we are avoiding some delicate issues here by


imposing ex- tensionality. A quick calculation shows that
Φ(Ψ(K)) = S(KK)I —in fact, just drop the B in the first two
displayed lines above to see this—but S(KK)I and K∼are not
related under
∼ the equivalence relation ‘ in’ defined as with ‘ ’
except that extensionality (the last condition) is dropped. One
needs a version of the Church-Rosser theorem to prove this.

Proof of VII-6.3. By the definition of , and basics on



equivalence relations (see the proof of VII-6.12 ahead for what
I mean by this, if neces- sary), it suffices to prove one fact for
each of the four conditions generating
the relation ∼ , those facts being
Ψ(SABC) ≈ Ψ(AC(BC)) ; Ψ(KAB) ≈ Ψ(A) ;
Ψ(A1) ≈ Ψ(B1) and Ψ(A2) ≈ Ψ(B2) =⇒ Ψ(A1A2) ≈ Ψ(B1B2) ;
and

Ψ(Ax) ≈ Ψ(Bx) ⇒ Ψ(A) ≈ Ψ(B), if the variable x appears in neither A nor B.

The latter two are almost trivial, and the first two are simple
consequences of earlier identities : Using VII-2.1(a),

Ψ(KAB) = Ψ(K)Ψ(A)Ψ(B) = T Ψ(A)Ψ(B) ≈ Ψ(A) ,

as required. Using VII-2.5,

Ψ(SABC) = Ψ(S)Ψ(A)Ψ(B)Ψ(C) = S Ψ(A)Ψ(B)Ψ(C) ≈


Ψ(A)Ψ(C)(Ψ(B)Ψ(C)) = Ψ(AC(BC)) ,
as required.

The proof of VII-6.4 seems to be somewhat more involved,


apparently needing the following results.

8
VII-6.9. If x isn’t in P, then µx • P ∼ KP .
Proof. Proceeding by induction on P , the atomic case
doesn’t include P = x, so we get equality, not just . The

inductive step goes as follows : For any T ,

(µx•QR)T = S(µx•Q)(µx•R)T ∼ S(KQ)(KR)T ∼


KQT (KRT ) ∼ QR ∼ K(QR)T ,
and so, µx • QR ∼ K(QR) , as required.

VII-6.10. If P ∼ Q then µx • P ∼ µx • Q.
Proof. By the definition of , and basics on equivalence

relations, it suffices to prove one fact for each of the four
conditions generating the relation ∼ , those facts being
µx • SABC ∼ µx • AC(BC) ; µx • KAB ∼ µx • A ;
µx • A1 ∼ µx • B1 and µx • A2 ∼ µx • B2 =⇒ µx
• A1A2 ∼ µx • B1B2 ; and finally, if the variable z appears in
neither A nor B,

µx • Az ∼ µx • Bz =⇒ µx • A ∼ µx • B .
However we first eliminate need to deal with the last of
these when z = x by proving the case of the entire result where
x is not in P or Q. In that case, using VII-6.9,

µx • P ∼ KP ∼ KQ ∼ µx • Q .
As for the last fact when z /= x, for any R we have
(µx • Az)R = S(µx • A)(µx • z)R ∼ (µx • A)R((µx • z)R) ∼
(µx•A)R(KzR) ∼ (µx•A)Rz .
The same goes with B replacing A, so taking R as a variable
different from z and which is not in A or B, we cancel twice to
get the result.
For the second fact above, we have, for any C,

9
(µx • KAB)C = S(µx • KA)(µx • B)C ∼ (µx • KA)C((µx • B)C) ∼
S(µx • K)(µx • A)C((µx • B)C) ∼ (µx • K)C((µx • A)C)((µx • B)C) ∼
KKC((µx • A)C)((µx • B)C) ∼ K((µx • A)C)((µx • B)C) ∼ (µx • A)C ,
as suffices.
The first one is similar but
messier. The third fact is
quick:

µx • A1A2 = S(µx • A1)(µx • A2) ∼ S(µx • B1)(µx • B2) = µx • B1B2 .

VII-6.11. The variables occurring in µx P are exactly those



other than x which occur in P .
Proof. This is just a book-keeping exercise, by induction on P .

VII-6.12. R ∼ P implies R[x→Q] ∼ P [x→Q]


Proof. A basis for this and a couple of earlier proofs is the
following explicit description of the relation ‘ ’ :

RP there is a finite sequence of ordered pairs of
~
elements of Γ, whose last term is the pair (R, P ), and each of
whose terms has at least one of the following seven properties.
It is :
(1) (A, A), for some A ; or
(2) (B, A), where (A, B) occurs earlier in the sequence; or
(3) (A, C), where both (A, B) and (B, C) occur earlier for some B; or
(4) (KAB , A), for some A and B ; or
(5) (SABC , (AC)(BC)), for some A, B and C ; or
(6) (A1A2 , B1B2), where both (A1, B1) and (A2, B2) occur earlier; or
(7) (A, B), where (Az, Bz) occurs earlier in the sequence, and
where z is some variable not occurring in A or B.
[This is true because the condition above does define an
equivalence rela- tion which satisfies the conditions defining ‘ ’ ,

and any pair that is related under any relation which ∼ satisfies
the conditions defining ‘ ’, is also neces- sarily related under
this relation just above.]
9
Call any sequence of ordered pairs as above a verification for R ∼ P .

9
Now proceed by contradiction, assuming that R P has a

shortest pos- sible verification among all pairs for which the
result fails for some x and some Q—shortest in the sense of
sequence length.
Then the pair (R, P ) cannot have any of the forms as in (1) to
(5) with respect to its “shortest verification”, because (R[x→Q], P
[x→Q]
) would have the same form (with respect to another
verification in the case of (2) and (3), by
“shortest”).
It also cannot have the form in (6), since (again by “shortest”) we could
concatenate verifications for ~ B[x→Q] and [x→Q] 2
~ [x→Q] to prove
B 2
1 1 A
[x→Q]
A
that (A1A2)[x→Q] ∼ (B1B2)[x→Q] .
So (R, P ) must have the form in (7), i.e. we have (R, P ) = (A, B) where
(Az, Bz) occurs earlier in that shortest sequence, for some z not
in A or B; and yet A[x→Q] /∼ B[x→Q] .
Now necessarily z /= x, since otherwise x does not occur in A or B, so

A[x→Q] = A ∼ B = B[x→Q] .

Note that if w is a variable not occurring anywhere in a


verification, applying [y→w] for any y to all terms in the pairs in
the verification will produce another verification.
Now choose the variable w different from x, so that w is
not in Q nor in any term in pairs in the above “shortest
verification”. Applying [z→w] to everything up to the (Az, Bz) term
in that verification shows that
Aw Bw by a derivation shorter than the “shortest” one above. Thus we

have (Aw)[x→Q] (Bw)[x→Q] ; that is, A[x→Q]w B[x→Q]w . But now we
~
can just erase w since it doesn’t occur in A[x→Q] nor in B[x→Q],
yielding the contradiction A[x→Q] B[x→Q] , i.e. R[x→Q] P [x→Q] , and
completing the proof. ~
[Of course this isn’t really a proof by contradiction; rather it is one
by induc- tion on the length of the shortest verification for P ∼
Q.]

VII-6.13. For all P Γ, we have (µx P )x P, and, more


∈ •
generally (noting that ‘okayness’ of substitution is not an issue
in Γ) the following analogue of the (β)-rule:

9
(µx • P )Q ∼ P [x→Q] .

9
[Note that the first part plus ‘extensionality’ quickly give the
direct analogue in Γ for the (η)-rule from Λ, namely

µx • Rx ∼ R if x is not in R .
To see this, just apply the left side to x.]
Proof. Proceeding by induction on P for the first identity,
the atomic case when P = x just uses Ix ∼ x . When P is an
atom other than x,
(µx • P )x = KPx ∼ P .

For the inductive step,

(µx • QR)x = S(µx • Q)(µx • R)x ∼ (µx • Q)x((µx • R)x) ∼ QR .

Then the second identity follows using the first one, using VII-
6.12, and using the previous fact in VII-6.11 about x not
occurring in µx • P :
(µx • P )Q = ((µx • P )x)[x→Q] ∼ P [x→Q] .

Completion of the proof of VII-6.5. To prove the remaining


fact, namely ΦΨ(S) S, first we show that
~
ΦΨ(S)BCDSBCD for any B, C, D . This is straightforward, using VII-
6.13 three times :

ΦΨ(S)BCD = Φ(S)BCD = (µx•(µy•(µz •xz(yz))))BCD ∼


(µy • (µz • xz(yz)))[x→B]CD = (µy • (µz • Bz(yz)))CD ∼
(µz • Bz(yz)) [y D = (µz • Bz(Cz))D ∼
(Bz(Cz))[z→D] = BD(CD) ∼ SBCD .
Now just take B, C, D as three distinct variables and cancel.

VII-6.14. If y doesn’t occur in Q and x /= y, then


(µy • P )[x→Q] ∼ µy • (P [x→Q]) .

Proof. Proceeding by induction on P , the atomic cases are as follows:

9
P = y : Both sides give I .
P = x : Both sides give KQ, up to ∼ . (Use VII-6.9 .)
P is any other atom : Both sides give KP .
The inductive step uses nothing but
definitions : (µy • TR)[x→Q] = (S(µy • T )(µy • R))
[x→Q]
=
S(µy • T )[x→Q](µy • R)[x→Q] ∼ S(µy • T [x→Q])(µy • R[x→Q]) =
µy •(T [x→Q]R[x→Q]) = µy •((TR)[x→Q]) .

VII-6.15. If x is not free in A, then it does not occur in Φ(A) .


Proof. This is an easy induction on A, using VII-6.11 and definitions.

VII-6.16. If A[x→B] is okay, then Φ(A[x→B]) ∼ Φ(A)[x→Φ(B)] .


Proof. Proceeding by induction on A, the atomic cases are as follows:
A = x : Both sides give Φ(B) .
A = y /= x : Both sides give
Φ(y) . The inductive steps are
as follows:
(I) A = CD : Since C[x→B] and D[x→B] are both okay as long as (CD)
[x→B]
is, Φ((CD)[x→B]) = Φ(C[x→B])Φ(D[x→B]) ∼ Φ(C)[x→Φ(B)]Φ(D)[x→Φ(B)]

=
(Φ(C)Φ(D))[x→Φ(B)] = Φ(CD)[x→Φ(B)] .
(II) A = λx • C : We have,
Φ((λx • C)[x→B]) = Φ(λx • C) = µx • Φ(C) ,
whereas, using VII-6.11,
(Φ(λx • C))[x→Φ(B)] = (µx • Φ(C))[x→Φ(B)] = µx • Φ(C) .

(III) A = λy C for y = x : The first ‘ ’ below uses the inductive


• /
hypoth- esis and VII-6.10. The second one uses VII-6.14,

9
observing that, by VII- 6.15, the variable y doesn’t occur in
Φ(B) because it is not free in B, the

9
latter being the case because (λy • C)[x→B] is okay.
Φ((λy•C)[x→B]) = Φ(λy•(C[x→B])) = µy•(Φ(C[x→B])) ∼
µy•(Φ(C)[x→Φ(B)]) ∼ (µy•Φ(C))[x→Φ(B)] = (Φ(λy•C))[x→Φ(B)] .

VII-6.17. If y isn’t in P, then •µx ∼ P •µy (P [x→y]).


[Note that this is the direct analogue in Γ for the (α)-rule from Λ.]
Proof. We leave this to the reader—a quite straightforward
induction on P , using only definitions directly.

Proof of VII-6.4. Proceeding inductively, and looking at the


inductive procedure in the definition of Λ first : If A1 ≈ A2 and
B1 ≈ B2, then
Φ(A1B1) = Φ(A1)Φ(B1) ∼ Φ(A2)Φ(B2) = Φ(A2B2) ,
an
d Φ(λx • A1) = µx • Φ(A1) ∼ µx • Φ(A2) = Φ(λx • A2) ,
using VII-6.10 in this last line.
So the result will follow from three facts, corresponding to the
three basic reduction laws in the definition of ≈ :
(η) Φ(λx • Ax) ∼ Φ(A) if x is not free in
'

A . (β)' Φ((λx • A)B) ∼ Φ(A[x→B]) if A[x→B] is


okay .
(α)' Φ(λx • A) ∼ Φ(λy • A[x→y]) if y is not free in A and A[x→y] is
okay. Twice below we use VII-6.15.
To prove (η)', for any Q, we have
Φ(λx • Ax)Q = (µx • Φ(A)x)Q ∼ (Φ(A)x)[x→Q] = Φ(A)Q ,
as suffices, using VII-6.13 for the middle step.
To prove (β)', the left-hand side is
(µx • Φ(A))Φ(B) ∼ Φ(A)[x→Φ(B)] ∼ Φ(A[x→B]) ,
using VII-6.13 and VII-6.16 .

9
To prove (α)', the left-hand side is
µx • Φ(A) ∼ µy • (Φ(A)[x→y]) ∼ µy • Φ(A[x→y]) ,

which is the right-hand side, using, respectively, [VII-6.17 plus y


not occur- ring in Φ(A)], and then using VII-6.16 .

This completes the proof that the combinators do produce a


combina- torially complete binary operation, not different from
that produced by the λ-calculus (in our so-called extensional
context).
Schonfinkel’s original proof is different, and not hard to follow.
Actually with all this intricacy laid out above, we can quickly give
his direct
∼ proof that Γ/ is combinatorially complete. More-or-
less≈just copy the proof that Λ/ is combinatorially
• complete,

replacing ‘λx ’ everywhere by ‘µx ’ , as follows.
For the constant function of ‘n’ variables with value P , pick
distinct vari- ables yi not in P and let

ζ = µy1 • µy2 • · · · • µyn • P .


For the ith projection, let ζ = µx1 µx2 µxn xi .
• •···• •
For the inductive step, let

ζ = µx1 • µx2 • · · · • µxn • (ζgx1x2 · · · xn)(ζhx1x2 · · · xn) ,

as in the proof of VII-6.1 .

Schonfinkel’s theorem is a bit more general than this, as below.


The ideas in the proof of the substantial half are really the same
as above, so the main thing will be to set up the machinery.
We’ll leave out details analogous to those in the build-up
above.

Theorem VII-6.18. (Schonfinkel) Let Ω be any binary


operation (writ- ten as juxtaposition with the usual bracket
conventions). Then Ω is combi- natorially complete if and only
if it has elements κ and σ such that, for all ωi ∈ Ω, we have

κω1ω2 = ω1 and σω1ω2ω3 = (ω1ω3)(ω2ω3) .

9
For example, as we saw above, when Ω = Γ/ ∼ , we could take κ and σ
to be the equivalence classes [K]∼ and [S]∼ , respectively.
Proof. Assume it is combinatorially complete, and with F =
FUN 2 (ν 1 ) and G = FUN3((ν1 ν3) (ν1 ν3)), let κ and σ respectively
∗ ∗ ∗
be correspond- ing elements ζ from the definition of
combinatorial completeness. Thus, as required,

κω1ω2 = F (ω1, ω2) = ω1 and σω1ω2ω3 = G(ω1, ω2, ω3) = (ω1ω3)(ω2ω3) .

For the converse, here is some notation :


[
Ω ⊂ Ω(0) = FREE(Ω) ⊂ Ω+ = FREE(Ω ∪ {ν1, ν2, · · ·}) = Ω(n) ,
+ +
n≥0

where
(n
:= FREE(Ω ∪ {ν1, ν2, · · · , νn})
Ω)
.+

As before, the operation in Ω+ will be written ∗ , to contrast


with juxtapo- sition in Ω (and the νi’s are distinct and disjoint
from Ω) .
Note that morphisms χ : Ω+ → Ω are determined by their
effects on the set of generators, Ω ∪ {ν1, ν2, · · ·} . We’ll only
consider those which map each element of Ω to itself, so
when restricted to Ω(0), all such χ agree. +
Furthermore, we have, with Ai ∈ Ω ,
FUN n (f )(A1, · · · , An) = χ(f [v1→A1 ][v2→A2 ]···[vn→An ] ) .
This is because, for fixed (A1, · · · , An), each side is a morphism Ω(n) → Ω,
“of f ” , and they agree on generators. +

Now, for variables y = νi and P Ω , define µy P


∈ • +∈
Ω+ inductively much as before
:
µy • P := k ∗ P for P ∈ Ω or if P is a variable νj /= y ;
µy • y := s ∗ k ∗ k ;
µy • (Q ∗ R) := s ∗ (µy • Q) ∗ (µy • R) .
It follows as before that the variables occurring in µy • P are
precisely those other than y in P . Also, define an equivalence
relation on Ω+ by

1
a ∼ b ⇐⇒ ∀χ , χ(a) = χ(b)

1
(referring to χ which map elements of Ω by the identity map).
Then we have, proceeding by induction on P as in VII-6.13,

(µy • P ) ∗ y ∼ P ,
and, substituting Q for y,

(µy • P ) ∗ Q ∼ P [y→Q] ,
and finally, by induction on n,

(µy1 • µy2 • · · · µyn • P ) ∗ Q1 ∗ · · · ∗ Qn ∼ P [y1→Q1 ][y2→Q2 ]···[yn→Qn ] .


The proof is then completed easily : Given f ∈ Ω(n), define
+
ζ := χ(µν1 • µν2 • · · · µνn • f ) ∈ Ω .
Then, for all Ai ∈ Ω , we have Ai = χ(Ai) , so
ζA1 · · · An = χ(µν1•µν2 •· · · µνn •f )χ(A1) · · · χ(An) =
χ((µν1 • µν2 • · · · µνn • f ) ∗ A1 ∗ · · · ∗ An) =
χ(f [v1→A1 ][v2→A2 ]···[vn→An ] ) = FUN n (f )(A1, · · · , An) ,
as required.

VII-7 Models for λ-Calculus, and Denotational Semantics.


This will be a fairly sketchy introduction to a large subject,
mainly to provide some motivation and references. Then, in the
last two subsections, we give many more details about a
particular family of λ-models.
As far as I can determine, there is still not complete agreement
on a single best definition of what is a ‘model for the λ-calculus’.
For example, not many pages along in [Ko], you already have a
choice of at least ten definitions, whose names correspond to
picking an element from the following set and erasing the
commas and brackets :

{ proto , combinatorial , Meyer , Scott , extensional }×{ lambda }×{ algebra ,


model } .

1
And that’s only the beginning, as further along we have
“categorical lambda models”, etc., though here the first adjective
refers to the method of construc- tion, rather than an axiomatic
definition. In view of all this intricacy (and relating well to the
simplifications of the last subsection), we shall consider in
detail only what are called extensional lambda models . The
·
definition is quite simple : Such an object is any structure (D ,
, κ , σ) , where “ ” is a binary operation on the set D, which set
has specified distinct elements κ and σ causing it to be
combinatorially complete as in VII-6.18, and the operation
must satisfy

(extensionality) for all α and β [(∀γ α · γ = β · γ) =⇒ α = β ] .


Notice that, using extensionality twice and then three times, the elements κ
and σ are unique with respect to having their properties

κω1ω2 = ω1 and σω1ω2ω3 = (ω1ω3)(ω2ω3) .

So they needn’t be part of the structure, just assumed to exist,


rather like the identity element of a group. That is, just assume
combinatorial completeness (along with extensionality), and
forget about any other structure.
The other 10+ definitions referred to above are similar
(once one be- comes educated in the conventions of this type of
model theory/equational- combinatorial logic, and can actually
determine what is intended—but see the lengthy digression in
small print beginning two paragraphs below). Ef- fectively each
replaces extensionality with a condition that is strictly weaker
(and there are many interesting models that are not
extensional, and which in some cases are apparently important
for the application to denotational semantics.) So the reader
should keep in mind that the extensional case, which we
emphasize except in the small print just below, is not the entire
story. Much more on the non-extensional case can be found in
the refer- ences. Except for first reading the digression below,
the clearest treatment I know is that in [HS], if you want to
learn more on models in general, for the λ-calculus and for
combinators.
1
One missing aspect in our definition above is this : Where
did the λ- abstraction operator get to, since, after all, it is the λ-
calculus we are sup- posed to be modelling? Several of the other
definitions address this directly. Below for several paragraphs
we discuss a way of arriving at a general sort of definition of
the term λ-model, a little different from others of which I

1
am aware, but easily seen to be mathematically equivalent to the
definitions usually given. In particular, this will indicate how to
get maps from the λ-calculus, Λ, into any model, and quite
explicitly into extensional ones as above, one such map for
each assignment of variables.
A (not entirely) Optional Digression on λ- models.
To remind you of a very basic point about sets and functions, there is a 1-1 adjoint-
ness correspondence as follows, where AB denotes the set of all functions with (domain,
codomain) the pair of sets (B, A) :

AB×C ←→ (AC)B
f →| [b '→ [c '→ f (b, c)]]
[(b, c) '→ g(b)(c)] ←| g
[Perhaps the λ-phile would prefer λbc f (b, c) and λ(b, c) gbc .] Taking B = C = A, this
• •
specializes to a bijection between the set of all binary operations on A and the set
(AA)A. In particular, as is easily checked, the extensional binary operations on A
correspond to the injective functions from A to AA .

Let us go back to some of the vague remarks of Subsection VII-3, and try to puzzle out a rough idea of what the phrase “model for λ-calculus” ought to mean, as an object in ordinary naive set-theoretic mathematics, avoiding egregious violations of the axiom of foundation. We’ll do this without requiring the reader to master textbooks on model theory in order to follow along. It seems to the author that the approach below, certainly not very original, is more natural than trying to force the round peg of λ-calculus into the square hole of something closely similar to models for 1st order theories.
We want some kind of set, D, for which each element in D can play a second role as also somehow representing a function from D to itself. As noted, the functions so represented will necessarily form a very small subset, say [D−~ D], of D^D, very small relative to the cardinality of D^D.
The simplest way to begin to do this is to postulate a surjective function φ, as below:

φ : D → [D−~ D] ⊂ D^D .

So we are given a function from D to the set of all its self-functions, we name that function’s image [D−~ D], and use φ as the generic name for this surjective adjoint. It is the adjoint of a multiplication D × D → D, by the remarks in the first paragraph of this digression.
Now let Λ(0) be the set of all closed terms in Λ, those without free variables. A model, D, such as we seek, will surely involve at least a function Λ(0) → D with good properties with respect to the structure of Λ(0) and to our intuitive notion of what that structure is supposed to be modelling; namely, function application and function abstraction. As to the former, the obvious thing is to have the map be a morphism between the binary operations on the two sets.

Because Λ is built by structural induction, it is hard to define anything related to Λ without using induction on structure. Furthermore, by first fixing, for each variable x, an element ρ(x) ∈ D, we’d expect there to be maps (one for each “assignment” ρ)

ρ+ : Λ → D ,

which all agree on Λ(0) with the one above. (So their restrictions to Λ(0) are all the same map.) These ρ+’s should map each variable x to ρ(x). Thinking about application and abstraction (and their formal versions in Λ), we are led to the following requirements:

ρ+ : x ↦ ρ(x) ;
ρ+ : MN ↦ φ(ρ+(M))(ρ+(N)) ;
ρ+ : λx • M ↦ ψ(d ↦ (ρ[x↦d])+(M)) .

The middle display is the obvious thing to do for a ‘semantic’ version of application. It’s just another way of saying that ρ+ is a morphism of binary operations.
But the bottom display needs plenty of discussion. Firstly, the assignment ρ[x↦d] is the assignment of variables which agrees with ρ, except that it assigns d ∈ D to the variable x. (This is a bit like substitution, so our notation [x↦d] reflects that, but is deliberately different from [x→d].) For the moment, let ψ(f) vaguely mean “some element of D which maps under φ to f”. Thus, the bottom display says that λx • M, which we intuit as “the function of x which M gives”, should be mapped by ρ+ to the element of D which is “somehow represented” (to quote our earlier phrase) by the function D → D which sends d to (ρ[x↦d])+(M).
Inductively on the structure of M, it is clear from the three displays that, for all ρ, the values ρ+(M) are completely determined, at least once ψ is specified (if the displays really make sense).
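(Here is a brief Haskell transcription, with all names ours, of how the three displays determine ρ+ by structural induction. The Model type simply assumes some φ and ψ as data; no honest Haskell type could make phi surjective onto all self-maps, and the delicate point raised just below, that the abstracted function lands in [D−~ D], is invisible to the type system.)

```haskell
type Var = String
data Term = V Var | App Term Term | Lam Var Term   -- the syntax of Lambda

data Model d = Model
  { phi :: d -> (d -> d)      -- application, read through the adjointness
  , psi :: (d -> d) -> d }    -- a chosen right inverse of phi

type Assignment d = Var -> d                       -- the rho's

eval :: Model d -> Assignment d -> Term -> d       -- rho |-> rho+
eval _ rho (V x)     = rho x
eval m rho (App a b) = phi m (eval m rho a) (eval m rho b)
eval m rho (Lam x a) = psi m (\d -> eval m (update rho x d) a)
  where update r y d' z = if z == y then d' else r z   -- rho[x |-> d]
```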
And it is believable that ρ+(M) will be independent of ρ for closed terms M. In fact, one would expect to be able to prove the following:

∀M, ∀ρ, ∀ρ′ [ ∀x [ x is free in M ⇒ ρ(x) = ρ′(x) ] =⇒ ρ+(M) = ρ′+(M) ] .

This clearly includes the statement about closed M.


We’d also want to prove that terms which are related under the basic relation ≈ (or rather its non-extensional version ≈in) will map to the same element in D.

But before we get carried away trying to construct these proofs, there is one big problem with the bottom display in the triple specification of ρ+’s values: how do we know, on the right-hand side, that the function d ↦ (ρ[x↦d])+(M) is actually in the subset [D−~ D] ??

This must be dealt with by getting more specific about what the model (D, φ, ψ) can be, particularly about ψ. Playing around with the difficulty just above, we find that, besides wanting (φ ◦ ψ)(f) = f for all f ∈ [D−~ D], we can solve the difficulty as long as
(i) D is combinatorially complete, and
(ii) the composite the other way, namely ψ ◦ φ, is in [D−~ D] .

So now, after all this motivation, here is the definition of λ-model or model for the λ-calculus which seems the most natural to me:

Definition of λ-model. It is a structure (D ; · ; ψ) such that:
(i) (D ; ·) is a combinatorially complete binary operation with more than one element ;
(ii) ψ : [D−~ D] → D is a right inverse for the surjective adjoint φ : D → [D−~ D] of the multiplication in (i), such that ψ ◦ φ ∈ [D−~ D] .

Before going further, we should explain how our earlier definition (on which these notes will entirely concentrate after this subsection) is a special case. As noted in the first paragraph of this digression, if the binary operation is extensional, then φ is injective. But then it is bijective, so it has a unique right inverse ψ, and this is a 2-sided inverse. But then the last condition is automatic, since the identity map of D is in [D−~ D] by combinatorial completeness. So an extensional, combinatorially complete, and non-trivial binary operation is automatically a λ-model in a unique way, as required.
It is further than we wish to go in the way of examples and proofs, but it happens to be an interesting fact that there exist combinatorially complete binary operations for which the number of ψ’s as in (ii) of the definition is zero, and others where there is more than one, even up to isomorphism. In other words, λ-models are more than just combinatorially complete binary operations—there is extra structure whose existence and uniqueness is far from guaranteed.
Let’s now state the theorem which asserts the existence of the maps ρ+ talked about earlier.

Theorem. Let (D ; · ; ψ) be a λ-model. Then there is a unique collection

{ ρ+ : Λ → D | ρ : {x1, x2, · · ·} → D }

for which the following hold:
(1) ρ+(xi) = ρ(xi) for all i ;
(2) ρ+(MN) = ρ+(M) · ρ+(N) for all terms M and N ;
(3) (i) the map d ↦ (ρ[x↦d])+(M) is in [D−~ D] for all ρ, x, and M ; and
    (ii) ρ+(λx • M) = ψ(d ↦ (ρ[x↦d])+(M)) for all x and M .
As mentioned earlier, the uniqueness of the ρ+ is pretty clear from (1), (2) and (3)(ii). And the following can also be proved in a relatively straightforward manner, inductively on structure: the dependence of ρ+(M) on only the values which ρ takes on variables free in M, the invariance of ρ+ with respect to the basic equivalence relation “≈” on terms, and several other properties. But these proofs are not just one or two lines, particularly carrying out the induction to establish existence of the ρ+ with property (3)(i). Combinatorial completeness has a major role. We won’t give the proof, but the following example should help to make it clear how that induction goes.

Example. This illustrates how to reduce calculation of the ρ+ to statement (1) of the theorem, using (2) and (3)(ii), and then why the three needed cases of (3)(i) hold.

ρ+(λxyz • yzx) = ψ(d ↦ (ρ[x↦d])+(λyz • yzx)) = ψ(d ↦ ψ(c ↦ (ρ[x↦d][y↦c])+(λz • yzx)))
= ψ(d ↦ ψ(c ↦ ψ(b ↦ (ρ[x↦d][y↦c][z↦b])+(yzx)))) = ψ(d ↦ ψ(c ↦ ψ(b ↦ cbd))) .

But we want to see, as follows, why each application of ψ makes sense, in that it applies to a function which is actually in [D−~ D]. Using combinatorial completeness, choose elements ϵ, ζ1, ζ2 and ζ3 in D such that, for all p, q, r, s and t in D, we have

ϵp = (ψ ◦ φ)(p) ;  ζ1pqr = prq ;  ζ2pqrs = p(qsr) ;  ζ3pqrst = p(qrst) .

[Note that a neat choice for ϵ is ψ(ψ ◦ φ) .]

Now

b ↦ cbd = ζ1cdb = φ(ζ1cd)(b) ,

so the innermost one is okay; it is φ(ζ1cd) ∈ [D−~ D] . But then

c ↦ ψ(b ↦ cbd) = ψ(φ(ζ1cd)) = ϵ(ζ1cd) = ζ2ϵζ1dc = φ(ζ2ϵζ1d)(c) ,

so the middle one is okay; it is φ(ζ2ϵζ1d) ∈ [D−~ D] . But then

d ↦ ψ(c ↦ ψ(b ↦ cbd)) = ψ(φ(ζ2ϵζ1d)) = ϵ(ζ2ϵζ1d) = ζ3ϵζ2ϵζ1d = φ(ζ3ϵζ2ϵζ1)(d) ,

so the outer one is okay; it is φ(ζ3ϵζ2ϵζ1) ∈ [D−~ D] .
Meyer-Scott approach.
The element ϵ from the last example determines ψ ◦ φ, and therefore ψ, since φ is surjective. It is easy to show that, for all a and b in D,

ϵab = ab  and  [∀c, ac = bc] =⇒ ϵa = ϵb :

ϵ · a · b = (ψ ◦ φ)(a) · b = φ((ψ ◦ φ)(a))(b) = (φ ◦ ψ ◦ φ)(a)(b) = φ(a)(b) = a · b ;
[∀c, a · c = b · c] i.e. φ(a) = φ(b) =⇒ ψ(φ(a)) = ψ(φ(b)) i.e. ϵ · a = ϵ · b .

Using these two facts, one can quickly recover the properties of ψ. Thus our definition of λ-model can be redone as a triple (D ; · ; ϵ) with (ii) replaced by the two properties just above. A minor irritant is that ϵ is not unique. That can be fixed by adding the condition ϵϵ = ϵ. So this new definition is arguably simpler than the one given, but I think less motivatable; but the difference between them is rather slight.
Scholium on the common approach.
The notation ρ+(M) is very non-standard. Almost every other source uses the notation [[M]]^M_ρ, where M = (D ; ·). Both in logic and in denotational semantics (and perhaps elsewhere in computer science), the heavy square brackets [[ ]] seem to be ubiquitous for indicating ‘a semantic version of
the syntactic object inside the [[ ]]’s’. As mentioned earlier, the vast majority of sources for λ-model (and for weaker (more general) notions such as λ-algebra and proto-λ-algebra) express the concepts’ definitions by lists of properties of the semantic function [[ ]]. And, of course, you’ll experience a heavy dosage of “⊨” and “▶”. Again [Hind-Seld], Ch. 11 (and Ch. 10 for models of combinatory logic), is the best place to start, esp. pp. 112–122, for the list of properties, first the definition then derived laws.
Given such a definition, however weakly motivated, one can get to our definition—more quickly than the other way round (the theorem above embellished)—by defining ψ using

ψ(φ(a)) := (ρ[y↦a])+(λx • yx) .

Note that λx • yx ≈ y, but λx • yx ≈in y fails, as expected from this. The properties of ρ+ vaguely alluded to above assure one that

φ(a) = φ(b) [i.e. ∀d, ad = bd] =⇒ (ρ[y↦a])+(λx • yx) = (ρ[y↦b])+(λx • yx) ,

whereas (ρ[y↦a])+(y) ≠ (ρ[y↦b])+(y) clearly, if a ≠ b !
The combinatorial completeness in our definition will then follow from Schönfinkel’s theorem, noting that σ := ρ+(λxyz • (xz)(yz)) and κ := ρ+(T) (for any ρ) have the needed properties.
Again, from the viewpoint which regards [[ ]], or ρ+, as primary, the element ϵ above would be defined by ϵ := ρ+(λxy • xy) for any ρ ;—hardly surprising in view of the law ϵab = ab .
Finally, going the other way, a quick definition of ρ+ for an extensional D is just the composite

ρ+ = ( Λ −Φ→ Γ −ρ∗→ D ) ,

where we recall from VII-6 that Γ is the free binary operation on S, K, x1, x2, · · ·, and the map Φ is the one inducing the isomorphism between Λ/≈ and Γ/≈, and here, ρ∗ is defined to be the unique morphism of binary operations taking each xi to ρ(xi), and taking S and K to the elements σ and κ from Schönfinkel’s theorem (unique by extensionality).
Term model.
The set of equivalence classes Λ/≈in is a combinatorially complete non-trivial binary operation. It is not extensional, but would be if we used “≈” in place of “≈in”. The canonical way to make it into a λ-model, using our definition here, is to define

ψ(φ([M]≈in)) := [λx • Mx]≈in  for any x not occurring in M .

It will be an exercise for you to check that this is well-defined, and satisfies (ii) in the definition of λ-model. Notice that, as it should, if we used “≈” in place of “≈in”, the displayed formula would just say that ψ ◦ φ is the identity map.
What does a λ-model do?
(1) For some, using the Meyer-Scott or ‘our’ definition, it is sufficient that it provides an interesting genus of mathematical structure, one for which finding the ‘correct’ definition had been a decades-long effort, and for which finding examples more interesting than the term models was both difficult and important. See the next subsection.
(2) For computer scientists, regarding the λ-calculus as a super-pure functional programming language, the functions [[ ]] are the denotational semantics of that language. See the rest of this subsection, the table on p. 155 of [St], and the second half of Subsection VII-9.
(3) For logicians, [[ ]] provides the semantics for a putative form of proto-logic. Or, coming down from that cloud, one may think of [[ ]] as analogous to the “Tarski definition of truth” in standard 1st order logic. Again, read on !

Back to regular programming.

Why should one wish to produce models other than the term models? (There are at least four of those, corresponding to using “≈” or “≈in”, and to restricting to closed terms or allowing terms with free variables. But the closed term models actually don’t satisfy our definition above of λ-model, just one of the weaker definitions vaguely alluded to earlier.) As far as I can tell, besides the intrinsic mathematical interest, the requirements of denotational semantics include the need for individual objects inside the models with which one can work with mathematical facility. The syntactic nature of the elements in the term models dictates against this requirement. Further requirements are that one can manipulate with the models themselves, and construct new ones out of old with gay abandon. See the remarks below for more on this, as well as Subsection VII-9.
So we shall content ourselves, in the next subsection, ‘merely’ to go through Scott’s original construction, which (fortunately for our pedagogical point of view) did produce extensional models. First I’ll make some motivational and other remarks about the computer science applications. This remaining material also ties in well with some of the rather vague comments in VII-3.
The remainder of this subsection contains what I have gleaned from an unapologetically superficial reading of some literature on denotational semantics. To some extent, it may misrepresent what has been really in the minds of the pioneers/practitioners of this art. I would be grateful for corrections/criticisms from any knowledgeable source. See Subsection VII-9 for some very specific such semantics of the language ATEN from the main part of this paper, and for the semantics of the λ-calculus itself.
We did give what we called the semantics of BTEN right after the syntactic definition of that language. As far as I can make out, that was more like what the CSers would call an “operational semantics”, rather than a denotational semantics. The reason is that we referred to a ‘machine’ of sorts, by talking about “bins” and “placing numbers in bins after erasing the numbers already there”. That is a pretty weak notion of a machine. But to make the “meaning” of a language completely “machine independent” is one of the main criteria for an adequate denotational semantics.
See also the following large section on Floyd-Hoare logic. In it, we use a kind of operational semantics for several command languages. These assign, to a (command, input)-pair, the entire sequence of states which the pair generates. So that’s an even more ‘machine-oriented’ form of operational semantics than the above input/output form. On the other hand, Floyd-Hoare logic itself is sometimes regarded as a form of semantic language specification, much more abstract than the above forms. There seems to have been (and still is?) quite a lot of controversy as to whether the denotational viewpoint is preferable to the above forms, especially in giving the definitions needed to make sense of questions of soundness and completeness of the proof systems in F-H logic. But we are getting ahead of ourselves.
Along with some of the literature on F-H logic, as well as
some of Dana Scott’s more explanatory productions, the two CS
textbooks [Go] and [Al] have been my main sources. The
latter book goes into the mathematical details (see the next
subsection) somewhat more than the former.
In these books, the authors start with very small imperative programming languages: TINY in [Go], and “simple language” in Ch. 5 of [Al]. These are clearly very close to ATEN/BTEN from early in this work. They contain some extras which look more like features in, say, PASCAL. Examples of “extras” are:
(1) the command “OUTPUT E” in [Go] ;
(2) the use of “BEGIN. . . END” in [Al], this being simply a more readable replacement for brackets ;
(3) the SKIP-command in [Al], which could be just whdo(0 = 1)(any C) or x0 ←: x0 from ATEN .
So we may ask: which purely mathematical object corresponds to our mental image of a bunch of stored natural numbers, with the number called νi stored in bin number i ? That would be a function

σ : IDE → V AL ,

where σ is called a state, IDE is the (syntactic) set of identifiers, and V AL is the (semantic) ‘set’ of values.
Now IDE for an actual command language might, for example, be the set of all finite strings of symbols from the 62-element symbol set consisting of the 10 decimal digits and the (26 × 2) upper- and lower-case letters of the Roman alphabet, with the proviso that the string begin with a letter (and with a few exclusions, to avoid words such as “while” and “do” being used as identifiers). For us in BTEN, the set IDE was simply the set {x0, x1, x2, · · ·} of variables. (In each case we of course have a countably infinite set. And in the latter case, there would be no problem in changing to a set of finite strings from a finite alphabet, for example the strings x|||···|| . Actually, as long as the subscripts are interpreted as strings of [meaningless?] digits, as opposed to Platonic integers, we already have a set of finite strings from a finite alphabet.)
In referring above to V AL, we used “ ‘set’ ” rather than “set”, because this is where λ-calculus models begin to come in, or what are called domains in this context. For us, V AL was just the set N = {0, 1, 2, · · ·} of all natural numbers. And σ was the function mapping xi ∈ IDE to νi ∈ V AL. But for doing denotational semantics with a real, practical language, it seems to be essential that several changes including the following are made:
Firstly, the numbers might be more general, such as allowing all integers (negatives as well).
Secondly, V AL should contain something else, which is often called ⊥ in this subject. It corresponds more-or-less to our use of “err” much earlier.
Thirdly, we should have a (rather weak) partial order ⊑ on the objects above (much more on this in the next subsection). As to the actual ordering here, other than requiring b ⊑ b for all b (part of the definition of partial order), we want ⊥ ⊑ c for all numbers c ∈ V AL (but no other ⊑ relation between elements as above). So ⊑ is a kind of ‘how much information?’ partial order.
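(A toy Haskell rendition, ours, of such a flat ‘how much information?’ order; the strict lifting at the end is the behaviour that +N and ·N will be required to have in Subsection VII-9.)

```haskell
-- A flat value set with a bottom adjoined, and its information order.
data Flat a = Bot | Val a deriving (Eq, Show)

-- Bot lies below everything; distinct proper values are incomparable.
leq :: Eq a => Flat a -> Flat a -> Bool
leq Bot     _       = True
leq (Val x) (Val y) = x == y
leq _       _       = False

-- Strict lifting of a binary operation: bottom in, bottom out.
lift2 :: (a -> b -> c) -> Flat a -> Flat b -> Flat c
lift2 f (Val x) (Val y) = Val (f x y)
lift2 _ _       _       = Bot
```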
Finally, for a real command language, V AL would contain other semantic objects such as the truth values Tr and Fs. So, for example, in [Al], p. 34, we see definitions looking like

Bν = INT + BOOL + Char + · · ·

and

V AL = Bν + [V AL + V AL] + [V AL × V AL] + [V AL−~ V AL] .

Later there will be more on that last ‘equation’ and similar ones. Quite apart from the ⊥’s, [Al]’s index has no reference to Char (though one might guess), and its index has double the page number, p. 68, for Bν, which likely stands for ‘basic values’. Also, even using only numbers and truth values, there are some decisions to make about the poset V AL ; for example, should it contain

Tr   Fs        0   1   −1   2   −2  · · ·
  \   /          \   \   |   /   /
  ⊥BOOL              ⊥INT
       \             /
        \           /
            ⊥

or merely

Tr   Fs   0   1   −1   2   · · ·
   \   \   \   |   /   /
            ⊥                    ?
In any case, it is crucial that we have at least the following:
(1) Some syntactically defined sets, often IDE, EXP and COM, where the latter two respectively are the sets of expressions and commands.
(2) Semantically defined ‘domains’ such as V AL and perhaps

STATE = [IDE−~ V AL] ,

where the right-hand side is a subposet of the poset of all functions from IDE to the underlying set of the structure V AL.
(3) Semantically defined functions

E : EXP → [STATE−~ V AL]

and

C : COM → [STATE−~ STATE] .
The situation in [Go] is a bit messier (presumably of necessity) because ⊥ is not used as a value, but separated out, called sometimes “unbound” and sometimes “error”, but systematically.

Be that as it may, let us try to explain E and C in the case of our simple language BTEN/ATEN. Much more detail on this appears two subsections ahead.
Here EXP is the set of terms and quantifier-free formulae from the language of 1st order number theory. Then E is that part of the Tarski definition of truth as applied to the particular interpretation, N, of that 1st order language which
(1) assigns to a term t, in the presence of a state ν, the natural number tν. That is, adding in the totally uninformative ‘truth value’ ⊥,

E[t](ν) := tν .

(But here we should add that tν = ⊥ if any νi, with xi occurring in t, is ⊥.)
(2) and assigns to a formula G, in the presence of a state ν, one of the truth values. That is,

E[G](ν) :=  Tr  if G is true at ν ;
            Fs  if G is false at ν ;
            ⊥   if any νi, with xi free in G, is ⊥ .
Also, COM is the set of commands in ATEN or BTEN. And C assigns to a command C, in the presence of a state ν, the new state ||C||(ν). See the beginning of Subsection VII-9 for more details.
So it appears that there is no great difference between the ‘operational’ semantics already given and this denotational semantics, other than allowing ⊥ as a ‘totally undefined number’ or as a ‘totally uninformative truth value’. But such a remark unfairly trivializes denotational semantics for several reasons. Before expanding on that, here is a question that experts presumably have answered elsewhere.
What are the main impracticalities of trying to bypass the whole enterprise of denotational semantics as follows?
(1) Give, at the syntactic level, a translation algorithm of the relevant practical imperative programming language back into ATEN. For example, in the case of a self-referential command (i.e. recursive program), I have already done that in IV-9 of this work. I can imagine that GOTO-commands would present some problems, but presumably not insuperable ones.
(2) Then use the simple semantics of ATEN to do whatever needs to be done, such as trying to prove that a program never loops, or that it is correct according to its specifications, or that an F-H proof system is complete, or, indeed, such as implementing the language.
This is the sort of thing mentioned by Schwartz [Ru-ed] pp. 4-5, but reasons why it is not pursued seem not to be given there. Certainly one relevant remark is that, when one (as in the next section on F-H logic) generalizes ATEN by basing it on an arbitrary 1st order language and its semantics on an arbitrary interpretation of that language, the translation as in (1) may no longer be possible. And so, even when the interpretation contains the natural numbers in a form which permits Gödel numbering for coding all the syntax, the translation might lack some kind of desirable naturality. And its efficiency would be a problem for sure.
This is becoming verbose, but there are still several matters needing explanation, with reference to why denotational semantics is of interest.
First of all, it seems clear that the richer ‘extras’ in real languages such as PASCAL and ALGOL60, as opposed to the really basic aspects already seen in these simpler languages, are where the indispensability of denotational semantics really resides. (As we’ve emphasized, though these extras make programming a tolerable occupation, they don’t add anything to what is programmable in principle.) Examples here would be those referred to in (1) just above—self-referential or recursive commands, and GOTO-statements (which are also self-referential in a different sense), along with declarations and calls to procedures with parameters, declarations of variables (related to our B-command in BTEN), declarations of functions, etc. Here we are completely ignoring parallelism and concurrent programming, which have become very big topics in recent years.
But I also get the impression that mechanizing the projects which use denotational semantics is a very central aspect here. See the last chapter of [Al], where some of it is made executable, in his phraseology. The mathematical way in which we had originally given the semantics of BTEN is inadequate for this.
It’s not recursive enough. It appears from [Go] and [Al], without being said all that explicitly, that one aspect of what is really being done in denotational semantics is to translate the language into a form of the λ-calculus, followed by perhaps some standard maps (like the ρ+ earlier) of the latter into one or another “domain”. So the metalanguage of the first half of this semantics becomes quite formalized, and (it appears to me) is a pure functional (programming?) language. (Perhaps other pure functional programming languages don’t need so much denotational semantics (beyond that given for the λ-calculus itself) since they are already in that form?)
For example, lines -5 and -6 of p. 53 of [Al] in our notation
above and applied to BTEN become

C[ite(F )(C)(D)](σ) := (if E[F ](σ) = Tr, then C[C], else C[D])(σ) ,
for all formulas F , commands C and D, and states σ . At first, this
appears to be almost saying nothing, with the if-then-else being
semanticized in terms of if-then-else. But note how the right-
hand side is a statement from the informal version of McSELF,
in that what follows the “then” and “else” are not commands,
but rather function values. In a more formal λ-calculus version
of that right-hand side, we would just write a triple product, as
was explained in detail at the beginning of Subsection VII-5.
Even more to the point here, lines -3 and -4 of p. 53 of [Al]
in our notation above and applied to BTEN become

C[whdo(F )(C)](σ) := (if E[F ](σ) = Tr, then C[whdo(F )(C)] ◦ C[C], else Id)(σ) .

First notice how much more compact this is than the early
definition in the semantics of BTEN. And again, the “if-then-
else” on the right-hand side would be formalized as a triple
product. But much more interestingly here, we have a self-
reference, with the left-hand side appearing buried inside the
right-hand side. So here we need to think about solving
equations. The Y -operator does that for us systematically in
the λ-calculus, which is where the right-hand side resides, in one
sense. The discussion of least fixed points ending the next
subsection is clearly relevant. A theorem of Park shows that, at
least in Scott’s models from the next subsection, the same
result is obtained from Tarski’s ‘least fixed points’ operator in
classical lattice theory (see VII-8.12 ending the next
subsection), as comes from the famous Y - operator of Curry
within λ-calculus.
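(Haskell, whose own semantics lives in this territory, will happily solve the displayed self-referential equation when it is handed to a fixed-point operator. The sketch below is ours, with states shrunk to single integers purely for illustration.)

```haskell
import Data.Function (fix)

-- The equation  w = \s -> if cond s then w (body s) else s ,
-- solved by fix rather than written as explicit recursion:
whdoSem :: (s -> Bool) -> (s -> s) -> (s -> s)
whdoSem cond body = fix (\w s -> if cond s then w (body s) else s)

main :: IO ()
main = print (whdoSem (< 100) (* 2) 3)   -- 192: keep doubling while below 100
```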
We shall return to denotational semantics after the more purely mathematical subsection to follow on Scott’s construction of λ-calculus models for doing this work.

VII-8 Scott’s Original Models.
We wish to approach from an oblique angle some of the
fundamental ideas of Dana Scott for constructing extensional
models of the λ-calculus. There are many other ways to motivate
this for mathematics students. In any case, presumably he won’t
object if this approach doesn’t coincide exactly with his
original thought processes.
One of several basic ideas here, in a rather vague form, is to employ some mathematical structure on A in order to construct such a model with underlying set A. We wish to define a (relatively small) subset of A^A, which we’ll call [A−~ A], and a bijection between it and A itself. The corresponding extensional binary operation on A (via the adjointness discussed in the first paragraph of the digression in the last subsection) we hope will be combinatorially complete; that is, all its algebraically definable functions should be representable. And of course, representable implies algebraically definable almost by definition. To pick up on the idea again, can we somehow build a structure so that [A−~ A] turns out to be precisely the ‘structure-preserving’ functions from A to itself? If this were the case, and the structure-preserving functions satisfied a few simple and expected properties (with respect to composition particularly), then the proof of combinatorial completeness would become a fairly simple formality:

representable =⇒ algebraically definable =⇒ structure-preserving =⇒ representable .

In the next several paragraphs, the details of how such an object would give an extensional model for the λ-calculus are given in an axiomatic style, leaving for later ‘merely’ the questions of what “structure” to use, and of self-reflection.
Details of essentially the category-theoretic approach.
Assume the following:

Data. Given a collection of ‘sets with structure’, and, for any two such objects, A and B, a subset [A−~ B] ⊂ B^A of ‘structure preserving functions’. Given also a canonical structure on both the Cartesian product A × B and on [A−~ B] .

Axioms.
(1) The ith projection is in [A1 × · · · × An −~ Ai] for 1 ≤ i ≤ n ; in particular, taking n = 1, the identity map, idA, is in [A−~ A] .
(2) If f ∈ [A−~ B] and g ∈ [B−~ C], then the composition g ◦ f is always in [A−~ C] .
(3) The diagonal map δ is in [A−~ A × A] , where δ(a) := (a, a) .
(4) Evaluation restricts to a map ev in [[A−~ B] × A −~ B] , where ev(f, a) = f(a) .
(5) If f1 ∈ [A1−~ B1] and f2 ∈ [A2−~ B2], then f1 × f2 is necessarily in [A1 × A2 −~ B1 × B2] , where (f1 × f2)(a1, a2) := (f1(a1), f2(a2)) .
(6) All constant maps are “structure preserving”.
(7) The adjointness bijection, from A^{B×C} to (A^C)^B, maps [B × C −~ A] into [B−~ [C−~ A]] .
All this will be relatively easy to verify, once we’ve chosen the
appropriate “structure” and “structure preserving maps”, to
make the following work, which is more subtle. That choosing
will also be motivated by the application to denotational
semantics.
Self-reflective Object
Now suppose given one of these objects A, and a mutually
inverse pair of bijections which are structure preserving :

φ ∈ [A−~ [A−~ A]] and ψ = φ−1 ∈ [[A−~ A]−~ A] .

We want to show how such a “self-reflective” object may be made canonically into an extensional combinatorially complete binary operation.
Define the multiplication in A from φ, using adjointness—see the first paragraph of the digression in the last subsection—and so it is the following composite:

mult : A × A −φ×id→ [A−~ A] × A −ev→ A
(x, y) ↦ (φ(x), y) ↦ φ(x)(y) := x · y .

By (1), (2), (4) and (5), this map, which we’ll name mult, is in [A × A −~ A].

Now [A−~ A] contains idA and all constant maps, and is closed under pointwise multiplication of functions as follows:

A −δ→ A × A −f×g→ A × A −mult→ A
x ↦ (x, x) ↦ (f(x), g(x)) ↦ f(x) · g(x) .

The fact that φ is injective implies that mult is extensional, as we noted several times earlier. First we shall check the 1-variable case of combinatorial completeness of mult.
Definitions. (More-or-less repeated from much earlier.)
Say that f ∈ A^A is 1-representable when there is a ζ ∈ A such that f(a) = ζ · a for all a ∈ A .
Define the set of 1-algebraically definable functions in A^A to be the smallest subset of A^A containing idA and all constant functions, and closed under pointwise multiplication of functions.
1-combinatorial completeness of A is the fact that the two notions just above coincide.
Its proof is now painless in the form

1-representable =⇒ 1-algebraically definable =⇒ structure-preserving =⇒ 1-representable .

The first implication is because a 1-representable f as in the definition is the pointwise multiplication of (the constant function with value ζ) times (the identity function).
The second implication is the fact noted above that the set [A−~ A] is an example of a set containing idA and all constant functions, and closed under pointwise multiplication of functions.
(Of course, the equation in the definition of 1-representability can be rewritten as f = φ(ζ), so the equivalence of 1-representability with structure preserving becomes pretty obvious.)
The third implication goes as follows: given f ∈ [A−~ A], define ζ to be ψ(f). Then

ζ · a = ψ(f) · a = φ(ψ(f))(a) = f(a) ,

as required.
Now we shall check the 2-variable case of combinatorial
completeness, and leave the reader to pump this up into a proof
for any number of variables.

1
Definitions.
Say that f ∈ A^{A×A} is 2-representable when there is a ζ ∈ A such that f(b, c) = (ζ · b) · c for all b and c in A .
Define the set of 2-algebraically definable functions in A^{A×A} to be the smallest subset of A^{A×A} containing both projections and all constant functions, and closed under pointwise multiplication of functions.
2-combinatorial completeness of A is the fact that the two notions just above coincide.
Its proof is much as in the 1-variable case:

2-representable =⇒ 2-algebraically definable =⇒ structure-preserving =⇒ 2-representable .

The first implication is because a 2-representable f as in the definition is a suitably sequenced pointwise multiplication of the constant function with value ζ and the two projections.
The second implication is the fact that the set [A × A −~ A] is an example of a set containing the projections and all constant functions, and closed under pointwise multiplication of functions. The latter is proved by composing:

A × A −δ→ (A × A) × (A × A) −f×g→ A × A −mult→ A
(b, c) ↦ ((b, c), (b, c)) ↦ (f(b, c), g(b, c)) ↦ f(b, c) · g(b, c) .

The third implication goes as follows: if f ∈ [A × A −~ A], the composite ψ ◦ adj(f) is in [A−~ A], using axiom (7) for the first time. By the part of the 1-variable case saying that structure preserving implies 1-representable, choose ζ so that, for all b ∈ A, we have ψ(adj(f)(b)) = ζ · b . Then

ζ · b · c = ψ(adj(f)(b)) · c = φ(ψ(adj(f)(b)))(c) = adj(f)(b)(c) = f(b, c) ,

as required.

So now we must figure out what kind of structure will work, and how we might produce a self-reflective object.
The next idea has already occurred in the verbose discussion of denotational semantics of the previous subsection: the structure referred to above maybe should be a partial order with some extra properties (so that, in the application, it intuitively coincides with ‘comparing information content’).
Now I believe (though it’s not always emphasized) that an important part of Scott’s accomplishment is not just to be first to construct a λ-model (and do so by finding a category and a self-reflective object in it), but also to show how to start with an individual from a rather general species of posets (in the application, from {numbers, truth values, ⊥}, at least), and show how to embed it (as a poset) into an extensional λ-model.
So let’s start with any poset (D, ⊑) and see how far we can get before having to impose extra properties. Recall that the definition of poset requires
(i) a ⊑ b and b ⊑ c implies a ⊑ c ; and
(ii) a ⊑ b and b ⊑ a if and only if a = b .
Temporarily define [D−~ D] to consist of all those functions f which preserve order; that is,

d ⊑ e =⇒ f(d) ⊑ f(e) .

More generally this defines [D−~ E], where E might not be the same poset as D.
Now [D−~ D] contains all constant functions, so we can embed D into [D−~ D] by φD : d ↦ (d′ ↦ d). This φD : D → [D−~ D] will seldom be surjective. It maps each d to the constant function with value d.
Next suppose that D has a minimum element called ⊥ . Then we can map [D−~ D] back into D by ψD : f ↦ f(⊥). It maps each function to its minimum value.
It is a trivial calculation to show that ψD ◦ φD = idD, the identity map of D. By definition, this shows D to be a retract of [D−~ D]. (Note how this is the reverse of the situation in the digression of the last subsection, on the general definition of λ-model, where [D−~ D] was a retract of D.) In particular, φD is injective (obvious anyway) and ψD is surjective.
The set [D−~ D] itself is partially ordered by

f ⊑ g ⇐⇒ ∀d , f(d) ⊑ g(d) .

This actually works for the set of all functions from any set to any poset.
How do φD and ψD behave with respect to the partial orders on their domains and codomains? Very easy calculations show that they do preserve the orders, and thus we have

φD ∈ [D−~ [D−~ D]]  and  ψD ∈ [[D−~ D]−~ D] .

So D is a retract of [D−~ D] as a poset, not just as a set. But [D−~ D] is usually too big—the maps above are not inverses of each other, only injective and surjective, and one-sided inverses of each other, as we already said.
Now here’s another of Scott’s ideas: whenever you have a retraction pair (φ : D → E, ψ : E → D), you can automatically produce another retraction pair between [D−~ D] and [E−~ E]. If (φ, ψ) is the first pair and (φ′, ψ′) the second pair, then the formulae defining the latter are

φ′(f) := φ ◦ f ◦ ψ  and  ψ′(g) := ψ ◦ g ◦ φ .

Again a very elementary calculation shows that this is a retraction pair.
[As an aside which is relevant to your further reading on this subject,
retraction pair. [As an aside which is relevant to your
further reading on this subject,
notice how what we’ve just done is purely ‘arrow-theoretic’ :
the only things used are associativity of composition and
behaviour of the identity mor- phisms. It’s part of category theory.
In fact Lambek has, in a sense, iden- tified the theory of
combinators, and of the typed and untyped λ-calculi, with the
theory of cartesian closed categories. The “closed” part is basically
the situation earlier with the axioms (1) to (7) where the set of
morphisms between two objects in the category is somehow
itself made into an object in the category, an object with good
properties.]
Now it is again entirely elementary to check that, for the category of posets and order-preserving maps, the maps φ′ and ψ′ do in fact preserve the order.
Back to the earlier situation in which we have the poset D as a retract of the poset E = [D−~ D], the construction immediately above can be iterated: define
D0 := D , D1 := [D0−~ D0] , D2 := [D1−~ D1] , etc. · · · .

Then the initial retraction pair between D0 and D1 gives rise, by the purely category-theoretic construction, to a sequence of retraction pairs between Dn and Dn+1. We’ll denote these maps as (φn, ψn).
Ignoring the surjections for the moment, we have the injections

D0 −φ0→ D1 −φ1→ D2 −φ2→ · · · .

1
Wouldn’t it be nice to be able to ‘pass to the limit’, producing a poset D∞ as essentially a union. And then to be able to just set n = ∞ = n + 1, and say that we’d get canonical bijections back-and-forth between D∞ and [D∞−~ D∞] ??? (The first is n + 1, and the second one is n, so to speak.) After all, that was the original objective here !! In fact, going back to some of the verbosities in Subsection VII-3, this looks as close as we’re likely to get to a mathematical situation in which we have a non-trivial structure D∞ which can be identified in a natural way with its set of (structure-preserving!) self-maps. The binary operation on D∞ would of course come from the adjointness discussed at the beginning of this subsection; that is,

a∞ · b∞ := (φ∞(a∞))(b∞) ,

where φ∞ is the isomorphism from D∞ to [D∞−~ D∞] .



Unfortunately, life isn’t quite that simple; but really not too complicated either. The Scott construction which works is not to identify D∞ as some kind of direct limit of the earlier displayed sequence—thinking of the φ’s as actual inclusions, that would be a nested union of the Dn’s—but rather to define D∞ as the inverse limit, using the sequence of surjections

D0 ←ψ0− D1 ←ψ1− D2 ←ψ2− · · · .

To be specific, let

D∞ := { (d0, d1, d2, · · ·) | for all i, we have di ∈ Di and ψi(di+1) = di } .


Partially order D∞ by

(d0, d1, d2, · · ·) ⊑ (e0, e1, e2, · · ·) ⇐⇒ for all i, we have di ⊑ ei .

This is easily seen to be a partial order (even on the set of all sequences, i.e. on the infinite product Π∞i=0 Di).
[Notice that, as long as D0 has more than one element, the set D∞ will
be uncountably infinite in cardinality. So despite this all being motivated by
very finitistic considerations, it has drawn us into ontologically sophisticated
mathematics.]
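(A picture in code, ours alone: the coordinatewise order is mechanical to state, though only as a picture, since the real Di are different lattices at each level, while the fragment below pretends they all share one type with an order relation leq.)

```haskell
-- Picture only: compare two (lazily given) sequences coordinatewise,
-- checking the first k coordinates; the genuine condition is "for all i".
belowUpTo :: (d -> d -> Bool) -> Int -> [d] -> [d] -> Bool
belowUpTo leq k ds es = and (take k (zipWith leq ds es))
```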
And there’s another complication, but one of a rather edifying nature (in my opinion). In order to make the following results actually correct, we need to go back and change poset to complete lattice, and order preserving map to continuous map. The definitions impose an extra condition on both the objects and the morphisms.
Definition. A complete lattice is a poset (D, ⊑), where every subset of D has a least upper bound. Specifically, if A ⊂ D, there is an element l ∈ D such that
(i) a ⊑ l for all a ∈ A ; and
(ii) for all d ∈ D, if a ⊑ d for all a ∈ A, then l ⊑ d .
It is immediate that another least upper bound m for A must satisfy both l ⊑ m and m ⊑ l , so the least upper bound is unique. We shall use ⊔A to denote it. In particular, a complete lattice always has a least element ⊔∅, usually denoted ⊥ .
Definition. A function f : D → E between two complete lattices is called continuous if and only if f(⊔A) = ⊔f(A) for all directed subsets A of D, where f(A) is the usual image of the subset under f. Being directed means that, for any two elements a and b of A, there is a c ∈ A with a ⊑ c and b ⊑ c.
It follows for continuous f that f(⊥D) = ⊥E. Also, such an f preserves order, using the fact that, if x ⊑ y, then x ⊔ y := ⊔{x, y} = y . If f is bijective, we call it an isomorphism. Its inverse is automatically continuous.
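For an example (ours, not in the original) showing that continuity is strictly stronger than preserving order: let D be the complete lattice 0 ⊑ 1 ⊑ 2 ⊑ · · · ⊑ ∞, let E be the two-element lattice ⊥ ⊑ ⊤, and define f(n) := ⊥ for finite n, with f(∞) := ⊤. Then f preserves order; but A = {0, 1, 2, · · ·} is directed with ⊔A = ∞, and

f(⊔A) = ⊤ ≠ ⊥ = ⊔f(A) ,

so f is not continuous.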
Definition. The set [D−~ E] is defined to consist of all continuous maps from D to E.
Scott’s Theorem. Let D be any complete lattice. Then there is a complete lattice D∞ which is isomorphic to [D∞−~ D∞], and into which D can be embedded as a sublattice.
For the theorem to be meaningful, the set [D∞−~ D∞] is made into a complete lattice as indicated in the first exercise below. In view of that exercise, this theorem is exactly what we want, according to our elementary axiomatic development in the last several pages.
For Scott’s theorem, we’ll proceed to outline two proofs in the form of sequences of exercises. First will be a rather “hi-fallutin’ ” proof, not much different than Lawvere’s suggestion in [La-ed] p. 179, but less demanding of sophistication about categories on the reader’s part (as opposed to categorical sophistication, which must mean sophistication about one and only one thing, up to isomorphism!).
Both proofs use the following exercise.
Ex. VII-8.1. (a) Verify the seven axioms for the category of complete lattices and continuous maps, first verifying (and doing part (b) below simultaneously) that D × E and [D−~ E] are complete lattices whenever D and E are, using their canonical orderings as follows:

(d, e) ⊑D×E (d′, e′) ⇐⇒ d ⊑D d′ and e ⊑E e′ ;
f ⊑[D−~E] g ⇐⇒ f(d) ⊑E g(d) for all d ∈ D .

The subscripts on the “⊑” will be omitted in future, as they are clear from the context, and the same for the subscripts on the “⊔” below.
(b) Show also that the least upper bounds in these complete lattices are given explicitly by

⊔D×E A = ( ⊔D p1(A) , ⊔E p2(A) ) , where the pi are the projections, and
( ⊔[D−~E] A )(x) = ⊔E { f(x) | f ∈ A } .
(c) Verify also that all the maps φn and ψn are continuous.
(d) Show that D∞ and Π∞i=0 Di are complete lattices, and the inclusion of the former into the latter is continuous.
(e) Show that all the maps θab : Da → Db are continuous, including the cases when some subscripts are ∞, where the definitions are as follows (using θ so as not to show any favouritism between φ and ψ):
When a = b, use the identity map.
When a < b < ∞, use the obvious composite of φi’s.
When b < a < ∞, use the obvious composite of ψi’s.
Let θ∞,b(d0, d1, d2, · · ·) := db .
Let θa,∞(x) := (θa,0(x), θa,1(x), · · · , θa,a−1(x), x, θa,a+1(x), · · ·) .
Note that this last element is in D∞ . See also VII-8.6 .
Ex. VII-8.2. Show that D∞ satisfies the ‘arrow-theoretic’ definition of being ‘the’ inverse limit (in the category of complete lattices and continuous maps) of the system

D0 ←ψ0− D1 ←ψ1− D2 ←ψ2− · · ·

with respect to the maps θ∞n : D∞ → Dn . That is, given a complete lattice E and continuous maps ηn : E → Dn such that ηn = ψn ◦ ηn+1 for all n, there is a unique continuous map η∞ : E → D∞ such that ηn = θ∞n ◦ η∞ for all n.
Ex. VII-8.3. (General category theory)
(a) Show quite generally from the arrow-theoretic definition that, given an infinite commutative ladder

A0 ←α0− A1 ←α1− A2 ←α2− · · ·
ζ0 ↓       ζ1 ↓       ζ2 ↓
B0 ←β0− B1 ←β1− B2 ←β2− · · ·

(i.e. each square commutes) in which both lines have an inverse limit, there is a unique map

ζ∞ : lim←(Ai, αi) −→ lim←(Bi, βi)

for which β∞n ◦ ζ∞ = ζn ◦ α∞n for all n . Here, lim←(Ai, αi), together with its maps α∞n : lim←(Ai, αi) → An, is any choice of an inverse limit (in the arrow-theoretic sense) for the top line in the display, and similarly for the bottom line.
(b) Furthermore, as long as ζn is an isomorphism for all sufficiently large n, then the map ζ∞ is also an isomorphism (that is, a map with a 2-sided inverse with respect to composition). And so, inverse limits are unique up to a unique isomorphism.
Crucial Remark. We have such a commutative ladder

D0 ←ψ0−− D1 ←ψ1−− D2 ←ψ2−− D3 ←ψ3−− · · ·
who cares? ↓     = ↓      = ↓      = ↓
whatever ←− [D0−~ D0] ←ψ1− [D1−~ D1] ←ψ2− [D2−~ D2] ←ψ3− · · ·

Now, D∞, by VII-8.2, is the inverse limit of the top line, so, by VII-8.3(b), to show that D∞ ≅ [D∞−~ D∞] , it remains only to prove the following:

Ex. VII-8.4. Show directly from the arrow-theoretic definition that [D∞−~ D∞], together with its maps η∞n : [D∞−~ D∞] → [Dn−~ Dn] defined by η∞n(f) = θ∞n ◦ f ◦ θn∞ , is the inverse limit lim←([Di−~ Di], ψi+1) of the lower line, in the category of complete lattices and continuous maps.

Remark. The last three exercises complete the job, but the first and last may be a good challenge for most readers. They are likely to involve much of the material in the following more pedestrian approach to proving Scott’s theorem (which is therefore not really a different proof), and which does exhibit the isomorphisms D∞ ⇄ [D∞−~ D∞] quite explicitly.
Ex. VII-8.5. (a) Show that (φ0 ◦ ψ0)(f) ⊑ f for all f ∈ D1 .
(b) Deduce that, for all n, we have (φn ◦ ψn)(f) ⊑ f for all f ∈ Dn+1 .
The next exercise is best remembered as: “up, then down (or, right, then left) always gives the identity map”, whereas “down, then up (or left, then right) never gives a larger element”. We’re thinking of the objects as lined up in the usual way:

D0  D1  D2  D3  D4  · · · · · ·  D∞

Ex. VII-8.6. (a) Show that
(i) if b ≥ a, then θba ◦ θab = idDa ;
(ii) if a ≥ b, then (θba ◦ θab)(x) ⊑ x for all x .
This, and the next part, include the cases when some subscripts are ∞ .
(b) More generally, deduce that, if b < a and b < c, then

(θbc ◦ θab)(x) ⊑ θac(x) for all x ∈ Da ,

whereas, for all other a, b and c, the maps θbc ◦ θab and θac are actually equal.
Starting now we shall have many instances of ⊔n dn . In every case, the sequence of elements dn forms a chain with respect to ⊑, and so a directed set, and thus the least upper bound does exist by completeness of D. Checking the ‘chaininess’ will be left to the reader.

Definitions. Define φ∞ : D∞ → [D∞−~ D∞] by

φ∞(x) := ⊔k ( θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k ) .

Define ψ∞ : [D∞−~ D∞] → D∞ by

ψ∞(f) := ⊔n θn+1,∞( θ∞,n ◦ f ◦ θn,∞ ) .

Ex. VII-8.7. Show that all the following are continuous:
(i) φ∞(x) : D∞ → D∞ for all x ∈ D∞ (so φ∞ is well-defined) ;
(ii) φ∞ ; and
(iii) ψ∞ .
Ex. VII-8.8. Prove that, for all n, k < ∞, and all f ∈ Dk+1, we have

⊔n θn+1,∞( θk,n ◦ f ◦ θn,k ) = θk+1,∞(f) ,

perhaps first checking that

θn+1,∞( θk,n ◦ f ◦ θn,k )  ⊑ θk+1,∞(f)  for n ≤ k ;  = θk+1,∞(f)  for n ≥ k .

(Oddly enough, this is not meaningful for n or k ‘equal’ to ∞ .)
Ex. VII-8.9. Show that

θ∞,k+1( ⊔n θn+1,∞(zn+1) ) = zk+1 ,

if {zr ∈ Dr}r>0 satisfies ψr(zr+1) = zr .
Ex. VII-8.10. Show that, for all x ∈ D∞ ,

⊔k (θk+1,∞ ◦ θ∞,k+1)(x) = x .

Ex. VII-8.11. Show that, for all f ∈ [D∞−~ D∞] ,

⊔k (θk,∞ ◦ θ∞,k ◦ f ◦ θk,∞ ◦ θ∞,k) = f .

Using these last four exercises, we can now complete the more mundane of the two proofs of Scott’s theorem, by directly calculating that φ∞ and ψ∞ are mutually inverse:

For all x ∈ D∞ ,

ψ∞(φ∞(x)) = ψ∞( ⊔k (θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k) )   (definition of φ∞)
= ⊔k ψ∞( θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k )   (since ψ∞ is continuous)
= ⊔k ⊔n θn+1,∞( θ∞,n ◦ θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k ◦ θn,∞ )   (definition of ψ∞)
= ⊔k ⊔n θn+1,∞( θk,n ◦ (θ∞,k+1(x)) ◦ θn,k )   (by VII-8.6(b))
= ⊔k θk+1,∞( θ∞,k+1(x) ) = x , as required, by VII-8.8 and VII-8.10 .

For all f ∈ [D∞−~ D∞] ,

φ∞(ψ∞(f)) = ⊔k ( θk,∞ ◦ (θ∞,k+1(ψ∞(f))) ◦ θ∞,k )   (definition of φ∞)
= ⊔k ( θk,∞ ◦ (θ∞,k+1( ⊔n θn+1,∞(θ∞,n ◦ f ◦ θn,∞) )) ◦ θ∞,k )   (definition of ψ∞)
= ⊔k ( θk,∞ ◦ (θ∞,k ◦ f ◦ θk,∞) ◦ θ∞,k ) = f , as required, by VII-8.11 .

The penultimate “=” uses VII-8.9 with zr+1 = θ∞,r ◦ f ◦ θr,∞ ; and that result is applicable because, rather trivially,

ψr( θ∞,r ◦ f ◦ θr,∞ ) = θ∞,r−1 ◦ f ◦ θr−1,∞ ,

employing the definition of ψr and VII-8.6 .

The following fundamental result of Tarski about fixed points in complete lattices will be useful in the next subsection. Later we relate this to the fixed-point combinator Y of Curry, called the paradoxical combinator by him.

Theorem VII-8.12. Let D be a complete lattice, and consider operators Ω ∈ [D−~ D]. Define

fix : [D−~ D] → D  by  fix(Ω) := ⊔n Ωⁿ(⊥D) .

Then fix(Ω) is the minimal fixed point of Ω, and fix is itself continuous. In particular, any continuous operator on a complete lattice has at least one fixed point.

Proof. That fix is well-defined should be checked. As with all our other uses of ⊔n , the set of elements forms a chain with respect to ⊑, and so a directed set, and the least upper bound does exist by completeness of D. To prove ‘chaininess’, we have that ⊥D ⊑ Ω(⊥D) implies immediately that Ωⁿ(⊥D) ⊑ Ωⁿ⁺¹(⊥D) .
As for the fixed point aspect, using continuity of Ω ,

Ω(fix(Ω)) = Ω( ⊔n Ωⁿ(⊥D) ) = ⊔n Ωⁿ⁺¹(⊥D) = fix(Ω) ,

as required.
As to the minimality, if Ω(d) = d, then ⊥ ⊑ d gives Ωⁿ(⊥) ⊑ Ωⁿ(d) = d for all n, so

fix(Ω) = ⊔n Ωⁿ(⊥) ⊑ ⊔n Ωⁿ(d) = ⊔n d = d ,

as required.
Finally, given a directed set O ⊂ [D−~ D], we have

fix( ⊔Ω∈O Ω ) = ⊔n ( ⊔Ω∈O Ω )ⁿ(⊥) = ⊔n ( ⊔Ω∈O Ωⁿ )(⊥) = ⊔n ⊔Ω∈O Ωⁿ(⊥) = ⊔Ω∈O ⊔n Ωⁿ(⊥) = ⊔Ω∈O fix(Ω) ,

as required, where justifications of the three middle equalities are left to the reader. And so, fix is continuous.
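(Computationally, this proof is just Kleene iteration. On any lattice where the ascending chain stabilises, for instance a finite one, the following Haskell fragment, ours alone, computes fix(Ω) literally as the limit of the chain ⊥, Ω(⊥), Ω²(⊥), · · · .)

```haskell
-- Iterate omega from bottom until the chain stabilises; when it does,
-- the value reached is the least fixed point of the theorem.
kleene :: Eq d => d -> (d -> d) -> d
kleene bottom omega = go bottom
  where go d = let d' = omega d
               in if d' == d then d else go d'

main :: IO ()
main = print (kleene 0 (\n -> min 10 (n + 3)))   -- 10, the least fixed point
```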

VII-9 Two (not entirely typical) Examples of Denotational Semantics.
We shall write out in all detail what presumably ought to be the denotational semantics of ATEN (earlier used to define computability), then illustrate it with a few examples. In the second half, we re-write the maps ρ+ in the style of “denotational semantics for the λ-calculus”, finishing with several interesting theorems about ρ+ when the domain is D∞, which also gives practice calculating in that domain.
Denotational semantics of ATEN.
Since machine readability and executability seem to be central concerns here, the formulas will all be given very technically, and we’ll even begin with a super-formal statement of the syntax (though not exactly in the standard BNF-style).
Here it all is, in one largely human-unreadable page. See the following remarks.

BRA := { ) , ( } , which merely says that we’ll have lots of brackets, despite some CSers’ abhorrence.
v ∈ IDE := { x | ∗ || x || v∗ } , which says, e.g., that the ‘real’ x3 is x∗∗∗, and x0 is just x.
s, t ∈ EXP := { BRA | IDE | + | × | 0 | 1 || v | 0 | 1 || (s + t) | (s × t) } .
F, G ∈ EXP′ := { BRA | EXP | < | ≈ | ¬ | ∧ || s < t | s ≈ t || ¬F | (F ∧ G) } .
C, D ∈ COM := { BRA | IDE | EXP | EXP′ | ←: | ; | whdo || v ←: t || (C; D) | whdo(F)(C) } .

V AL “=” {⊥N} ∪ N ;  BOOL “=” { ⊥BOOL , Tr , Fs } ;  STATE = [IDE−~ V AL] .

E : EXP → [STATE−~ V AL] defined by

E[[v]](σ) := σ(v) ;  E[[0]](σ) := 0N ;  E[[1]](σ) := 1N ;
E[[(s + t)]](σ) := E[[s]](σ) +N E[[t]](σ) ;  E[[(s × t)]](σ) := E[[s]](σ) ·N E[[t]](σ) .
(Note that ⊥N is produced when it is either of the inputs for +N or for ·N .)

E′ : EXP′ → [STATE−~ BOOL] defined by saying firstly that E′[[F]](σ) := ⊥BOOL if either F is an atomic formula involving a term s with E[[s]](σ) = ⊥N , or if F is built using ¬ or ∧ from a formula G for which E′[[G]](σ) = ⊥BOOL ; otherwise

E′[[s < t]](σ) := if E[[s]](σ) <N E[[t]](σ), then Tr, else Fs ;
E′[[s ≈ t]](σ) := if E[[s]](σ) = E[[t]](σ), then Tr, else Fs ;
E′[[¬F]](σ) := if E′[[F]](σ) = Fs, then Tr, else Fs ;
E′[[(F ∧ G)]](σ) := if E′[[F]](σ) = Tr and E′[[G]](σ) = Tr, then Tr, else Fs .

C : COM → [STATE−~ STATE] defined by

C[[v ←: t]](σ) := σ[v↦E[[t]](σ)] , where σ[v↦α] agrees with σ, except v ↦ α ;
C[[(C; D)]] := C[[D]] ◦ C[[C]] ;
C[[whdo(F)(C)]] := fix( f ↦ (σ ↦ (if E′[[F]](σ) = Tr, then (f ◦ C[[C]])(σ), else σ)) ) ,
where the fixpoint operator, fix : [D−~ D] → D, has D = [STATE−~ STATE] .
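To make the executability claim concrete before the remarks: here is a toy Haskell transcription of the page above. It is entirely our own sketch (certainly not the implementer of [Go] or [Al]); ⊥N and ⊥BOOL are played by Haskell’s Maybe, and the whdo clause is the displayed fix clause transcribed directly, so a ⊥BOOL test halts the loop, exactly as the display dictates.

```haskell
import Data.Function (fix)

type Ide = String
data Exp  = Var Ide | Zero | One | Plus Exp Exp | Times Exp Exp
data Exp' = Lt Exp Exp | Equ Exp Exp | Not Exp' | And Exp' Exp'
data Com  = Assign Ide Exp | Seq Com Com | Whdo Exp' Com

type Val   = Maybe Integer            -- Nothing plays bottom_N
type State = Ide -> Val

evalE :: Exp -> State -> Val          -- the function E
evalE (Var v)     sigma = sigma v
evalE Zero        _     = Just 0
evalE One         _     = Just 1
evalE (Plus s t)  sigma = (+) <$> evalE s sigma <*> evalE t sigma
evalE (Times s t) sigma = (*) <$> evalE s sigma <*> evalE t sigma
-- (<$> / <*> propagate Nothing, as the Note above demands of +_N and ._N)

evalB :: Exp' -> State -> Maybe Bool  -- the function E', Nothing = bottom_BOOL
evalB (Lt s t)  sigma = (<)  <$> evalE s sigma <*> evalE t sigma
evalB (Equ s t) sigma = (==) <$> evalE s sigma <*> evalE t sigma
evalB (Not f)   sigma = not  <$> evalB f sigma
evalB (And f g) sigma = (&&) <$> evalB f sigma <*> evalB g sigma

evalC :: Com -> State -> State        -- the function C
evalC (Assign v t) sigma = \w -> if w == v then evalE t sigma else sigma w
evalC (Seq c d)    sigma = evalC d (evalC c sigma)
evalC (Whdo f c)   sigma = loop sigma
  where loop = fix (\self s -> case evalB f s of
                      Just True -> self (evalC c s)
                      _         -> s)   -- Fs or bottom: stop, return s

-- x <-: 1 ; whdo (x < 1+1+1+1) (x <-: x + x) , started from the zero state:
demo :: Val
demo = evalC prog (const (Just 0)) "x"   -- evaluates to Just 4
  where four = Plus (Plus One One) (Plus One One)
        prog = Seq (Assign "x" One)
                   (Whdo (Lt (Var "x") four)
                         (Assign "x" (Plus (Var "x") (Var "x"))))
```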

Remarks. The first five displays are the syntax. All but the first give a set of strings in the usual three stages: first, all symbols to be used; next, which are the atomic strings; and finally, how the ‘production’ of new strings is done (induction on structure). To the far left are ‘typical’ member(s) of the string set to be defined, those being then used in the production and also on lines further down. [Where whole sets of strings are listed on the left of the symbols-to-be-used listing, and other times as well, a human interpreter thinks of those substrings as single symbols when they appear inside the strings being specified on that line. For example, (x∗∗∗ + x∗∗) has ten symbols, but a human ignores the brackets and thinks of three, i.e. x3 + x2—and maybe even as a single symbol, since, stretching it a bit, however complicated, a term occurring inside a formula is intuited as a single symbol in a sense, as a component of the formula. And similarly, a term or a formula within a command is psychologically a single symbol. Sometimes the phrase “immediate constituents” is used for this, in explaining below the inductive nature of the semantic function definitions.]
The last of the five lines is the syntax of ATEN itself. The third and fourth lines give the syntax of the assertion language, first terms, then (quantifier-free) formulas, in 1st order number theory. We’ve used names which should remind CSers of the word “expression”, but they might better be called TRM (= EXP) and FRM (= EXP′) from much earlier stuff here. Often, CSers would lump EXP and EXP′ together into a single syntactic category.
Why they do so is not unmysterious to me; I am probably missing some considerable subtlety. But I am well aware of the confusion that tendency in earlier CS courses causes for average students when I teach them logic. They initially find it strange that I should make a distinction between strings in a formal language which we intuit as standing for objects, and other strings which we intuit as standing for statements! On the other hand, I myself find it strange to think of (3 < 4) + ((5 = 2) + 6) as any kind of expression! (and not because 5 ≠ 2) Another inscrutability for me is that many of the technicalities on the previous page would normally be written down in exactly the opposite order in a text on programming languages! While I’m in the mood to confess a mentality completely out of synch with the CSers, here’s another bafflement which some charitable CSer will hopefully straighten me out on. Within programs in, say, PASCAL, one sees the begin—end and use of indentation as a replacement for brackets, which makes things much more readable (if any software could ever be described as (humanly) readable). But there seems to be a great desire to avoid brackets, and so avoid non-ambiguity, in writing out the syntax of languages. Instead, vague reference is made to parsers. I can appreciate a desire sometimes to leave open different possibilities. A good example is the λ-calculus, where using A(B) rather than (AB) in the basic syntactic setup is a possibility, and probably better for psychological purposes, but not for economy. And the semantics is largely independent of the parsing. But when precision is of utmost importance, I cannot understand this tendency to leave things vague. Of course, there still remains a need to specify algorithms for deciding whether a string (employing the basic symbols of the language) is actually in the language. But surely that’s a separate issue.
The sixth display gives information about the semantic domains needed. We haven’t said what V AL actually “is” (hence the “=”), other than that it contains ⊥N as an element, and all the natural numbers. Below we come clean on that. The use of [−~] is some indication that these semantic sets have some structure, actually a partial order.
Finally, the three semantic functions, corresponding to terms, formulae, and commands, are given. We have been super-strict in distinguishing notationally between the symbol “+” and the actual operation “+N” on natural numbers, and similarly for “·N” and “<N”. Some elementary confusions revolve around this point, in the presence of ambiguous notation, which point otherwise might seem overly fussy. We have, in the same vein, used our usual “≈” as the formal equality symbol, to distinguish it from the actual relation of sameness. Each of the three semantic functions is defined by structural induction on the productions defined at the right-hand ends of the corresponding syntactic sets. As mentioned in defining E, the operations +N and ·N produce ⊥N if either or both of their inputs is ⊥N. And from the definition of E′, the relation <N has no need to concern itself with comparing ⊥N with anything.
As for S, we're just saying how, using S's adjoint, a term together with a state will yield a number, exactly as in Tarski's definition of truth. In fact, S[[t]](σ) is just another name for tv, where v is identified with that state σ mapping each xn to vn. See [LM], beginning of Ch.6.
As for S′, we're just saying how, after taking its adjoint, a formula together with a state will yield a truth value, exactly as in Tarski's definition of truth. In fact, Tr and Fs for S′[[F]](σ) are just other ways of saying whether v ∈ FV or not. See [LM] for FV. Alternatively, v ∈ FV says "F is true at v from V".
As for C, we're just saying how, after taking its adjoint, a command together with a state will yield another state, exactly as in our original definition of the semantics of BTEN. In fact, C[[C]](σ) is just another name for ||C||(v); i.e. C[[C]] is another name for ||C|| .
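Since the displays above are, as admitted, largely human-unreadable, a tiny Haskell sketch of their shape may help; all of the Haskell names below (Flat, State, the constructor names, the three ...Sem types) are my own inventions, not ATEN's official notation.

data Flat a  = Bot | Val a                  -- a flat domain: bottom, or a definite value
type State   = Int -> Flat Integer          -- bin n holds sigma(xn), possibly undefined

data Term    = X Int | Zero | One | Term :+: Term | Term :*: Term
data Formula = Term :=: Term | Term :<: Term | Not Formula
data Command = Int ::=: Term                -- assignment  xn <-: t
             | Command :>>: Command         -- sequencing  ;
             | WhDo Formula Command         -- whdo(F)(C)

-- The three semantic functions then have the types
--   termSem    :: Term    -> State -> Flat Integer   -- the text's  S
--   formulaSem :: Formula -> State -> Flat Bool      -- the text's  S'
--   commandSem :: Command -> State -> State          -- the text's  C
-- and currying/uncurrying gives the 'adjoint' forms just discussed.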
The right-hand sides in the definitions of S′ and C use an (if-then-else) in the spirit of McSelf, and should be taken that way. (See also the discussion just below the diagram in the next paragraph.) In [Al], Ch.5, a program in PASCAL is written out soon after the denotational semantics of his small language is specified. Presumably LISP would be just as good. Such a program is called an implementer. Another one of my confusions is the question of why implementing a simple language inside a very complicated practical language is useful. (I can see where it would be fun.) Perhaps it is related to the fact that the implementer is just a single program in that complex language, a single program in which one can develop considerable confidence. I am certainly not (yet?) the one to write an implementation of ATEN in any real programming language.
Note that for the final clause, giving the denotation of the whdo-command, we need the discussion of fixed points from the end of the previous subsection. This specification of the whdo-command comes from the remarks at the end of Subsection VII-7, and is evidently far more 'implementable' than that in the originally specified semantics of BTEN. If not for that one command, this whole exercise would presumably be regarded as rather sterile. Indeed, as mentioned earlier, it's when applied to much more complicated (e.g. ALGOL-like) languages that this impression of sterility dissipates. The need for anything like a self-reflective domain as constructed in the previous subsection is unclear. But we do at least need that D = [STATE → STATE] is a lattice for which Tarski's theorem on minimum fixpoint operators works. That follows as long as STATE is a complete lattice, which itself follows as long as VAL is. Therefore we complete the unfinished business on the previous page by specifying
VAL := the flat lattice with elements ⊥N, 0, 1, 2, 3, · · · , in which ⊥N lies below every natural number and distinct natural numbers are incomparable (the Hasse diagram has 0, 1, 2, 3, · · · along the top, each joined downward to ⊥N); and BOOL similarly, with Tr and Fs above ⊥BOOL.
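In Haskell dress, the flat lattice and the strictness required of +N and ·N can be sketched as follows (Flat, below, plusN, timesN are my names; only Bot-as-⊥N is faithful, the lub structure being left implicit):

data Flat a = Bot | Val a deriving (Eq, Show)

-- the flat partial order pictured above: Bot below everything,
-- distinct proper values incomparable
below :: Eq a => Flat a -> Flat a -> Bool
below Bot     _       = True
below (Val x) (Val y) = x == y
below _       _       = False

-- +N and .N are strict: bottom in gives bottom out
plusN, timesN :: Flat Integer -> Flat Integer -> Flat Integer
plusN  (Val m) (Val n) = Val (m + n)
plusN  _       _       = Bot
timesN (Val m) (Val n) = Val (m * n)
timesN _       _       = Bot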
There are a couple of points worth adding, to shore up the semantic definitions. Suppose we were dealing with BTEN rather than ATEN, so we had to give the semantics of the command ite(F)(C)(D), that is, the (if-then-else)-command. In the previous style this would be

C[[ite(F)(C)(D)]](σ) := if S′[[F]](σ) = Tr, then C[[C]](σ), else C[[D]](σ) .

To my complete bafflement, [Al], top of p.12, would object strenuously to this, insisting that it must be

C[[ite(F)(C)(D)]](σ) := (if S′[[F]](σ) = Tr, then C[[C]], else C[[D]])(σ) .
Once again, I am in need of a kind CSer to straighten me out on a subtle point. Furthermore, the definition in [Go], p.51 (C3), seems not to be consistent, in terms of the domains involved, with the earlier definitions there, though that seems to be fixable by extending the cond-function there in a certain way. But perhaps again I need a tutorial. In any case, here is a re-written version of our definition more in that style, which is similar to [St], p.196. (He has the added complication of "side-effects" to deal with, but in their absence, his definition reduces to exactly the following, at least up to currying/uncurrying.) Further down we do the same for the whdo-command, to which similar bafflements on my part apply with respect to the versions in [Al] and [Go]. Re-define (without really changing the definition)

C[[ite(F)(C)(D)]] := con ◦ (S′[[F]] × C[[C]] × C[[D]]) ◦ ddg ,
where

ddg : STATE → STATE × STATE × STATE ; σ ↦ (σ, σ, σ)

is the double diagonal map, and

con : BOOL × STATE × STATE → STATE ;
(Tr, σ, τ) ↦ σ ; (Fs, σ, τ) ↦ τ ; (⊥BOOL, σ, τ) ↦ ⊥STATE

is the conditional. (This has turned out a bit simpler than in the above references, with no need to fool around with defining a ∗-operator, partly because we are not insisting on currying everything, with its attendant contortions.) In any case, this isolates the facts that we definitely need ddg and con to be continuous, which they are; but otherwise no further comment is needed.
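Continuing the Haskell sketch (again with my own names, and Flat Bool standing in for BOOL), ddg, con and the re-defined conditional transcribe directly:

data Flat a = Bot | Val a
type State  = Int -> Flat Integer
type BoolD  = Flat Bool

ddg :: s -> (s, s, s)                        -- the double diagonal map
ddg s = (s, s, s)

cross :: (a -> a') -> (b -> b') -> (c -> c') -> (a, b, c) -> (a', b', c')
cross f g h (x, y, z) = (f x, g y, h z)      -- the 'x' of three functions

con :: (BoolD, State, State) -> State        -- the conditional
con (Val True,  s, _) = s
con (Val False, _, t) = t
con (Bot,       _, _) = \_ -> Bot            -- bottom of STATE: everywhere undefined

iteSem :: (State -> BoolD) -> (State -> State) -> (State -> State) -> State -> State
iteSem f c d = con . cross f c d . ddg       -- con o (S'[[F]] x C[[C]] x C[[D]]) o ddg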
Now we can re-write the while-do semantic definition also in this style:

C[[whdo(F)(C)]] := fix [ f ↦ con ◦ (S′[[F]] × (f ◦ C[[C]]) × Id) ◦ ddg ] .
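Haskell's own fix (from Data.Function, defined by fix f = f (fix f)) computes the same Kleene-style least fixed point, with honest non-termination standing in for the explicit bottom of [STATE → STATE]; so, as a sketch under the same assumptions as before:

import Data.Function (fix)

data Flat a = Bot | Val a
type State  = Int -> Flat Integer

whdoSem :: (State -> Flat Bool) -> (State -> State) -> (State -> State)
whdoSem fSem cSem =
  fix (\f s -> case fSem s of
                 Val True  -> f (cSem s)      -- guard true: run the body, then loop
                 Val False -> s               -- guard false: stop
                 Bot       -> \_ -> Bot)      -- undefined guard: bottom state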

Some examples of the definition.


The first three examples below are utterly simple programs in
ATEN, to the point of being silly. And the purpose of
denotational semantics, though not crystal clear to me, is
certainly not to be able to write out particular examples. But the
ones below should serve to better familiarize us with how the
definition works, and, more importantly, to give some
confidence that the answers are coming out ‘right’.
(1) If C = whdo(x0 ≈ x0)(D) for some command D, we obviously realize an infinite loop no matter what is in the bins to start, i.e. for any initial state. Let's figure out C[[C]]. Using the definition, we get

fix(f ↦ (σ ↦ (f ◦ C[[D]])(σ))) = fix(f ↦ f ◦ C[[D]]) ,

since certainly S′[[x0 ≈ x0]](σ) = Tr for all σ. Now define a function f0 in [STATE → STATE] by f0(σ) = ⊥ for all σ. It is very clear that f0 ⊑ f for all f ∈ [STATE → STATE]. Also f0 ◦ g = f0 for any function g in [STATE → STATE], since f0 is a constant function. So f0 is the required answer; that is, it is the minimal fixed point of the operator f ↦ f ◦ C[[D]], 'composition with C[[D]] on the right'. And surely f0 should be the element in the domain [STATE → STATE] which denotes a program that always loops! A better name for that function is ⊥[STATE → STATE] . The previous ⊥ is ⊥STATE = ⊥[IDE → VAL] , namely the function which maps all ν ∈ IDE to ⊥VAL , which Dana Scott refers to as "the undefined".
(2) If C = whdo(x0 < 1)(x0 ←: x0 + 1), we see that any initial state is unchanged by the command, except when bin 0 contains zero. In the latter case, the zero in bin 0 is changed to 1, and then the process terminates. First here are three preliminary calculations:

S[[x0 + 1]](σ) = S[[x0]](σ) +N S[[1]](σ) = σ(x0) +N 1N .

C[[x0 ←: x0 + 1]](σ) = σ[x0 ↦ S[[x0 + 1]](σ)] = σ[x0 ↦ 1N +N σ(x0)] .

S′[[x0 < 1]](σ) = Tr if σ(x0) <N 1N ; Fs otherwise .
Let's figure out C[[C]]. Using the definition, we get

fix(f ↦ (σ ↦ (if S′[[x0 < 1]](σ) = Tr, then (f ◦ C[[x0 ←: x0 + 1]])(σ), else σ)))

= fix(f ↦ (σ ↦ { f(σ[x0 ↦ 1N]) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N })) .

Now let f1 be any fixed point of the latter operator. So

f1(σ) = { f1(σ[x0 ↦ 1N]) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .

But the lower line then determines the upper line (since σ[x0 ↦ 1N] has 1N ≠ 0N in bin 0, the lower line gives f1(σ[x0 ↦ 1N]) = σ[x0 ↦ 1N]), and we conclude that there is only one fixed point in this case (no minimization needed!), given by

f1(σ) = { σ[x0 ↦ 1N] if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .
Well, this last function is exactly the one the command was supposed to compute, i.e. 'if necessary, change the 0 in bin zero to a 1, and do nothing else', so our definition is doing the expected here as well.
(3) If C = whdo(x0 < 1)(x1 ←: x1), we see that any initial state is unchanged by the command, except when bin zero contains 0. In the latter case, the program does an infinite loop.
First here are the three preliminary 'calculations':

S[[x1]](σ) = σ(x1) ;

C[[x1 ←: x1]](σ) = σ[x1 ↦ S[[x1]](σ)] = σ[x1 ↦ σ(x1)] = σ ,

hardly surprising; and, as before,

S′[[x0 < 1]](σ) = Tr if σ(x0) <N 1N ; Fs otherwise .
From the definition, C[[C]] is given by

fix(f ↦ (σ ↦ (if S′[[x0 < 1]](σ) = Tr, then (f ◦ C[[x1 ←: x1]])(σ), else σ)))

= fix(f ↦ (σ ↦ { f(σ) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N })) .

Suppose that f2 is a fixed point of the latter operator. So

f2(σ) = { f2(σ) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .

The top line says nothing, and any such function is a fixed point. Thus, clearly f3 is the minimal fixed point, where

f3(σ) = { ⊥ if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .
Once again, this last function is exactly the one the command was
supposed to compute, i.e. ‘loop if bin 0 has a 0, otherwise,
terminate after doing nothing’. So our definition is doing the
expected, giving at least a little reinforcement to our
confidence in the technicalities.
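For what it's worth, examples (2) and (3) can be machine-checked against the whdoSem sketch given after the while-do definition above (so everything here inherits that sketch's assumptions and my invented names):

-- example (2): whdo(x0 < 1)(x0 <-: x0 + 1), transliterated
ex2 :: State -> State
ex2 = whdoSem guard body
  where
    guard s = case s 0 of { Val n -> Val (n < 1) ; Bot -> Bot }
    body  s = \i -> if i == 0
                    then case s 0 of { Val n -> Val (n + 1) ; Bot -> Bot }
                    else s i

-- example (3): whdo(x0 < 1)(x1 <-: x1); the body is the identity on states
ex3 :: State -> State
ex3 = whdoSem (\s -> case s 0 of { Val n -> Val (n < 1) ; Bot -> Bot }) id

-- ex2 (const (Val 0)) 0   evaluates to  Val 1   (the 0 in bin zero becomes 1)
-- ex2 (const (Val 7)) 0   evaluates to  Val 7   (any other state is unchanged)
-- (ex3 (const (Val 0))) 0 never terminates: Haskell's rendering of bottom there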

(4) Now we give a more extended (and possibly interesting) example, using the obvious non-recursive algorithm that calculates the factorial function. In [Al], p.55, this is also an example, with very abbreviated explanation, so the reader may be interested to compare. A major reason for the much greater detail here (besides kindness to the reader) is the fact that ATEN is so primitive. The simple language in [Al] at least has subtraction and negative integers, which ATEN doesn't. But the details below are probably instructive, even in the direction of writing a general algorithm to translate programs in that language into commands in ATEN.
We would like to use the 'command'

x0 ←: 1 ; whdo(0 < x1)(x0 ←: x1 × x0 ; "x1 ←: x1 − 1") .

After approximately "ν1" cycles, this should terminate, leaving the number "ν1!" in bin zero (where the natural number νi is the initial content of bin number i, i.e. it is σ(xi) below). And below we demonstrate that the semantic definitions prove this fact. The quotation marks are there because x1 − 1 is not a term. We take "x1 ←: x1 − 1" to be an abbreviation for

x2 ←: 0 ; whdo(¬ x2 + 1 ≈ x1)(x2 ←: x2 + 1) ; x1 ←: x2 ,

omitting associativity brackets for ";". So this produces that predecessor function which is undefined on 0.
So we've got a fairly lengthy job, much of which will be left as exercises for the reader. First show that

C[[whdo(¬ x2 + 1 ≈ x1)(x2 ←: x2 + 1)]]
= fix(f ↦ (σ ↦ { f(σ[x2 ↦ σ(x2)+1]) if σ(x2) + 1 ≠ σ(x1) ; σ if σ(x2) + 1 = σ(x1) })) .

Then argue (by induction on σ(x1) − σ(x2) in the first case) that the minimal fixed point here is given by

σ ↦ { σ[x2 ↦ σ(x1)−1] if σ(x2) < σ(x1) ; ⊥ if σ(x2) ≥ σ(x1) } .

Now argue that

C[["x1 ←: x1 − 1"]] = σ ↦ { σ[x2 ↦ σ(x1)−1][x1 ↦ σ(x1)−1] if σ(x1) ≠ 0 ; ⊥ if σ(x1) = 0 } .

Next argue that, if E = (x0 ←: x1 × x0 ; "x1 ←: x1 − 1"), then

C[[E]] = σ ↦ { σ[x0 ↦ σ(x1)·σ(x0)][x1 ↦ σ(x1)−1][x2 ↦ σ(x1)−1] if σ(x1) ≠ 0 ; ⊥ if σ(x1) = 0 } .

The next step is yet another application of the definition of the semantic function on whdo-commands, to yield

C[[whdo(0 < x1)(E)]]
= fix(f ↦ (σ ↦ { f(σ[x0 ↦ σ(x1)·σ(x0)][x1 ↦ σ(x1)−1][x2 ↦ σ(x1)−1]) if σ(x1) ≠ 0 ; σ if σ(x1) = 0 })) .

A more subtle argument than the earlier analogues then yields the minimal fixed point as

σ ↦ { σ[x0 ↦ σ(x1)!·σ(x0)][x1 ↦ 0][x2 ↦ 0] if σ(x1) ≠ 0 ; σ if σ(x1) = 0 } .

Finally, we can easily see that

C[[x0 ←: 1 ; whdo(0 < x1)(E)]] = σ ↦ { σ[x0 ↦ σ(x1)!][x1 ↦ 0][x2 ↦ 0] if σ(x1) ≠ 0 ; σ[x0 ↦ 1] if σ(x1) = 0 } .

This is the required result: our original command computes a function which puts "σ(x1)!" into bin zero (to mix up the two ways of thinking about states), and which does irrelevant things to bins 1 and 2.
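The same whdoSem sketch will happily digest this example too. Below is a direct transliteration (with honest subtraction in place of ATEN's looping predecessor, which is harmless here since the guard 0 < x1 keeps the body away from bin-1 value 0; all names are again mine):

-- x0 <-: 1 ; whdo(0 < x1)(x0 <-: x1 * x0 ; "x1 <-: x1 - 1")
factSem :: State -> State
factSem s0 = loop (update 0 (Val 1) s0)
  where
    loop   = whdoSem (\s -> case s 1 of { Val n -> Val (0 < n) ; Bot -> Bot }) body
    body s = update 1 (pred' (s 1)) (update 0 (times (s 1) (s 0)) s)
    update i v s = \j -> if j == i then v else s j
    times (Val m) (Val n) = Val (m * n) ; times _ _ = Bot
    pred' (Val n)         = Val (n - 1) ; pred' _   = Bot

-- factSem (\i -> if i == 1 then Val 5 else Val 0) 0  evaluates to  Val 120,
-- with bin 1 counted down to Val 0, just as the display above predicts
-- (bin 2, unused in this transliteration, is simply left alone).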

Denotational semantics of Λ
This consists of super-formal statements of the syntax from the very beginning, about 100 pages back, and of the specifications in the definition of λ-model. See the first half of this subsection for an analogue and an explanation of the syntax specification in the first three lines below. Here it is, again largely human-unreadable, but perhaps more mechanically translatable for writing an implementation. See Ch. 8 of [St] for much more on this.
BRA := { ) , ( } .
ν ∈ IDE := { x | ∗ || x | ν∗ } .
A, B ∈ Λ = EXP := { BRA | IDE | λ | • || ν | (AB) | (λν • A) } .
VAL = any λ-model .
ENV = [IDE → VAL] .

For ρ ∈ ENV, ν ∈ IDE, and d ∈ VAL, define ρ[ν ↦ d] ∈ ENV to agree with ρ except that it maps the variable ('identifier') ν to d.
S : EXP → [ENV → VAL] is defined by

S[[ν]](ρ) := ρ(ν) ;
S[[(AB)]](ρ) := S[[A]](ρ) · S[[B]](ρ) ;
S[[(λν • A)]](ρ) := ψ(d ↦ S[[A]](ρ[ν ↦ d])) .
Note that the last three lines are just the three basic properties, in a messier notation, describing how what we called ρ+ behaves in a λ-model. Note also that the last line is only meaningful in this structural inductive definition with an accompanying inductive proof that the map to which we are applying ψ is in fact in [VAL → VAL].
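These clauses transcribe almost verbatim into Haskell, if we cheat and let Haskell's own function space play the role of the λ-model (a sketch only: the D below is not Scott's D∞, having no lattice structure and no extensionality, with the constructor D playing ψ and apply playing "·"):

data Exp = Ide String | App Exp Exp | Lam String Exp

newtype D = D (D -> D)           -- a 'reflexive domain': D isomorphic to D -> D
apply :: D -> D -> D             -- the operation written  a . b  in the text
apply (D f) = f

type Env = String -> D           -- ENV = [IDE -> VAL]

sem :: Exp -> Env -> D           -- the semantic function S
sem (Ide v)   rho = rho v
sem (App a b) rho = apply (sem a rho) (sem b rho)
sem (Lam v a) rho = D (\d -> sem a (\w -> if w == v then d else rho w))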
The other two fundamental proofs wanted here are of the facts that

(∗) A ≈ B =⇒ S[[A]] = S[[B]] ; i.e. ρ+(A) = ρ+(B) ∀ρ ;

and that

(∗∗) S[[A[x→B]]](ρ) = S[[A]](ρ[x ↦ S[[B]](ρ)]) ; i.e. ρ+(A[x→B]) = (ρ[x ↦ ρ+(B)])+(A) .
None of these proofs requires any great cleverness.
Later we discuss how, after passing to equivalence classes, and when Scott's D∞ is VAL, the map on EXP/≈ coming from ρ+ is not injective, but that it becomes so when restricted to the set of equivalence classes of closed terms which have a normal form; i.e. closed normal form terms all map under this denotational semantics to different objects in the λ-model, this being independent of ρ. We'll now go back to the handier ρ+ notation.
Proving the theorem of David Park below gives an opportunity for illustrations of this "denotational semantics of the λ-calculus" (and especially for calculations to increase one's familiarity with the innards of Scott's D∞). Recall that Curry's fixpoint (or "paradoxical") combinator Y is a closed term, so it is unambiguous to define

Y := ρ+(Y) ∈ D∞ ,

i.e. it is independent of the particular ρ used.

Theorem VII-9.1. Under the canonical isomorphism

[[D∞ → D∞] → D∞] ≅ D∞ ,

the Tarski fixpoint operator fix (from VII-8.12) corresponds to Y, the image of Curry's fixpoint combinator. Three equivalent expressions of this are

φ∞(Y) = fix ◦ φ∞ ; Y = ψ∞(fix ◦ φ∞) ; fix = φ∞(Y) ◦ ψ∞ .

We shall give the proof as a sequence of exercises, ending with establishing the first of those three. The fact that the three are equivalent is staggeringly trivial, via ψ∞ = φ∞⁻¹. The only two proofs of the theorem that I have seen are the scanned image of Park's succinct typescript from 1976-78, and the proof in [Wa] (which we give, with proofs of the basic identities he uses).
This proof depends on the explicit φ0 and ψ0 (but not any explicit D0) from Scott's building of D∞ from D0, as exposited here in VII-8. Park sketches how a different choice of (φ0, ψ0) gives a different answer. So this theorem definitely does not generalize, as is, to an arbitrary extensional λ-model with compatible complete lattice structure. Later we use Park's theorem a couple of times to give other explicit results about the denotational semantics of Λ, including another fixpoint operator, though not Turing's fixpoint combinator (which the reader might like to think about).

First here are some useful exercises and notation.

Definition. Abbreviate θn,∞ ◦ θ∞,n to just πn : D∞ → D∞ .


Then πn is a projection operator (i.e. πn ◦ πn = πn), whose
image is a sublattice of D∞ which is isomorphic to Dn, and often
identified with Dn .

Much of the literature makes the notation even more succinct, denoting πnx as xn. This can be very efficient, but a bit confusing for us beginners. Firstly, it's rather more natural in mathematics to use xn as the nth component of x, that is, to write x as (x1, x2, x3, · · ·). So this latter xn would be in Dn, not in D∞. To avoid conflict with the literature, we won't use xn at all. Another source of confusion with this subscripting is that an expression such as yn+1(xn) sits in one's consciousness more naturally as an element of Dn+1 = [Dn → Dn] acting on an element of Dn to produce an element of Dn. But in fact it must (and does in the literature) mean what, in the above-defined notation, is φ∞(πn+1y)(πnx). (The two confusable meanings are of course closely related; see for example Ex. 8 below.) This last expression is becoming unwieldy, and we'll at least be able to simplify it to πn+1y · πnx.
Definition. Let “·” be the binary operation on D∞ adjoint to φ∞ ; that
is,
a · b := φ∞(a)(b) .
This is the binary operation from earlier in the general theory of λ-models.

(Perhaps annoying the experts,) for a bit more clarity we continue with juxtaposition for the operation in Λ, but use "·" in D∞. Thus, a basic property of the maps from denotational semantics is

ρ+(A B) = ρ+(A) · ρ+(B) for all A and B in Λ .

Readers who wish to go further with this and consult the literature will need to accustom themselves to two 'identifications' which we are avoiding for conceptual clarity (but giving ourselves messier notation):
(i) identifying Λ with its image ρ+(Λ) ⊂ D∞, which is ambiguous on non-closed terms in Λ, and also ambiguous in another sense 'up to ≈' ;
(ii) identifying Dn with its image πn(D∞) ⊂ D∞, that is, the image of the map θn,∞ .

General Exercises on D∞.


1. Show, for n ≤ k and all a ∈ D∞, that πna ⊑ πka ⊑ a .
2. Prove, for all a ∈ D∞, that a = ⊔n πna .

3. Using the definition of φk for k > 0, show inductively on k that, for all c ∈ D0,

θk,∞ ◦ θ0,k+1(c) ◦ θ∞,k = θ0,∞ ◦ φ0(c) ◦ θ∞,0 : D∞ → D∞ .

4. Deduce from 3 and the explicit definition of φ∞ as the lub of a non-decreasing sequence that, for all a, b in D∞,

(π0a) · b = θ0,∞(φ0(θ∞,0(a))(θ∞,0(b))) .

5. Deduce from 4 and Scott's version of φ0 that (π0a) · b = π0a .
6. Deduce from 5 and from π0⊥ = ⊥ that ⊥ · ⊥ = ⊥ .
7. As in 3, show inductively on k ≥ n that, for all a ∈ D∞,

θk,∞ ◦ θ∞,k+1(πn+1a) ◦ θ∞,k = θn,∞ ◦ θ∞,n+1(a) ◦ θ∞,n : D∞ → D∞ .

8. Deduce from 7 and the explicit definition of φ∞ as the lub of a non-decreasing sequence that, for all a, b in D∞,

(πn+1a) · b = θn,∞(θ∞,n+1(a)(θ∞,n(b))) .

(And so (πn+1a) · b ∈ πnD∞ = "Dn".)
9. Combining 8 with θ∞,n(πnb) = θ∞,n(b), deduce that

(πn+1a) · b = (πn+1a) · (πnb) .

10. Using 9 and 1, deduce that, for all k ≥ n,

(πn+1a) · b = (πn+1a) · (πkb) .

11. Using the definition of φ∞, and a · ⊥ = φ∞(a)(⊥), show that, for all a ∈ D∞, we have

π0a ⊑ π0(a · ⊥) .

(Actually, equality holds here, but is unneeded below.)

Now let's come back to the situation of Park's theorem.

Again, to be perhaps overly fussy, we shall use x as the name for a 'general' element of D∞. And also (to keep Λ and D∞ distinct) we underline all the elements of Λ, including variables. We fix a 'general' variable x, and always use a (so-called environment) ρ which maps x to x. Fix another variable y ∈ Λ, and define Curry's combinator explicitly as

Y := λx • (λy • x(y y))(λy • x(y y)) = λx • X X ,


where
X := λy • x(y y) .
Define elements of D∞
by
Y := ρ+(Y ) [independent of ρ] ,

X := ρ+(X) [depending only on ρ(x) = x] .

Remaining Exercises to prove Park’s theorem.


12. Show X z ≈ x(z z) for any variable z in Λ .
13. Deduce that X · z = x · (z · z) for any z ∈ D∞ .
14. Show Y x ≈ X X in Λ .
15. Deduce that Y · x = X · X [for any x ∈ D∞ , but note that X 'depends on x'].
16. Deduce from π0a ⊑ a for all a ∈ D∞, combining 13, 6 and 11, that

π0X ⊑ x · ⊥ .

17. Deduce from 13, 5 and 16 that

X · π0X = x · (π0X · π0X) ⊑ x · (x · ⊥) .
18. Using 17 as the initial case, and noting from 10 that

πn+1X · πn+1X = πn+1X · πnX ,

show by induction on n that

X · πnX = x · (πnX · πnX) ⊑ φ∞(x)ⁿ⁺²(⊥) .

19. Using the definitions of "·" and of fix, deduce from 2, 15 and 18 that

φ∞(Y)(x) ⊑ fix(φ∞(x)) .

20. Recalling that Y is a fixpoint combinator, show that φ∞(Y)(x) is a fixed point of φ∞(x) .
21. Deduce from 20 and the basic property of fix (in VII-8.12) that

fix(φ∞(x)) ⊑ φ∞(Y)(x) .

22. Combine 19 and 21 to get Park’s theorem :


φ∞(Y ) = fix ◦ φ∞ .

Quick proofs of the crucial later exercises.

16. π0X ⊑ π0(X · ⊥) ⊑ X · ⊥ = x · (⊥ · ⊥) = x · ⊥ .

17. X · π0X = x · (π0X · π0X) = x · π0X ⊑ x · (x · ⊥) .

18. The equality is 13, and, by the hint, the left-hand side for the inductive step in the inequality is

x · (πn+1X · πnX) ⊑ x · (X · πnX) ⊑ x · (φ∞(x)ⁿ⁺²(⊥)) = φ∞(x)ⁿ⁺³(⊥) .

19. φ∞(Y)(x) = Y · x = X · X = X · (⊔n πnX) = ⊔n (X · πnX)

⊑ ⊔n φ∞(x)ⁿ⁺²(⊥) = ⊔n φ∞(x)ⁿ(⊥) = fix(φ∞(x)) .

20. φ∞(x)(φ∞(Y)(x)) = x · (Y · x) = ρ+(x(Y x)) = ρ+(Y x) = Y · x = φ∞(Y)(x) .

11. π0(a · ⊥) = θ0∞(θ∞0(φ∞(a)(⊥))) = θ0∞(θ∞0(⊔k (θk∞ ◦ θ∞,k+1(a) ◦ θ∞k)(⊥∞)))

= θ0∞(⊔k θk0(θ∞,k+1(a)(⊥k))) ⊒ θ0∞(θ00(θ∞,1(a)(⊥0)))

= θ0∞(ψ0(θ∞,1(a))) = θ0∞(θ∞,0(a)) = π0a .

The step after the inequality uses the definition of ψ0. To strengthen the inequality to equality (which is needed further down), one shows that all terms in the lub just before the "⊒" agree with π0a, by a slightly longer argument. For example, with k = 1, we get down to

θ0∞(θ10(θ∞,2(a)(⊥1))) = θ0∞(ψ0(θ∞,2(a)(φ0(⊥0)))) = θ0∞((ψ0 ◦ θ∞,2(a) ◦ φ0)(⊥0))

= θ0∞(ψ1(θ∞,2(a))(⊥0)) = θ0∞(ψ0(ψ1(θ∞,2(a)))) = θ0∞(θ∞,0(a)) = π0a .
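Before putting Park's theorem to work, its content can at least be seen 'operationally' in the naive Haskell embedding from the previous sketch: Curry's Y, written at the level of D, unwinds exactly like Haskell's least-fixed-point combinator fix f = f (fix f). (The framing is mine; the genuine denotational statement, for Scott's D∞, is of course the theorem itself.)

newtype D = D (D -> D)
apply :: D -> D -> D
apply (D f) = f

-- Curry's combinator, directly at the level of D:
--   Y = \x . (\y . x(y y)) (\y . x(y y))
curryY :: D
curryY = D (\x -> let xx = D (\y -> apply x (apply y y)) in apply xx xx)

-- Unwinding:  apply curryY (D f)  =  f (f (f ...)),  i.e.  fix f  in Haskell.
-- If f ignores its argument, apply curryY (D f) returns f's constant value
-- after one unwinding, instead of looping.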

Our early example of a Λ-term with no normal form, namely

(λx • xx)(λx • xx) ,

is another interesting case for the denotational semantics of Λ using Scott's D∞. The answer is actually that

ρ+((λx • xx)(λx • xx)) = ⊥ ,

a theorem of Scott, derived below using Park's theorem. (This tries to say that the term carries 'no information at all'.) The reader is challenged to use the definitions directly to show this, and see how much is involved. One approach is to attempt to show, for all d ∈ D∞, that

ρ+((λx • xx)(λx • xx)) ⊑ d .

The trick we use is the following. Let

I := ρ+(I) , for the term I := λx • x .
Using the evident fact that I A ≈ A for all A ∈ Λ, it is clear
that I · d = d for all d ∈ D∞ ; that is, every element of D∞ is a
fixed point of φ∞(I), which is the identity map of D∞ . In
particular, the minimum fixed point of φ∞(I) is certainly ⊥ .
And so, from Park, we see that
ρ+(Y I) = ⊥ .

Thus it suffices to prove that

Y I ≈ (λx • xx)(λx • xx) .

Letting C := λy • x(yy), we have Y = λx • C C, and, for any closed A, we immediately (β)-reduce Y A to CA CA, where CA := C[x→A] = λy • A(yy). But

CI = λy • I(yy) ≈ λy • yy ,

so

Y I ≈ CI CI ≈ (λy • yy)(λy • yy) ,

as required (up to the name of the bound variable).
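Operationally, in the naive Haskell embedding, Scott's ⊥ for this term shows up as plain non-termination (a sketch, not a proof about D∞):

newtype D = D (D -> D)
apply :: D -> D -> D
apply (D f) = f

-- (\x . x x)(\x . x x): evaluating omega loops forever, Haskell's
-- own rendering of 'no information at all'
omega :: D
omega = let d = D (\x -> apply x x) in apply d d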

Exercise. Show that ρ+((λx • xxx)(λx • xxx)) = ⊥ .

As mentioned earlier, by (∗) from about 7 pages back, we can only expect ρ+ to be possibly injective after passing to equivalence classes under ≈. And it certainly won't be injective on non-closed terms in general, unless ρ itself is injective. But for closed terms without normal forms, it still isn't injective in general, as we see further down. However, we do have the following 'faithfulness of the semantics' for terms with normal forms.
Theorem VII-9.2. If E and F are distinct normal form terms in Λ (i.e. they are not related by a change of bound variables), and continuing with Scott's D∞ as our domain, then for some ρ we have ρ+(E) ≠ ρ+(F). In particular, the restriction of ρ+ to closed normal form terms (which restriction is independent of ρ) is injective.
This will be almost immediate from another quite deep syntactical result, whose proof will be omitted:
Böhm's Theorem VII-9.3. (See [Cu], p.156.) If E and F are as in the previous theorem, and y1, · · · , yt are the free variables in EF, then there are terms G1, · · · , Gt, and H1, · · · , Hk for some k, such that, for some distinct variables u and ν,

E[y⃗ → G⃗] H1 · · · Hk u ν ≈ u and F[y⃗ → G⃗] H1 · · · Hk u ν ≈ ν .

To deduce the earlier theorem, proceed by contradiction, and use the property labelled (∗∗) at the beginning of this section to see easily that

∀ρ , ρ+(E) = ρ+(F) =⇒ ∀ρ , ρ+(E[y⃗ → G⃗]) = ρ+(F[y⃗ → G⃗]) .

But then the left-hand sides in Böhm's theorem map to the same thing under all the maps ρ+, contradicting the fact that ρ+(u) ≠ ρ+(ν) for some ρ (since ρ can take any values we want on variables).

To illustrate the 'serious' non-injectiveness of the semantics related to terms without normal forms, here is a striking theorem of Wadsworth [Wa].

Theorem VII-9.4. Let M be any closed Λ-term with a normal form. Then there is a closed term N with no normal form (so, of course, M ≉ N) such that ρ+(M) = ρ+(N).
Proof. Firstly, let us establish that it suffices to prove this for the single example M = I := λx • x, where ρ+(I) = I corresponds to the identity map D∞ → D∞ under φ∞. So assume (as we shall prove below) that ρ+(J) = ρ+(I) for some closed term J with no normal form. Given M as in the theorem, let M′ be its normal form. Find the variable occurrence furthest to the right in the string M′ (and say x is that variable). Now replace that single occurrence x by (I x) and call the result M′′. Similarly replace it by (J x) and call the result N. Since M′ is normal, the sequence of leftmost reductions for N is simply the same (infinite) sequence for J applied to that subterm. So N has no normal form. Also

ρ+(M) = ρ+(M′) = ρ+(M′′) ,

since M ≈ M′ ≈ M′′, the latter because I x ≈ x. But also

ρ+(M′′) = ρ+(N) ,
completing this part of the proof. To see this last equality, note that, in the notation way back in VII-1.1, it is easy to see that

if ρ+(A) = ρ+(B) , then ρ+(SAT) = ρ+(SBT) .

So take A and B to be I and J, respectively, in SAT = M′′ and SBT = N. Thus it remains to establish the particular case M = I. Here we'll give J = N quite explicitly. Let

F := λfxy • x(fy) , and take J := Y F ,


where Y is again Curry's fixpoint combinator. We shall show quite generally, in the paragraphs after next, that, for any closed A ∈ Λ,

F A ≈ A =⇒ ρ+(A) = ρ+(I) . (∗)

(It is very easy, but irrelevant, to see that F I ≈ I.) However, the fact that F J ≈ J is immediate from the fixpoint property of Curry's combinator, so the above will give us ρ+(J) = ρ+(I), the main conclusion of the theorem.
First let's prove the other required property of J, namely that it has no normal form. With Y := λx • Cx Cx, where Cx := λy • x(yy) and CA := Cx[x→A] = λy • A(yy), the leftmost reduction of J = Y F goes as follows:

Y F ≥ CF CF = (λy • F(yy)) CF ≥ F(CF CF)
≥ λxy • x((CF CF)y) ≥ · · ·
≥ λxy • x((λxy • x((CF CF)y))y) ≥ · · ·
≥ λxy • x((λxy • x((λxy • x((CF CF)y))y))y) ≥ · · · .

This sequence does not terminate, as required.
The subtlest part comes now, in proving (∗) above, by manipulations due to Wadsworth which seem fiendishly clever to me! So assume that A is any closed term with F A ≈ A, and we shall show that A = I, where A := ρ+(A). For all B and C in Λ, we have

A B C ≈ F A B C = (λfxy • x(fy)) A B C ≈ B(A C) .

Applying ρ+, we get, for all B and C in D∞,

A · B · C = B · (A · C) . (∗∗)

To show that A = I, we'll now use only (∗∗). It suffices to show that πnA = πnI for all n, by induction on n, involving the projections πn introduced back in the discussion of Park's theorem. Here is a repeat of three of the exercises from back there, plus a list of four new general exercises for the reader to work on. If needed, see below, after the end of the present theorem's proof, for some hints which make their proofs completely mechanical.

5. (π0a) · b = π0a .
9. (πn+1a) · b = (πn+1a) · (πnb) .
11. π0a = π0(a · ⊥) .
23. πn⊥ = ⊥ .
24. ⊥ · x = ⊥ .
25. I · C = C .
26. (πn+1a) · b = πn(a · πnb) .
In both initial cases and also in the inductive case, we'll use the extensionality of D∞ twice.

The initial case n = 0 :
Using 5, then 5, then 11, for all B, x and y in D∞,

π0B · x · y = π0B · y = π0B = π0(B · ⊥) .

Thus, with B = I, using the above, then 25,

π0I · x · y = π0(I · ⊥) = π0⊥ .

With B = A, using the above, then 11, then (∗∗), then 24,

π0A · x · y = π0(A · ⊥) = π0(A · ⊥ · ⊥) = π0(⊥ · (A · ⊥)) = π0⊥ .

So π0A = π0I, by extensionality.
The initial case n = 1 :
Using 26, then 25, then idempotency of π0, then 5,

π1I · x · y = π0(I · π0x) · y = π0(π0x) · y = π0x · y = π0x .

Using 26, then 5, then 11, then (∗∗), then 5, then idempotency of π0,

π1A · x · y = π0(A · π0x) · y = π0(A · π0x) = π0(A · π0x · ⊥) = π0(π0x · (A · ⊥)) = π0(π0x) = π0x .

So π1A = π1I, by extensionality.
The inductive case :
Assume inductively that πn+1A = πn+1I. Using 26, then 25, then idempotency of πn+1, then 9,

πn+2I · x · y = πn+1(I · πn+1x) · y = πn+1(πn+1x) · y = πn+1x · y = πn+1x · πny .

Now using 26, then the inductive assumption, then 26, then 25, then idempotency of πn,

πn(A · πny) = πn+1A · y = πn+1I · y = πn(I · πny) = πn(πny) = πny .

So, using 26, then 26, then (∗∗), then 9, then 26, then idempotency of πn+1, then the display just above,

πn+2A · x · y = πn+1(A · πn+1x) · y = πn(A · πn+1x · πny)
= πn(πn+1x · (A · πny)) = πn(πn+1x · πn(A · πny))
= πn+1(πn+1x) · πn(A · πny) = πn+1x · πn(A · πny) = πn+1x · πny .

So, once again by extensionality, πn+2A = πn+2I. Thus VII-9.4 is now finally proved.

As for the new exercises used above, if you haven’t done


them already, here are statements which reduce them to
mechanical checks :

As for 23, this just amounts to the fact that

⊥ := ⊥D∞ = (⊥0, ⊥1, ⊥2, · · ·) ,

since the right-hand side is clearly the smallest element in D∞ .


As for 24, the argument is the same as
for 6. As for 25, this is immediate
from I C ≈ C .
27. Using the definition of the maps ψj, show by induction on k ≥ n that, for all a ∈ D∞ and y ∈ Dn, we have

θkn(θ∞,k+1(a)(θnk(y))) = θ∞,n+1(a)(y) .

As for 26, use 27 with y = θ∞n(b) in the last step just below, to see that

πn(a · πnb) = πn(φ∞(a)(πnb)) = θn∞ ◦ θ∞n(⊔k θk∞(θ∞,k+1(a)(θ∞,k ◦ θn∞ ◦ θ∞n(b))))

= θn∞(⊔k θkn(θ∞,k+1(a)(θnk ◦ θ∞n(b)))) = θn∞(θ∞,n+1(a)(θ∞n(b))) .

But the latter is (πn+1a) · b by 8 .

Quite a bit easier is producing an example of two distinct elements of Λ/≈, both closed, and neither with a normal form, which give the same element of D∞ under ρ+. Such elements are the classes of Y and Y′ = Y G, where Y is again Curry's fixpoint combinator, and

G := λuν • ν(uν) .

The argument is quite elegant.

Firstly, it's becoming slightly embarrassing how many syntactic results we are leaving the reader to look up, but here is the final one:

Proposition VII-9.5. Y′ ≉ Y , and neither has a normal form.

See Böhm, p.179, in [St-ed].

To show that ρ+Y′ = ρ+Y, first we establish a nice result of independent interest.
Proposition VII-9.6.

∀Z ∈ Λ , [ G Z ≈ Z ⇐⇒ ∀A ∈ Λ , A (Z A) ≈ Z A ] .

That is, Z is a fixed point of G if and only if it is a fixpoint operator.

Proof. First note that

G Z = (λuν • ν(uν))Z ≈ λν • ν(Zν) . (∗)

As for =⇒ : Using the assumption, then (∗), then β-reduction,

Z A ≈ G Z A ≈ (λν • ν(Zν))A ≈ A (Z A) .

As for ⇐= : Using (∗), then the assumption with A = ν, then η-reduction,

G Z ≈ λν • ν(Zν) ≈ λν • Zν ≈ Z .
Now abbreviate ρ+Y′ and ρ+Y to Y′ and Y, respectively.

Proposition VII-9.7. Y′ is a fixpoint operator; and so A · (Y′ · A) = Y′ · A for all A ∈ D∞.

Proof. Use the =⇒ direction of 9.6, after calculating, using the fact that Y is a fixpoint operator:

G Y′ = G(Y G) ≈ Y G = Y′ .

Corollary of Park's Theorem VII-9.8.

Y · A ⊑ Y′ · A for all A ∈ D∞ .

Proof. Both Y · A and Y′ · A are fixed points of the map A · (that is, of φ∞(A)). But Park says that Y · A is the smallest of all elements in D∞ fixed by A · .

Corollary VII-9.9 (of ⇐= in 9.6).

G Y ≈ Y , and so G · Y = Y .

Thus Y is a fixed point of G · . And so Y · G ⊑ Y, i.e. Y′ ⊑ Y, since Y · G is the smallest fixed point of G · . Thus we get
Corollary VII-9.10.

Y′ · A ⊑ Y · A for all A ∈ D∞ .

And now finally

Corollary VII-9.11 (of 9.10 and 9.8).

Y′ · A = Y · A for all A ∈ D∞ .

Thus, by extensionality, we have what we want, namely

Y′ = Y , that is, ρ+Y′ = ρ+Y .

We should finish with what seems to be the canonical example used whenever anything related to 'recursive computing' needs a simple illustration, namely, the factorial function. And then again, maybe we shouldn't; though just seeing how complicated or otherwise a Λ-term is needed to programme the factorial function is a bit interesting!
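For the curious, then, here is one such Λ-term run through the naive Haskell embedding used earlier, with Church numerals and the standard encodings of booleans, multiplication and predecessor; the extra constructor B for base integers is my own device, there only so the answer can be observed:

data D = F (D -> D) | B Integer   -- B: observable base values, not part of Lambda
app :: D -> D -> D
app (F f) x = f x
app (B _) _ = error "base value applied"

lam :: (D -> D) -> D
lam = F

yc :: D                           -- Curry's Y, as before
yc = lam (\x -> let xx = lam (\y -> app x (app y y)) in app xx xx)

-- standard Church encodings
true', false' :: D
true'  = lam (\a -> lam (\_ -> a))
false' = lam (\_ -> lam (\b -> b))
church :: Int -> D
church n = lam (\f -> lam (\x -> iterate (app f) x !! n))
isZero, mult, predC :: D
isZero = lam (\n -> app (app n (lam (\_ -> false'))) true')
mult   = lam (\m -> lam (\n -> lam (\f -> app m (app n f))))
predC  = lam (\n -> lam (\f -> lam (\x ->
           app (app (app n (lam (\g -> lam (\h -> app h (app g f)))))
                    (lam (\_ -> x)))
               (lam (\u -> u)))))

-- fact = Y (\f . \n . (isZero n) 1 (mult n (f (pred n))))
fact :: D
fact = app yc (lam (\f -> lam (\n ->
         app (app (app isZero n) (church 1))
             (app (app mult n) (app f (app predC n))))))

unchurch :: D -> Integer          -- observe a Church numeral
unchurch n = case app (app n (lam inc)) (B 0) of
               B k -> k
               _   -> error "not a numeral"
  where inc (B k) = B (k + 1)
        inc _     = error "not a numeral"

-- unchurch (app fact (church 5)) evaluates to 120

So a couple of dozen lines of quite standard encodings suffice; how complicated one finds that is, as promised, a matter of taste.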

References

[Al] Allison, Lloyd. A Practical Introduction to Denotational Semantics. Cambridge U. Press, Cambridge, 1989.
[Ba] Barendregt, H. P. The Lambda Calculus: its Syntax and Semantics. North-Holland, Amsterdam, 1984.
[BKK-ed] Barwise, J., Keisler, H. J. and Kunen, K. The Kleene Symposium. North-Holland, Amsterdam, 1980.
[Bö-ed] Böhm, C. λ-calculus and Computer Science Theory: Proceedings. LNCS #37, Springer-Verlag, Berlin, 1975.
[Br-ed] Braffort, Paul. Computer Programming and Formal Systems. North-Holland, Amsterdam, 1963.
[Ch] Church, Alonzo. The Calculi of Lambda-Conversion. Princeton U. Press, Princeton, 1941.
[CM] Hoffman, P. Computability for the Mathematical. This website, 2005.
[Cu] Curry, H. B., Hindley, J. R. and Seldin, J. P. Combinatory Logic, Vol. II. North-Holland, Amsterdam, 1972.
[En] Engeler, Erwin, et al. The Combinatory Programme. Birkhäuser, Basel, 1995.
[Fi] Fitch, F. B. Elements of Combinatory Logic. Yale U. Press, New Haven, 1974.
[Go] Gordon, M. J. C. Denotational Description of Programming Languages. Springer-Verlag, Berlin, 1979.
[Go1] Gordon, M. J. C. Programming Language Theory and its Implementation. Prentice Hall, New York, 1988.
[Hi] Hindley, J. R. Standard and Normal Reductions. Trans. AMS, 1978.
[HS] Hindley, J. R. and Seldin, J. P. Introduction to Combinators and the λ-calculus. London Mathematical Society Student Texts #1, Cambridge U. Press, Cambridge, 1986.
[HS-ed] Hindley, J. R. and Seldin, J. P. To H. B. Curry: Essays on Combinatory Logic, Lambda-calculus, and Formalism. Academic Press, London, 1980.
[Kl] Klop, J. W. Combinatory Reduction Systems. Mathematisch Centrum, Amsterdam, 1980.
[Ko] Koymans, C. P. J. Models of the Lambda Calculus. CWI Tract, Amsterdam, 1984.
[La-ed] Lawvere, F. W. Toposes, Algebraic Geometry and Logic. LNM #274, Springer-Verlag, Berlin, 1972.
[LS] Lambek, J. and Scott, P. J. Introduction to Higher Order Categorical Logic. Cambridge U. Press, Cambridge, 1986.
[LM] Hoffman, P. Logic for the Mathematical. This website, 2003.
[Pe] Penrose, R. The Emperor's New Mind. Oxford U. Press, Oxford, 1989.
[Ru-ed] Rustin, Randall. Formal Semantics of Programming Languages. Prentice-Hall, N.J., 1972.
[SAJM-ed] Suppes, P., Henkin, L., Athanase, J. and Moisil, Gr. C. Logic, Methodology and Philosophy of Science IV. North-Holland, Amsterdam, 1973.
[St-ed] Steel, T. B. Formal Language Description Languages for Computer Programming. North-Holland, Amsterdam, 1966.
[Sö] Stenlund, Sören. Combinators, λ-terms, and Proof Theory. Reidel Pub. Co., Dordrecht, 1972.
[St] Stoy, Joseph. Denotational Semantics: the Scott-Strachey Approach to Programming Language Theory. MIT Press, Cambridge, Mass., 1977.
[Wa] Wadsworth, Christopher P. The Relation between Computational and Denotational Properties for Scott's D∞-Models of the Lambda-calculus. SIAM J. Comput. 5(3), 1976, 488-521.
