The Lambda Calculus
Peter Hoffman
This is quite a rich subject, but rather under-appreciated in many mathematical circles, apparently. It has ‘ancient’ connections to the foundations of both mathematics and also of computability, and more recent connections to functional programming languages and denotational semantics in computer science. This write-up is largely independent of the lengthier Computability for the Mathematical—[CM], within which it will also appear.
Do not misconstrue it as negative criticism, but I find some of the literature on the λ-calculus to be simultaneously quite stimulating and somewhat hard to read. Reasons for the latter might be:

(1) neuron overload caused by encyclopaediae of closely related, but subtly different, definitions (needed for deeper knowledge of the subject perhaps, or maybe a result of the experts having difficulty agreeing on which concepts are central and which peripheral—or of me reading only out-of-date literature!);

(2) authors whose styles might be affected by a formalist philosophy of mathematics (‘combinatory logicians’), while I am often trying to ‘picture actual existing abstract objects’, in my state of Platonist original sin; and

(3) writing for readers who are already thoroughly imbued (and so, familiar with the jargon and the various ‘goes without saying’s) either as professional universal algebraists/model theorists or as graduate students/professionals in theoretical computer science.
We’ll continue to write for an audience assumed to be fairly talented upper year undergraduates specializing in mathematics. And so we expect a typical (perhaps unexamined) Platonist attitude, and comfortable knowledge of functions, equivalence relations, abstract symbols, etc., as well as, for example, using the ‘=’ symbol between the names of objects only when wishing to assert that they are (names for) the same object.
The text [HS] is one of the definite exceptions to the ‘hard for me to read’ comment above. It is very clear everywhere. But it requires more familiarity with logic and model theory (at least the notation) than we need below. It uses model theoretic language in doing the material we cover, and tends to be more encyclopaedic, though nothing like [Ba]. So the following 111 or so pages will hopefully serve a useful purpose, including re-expressing things such as combinatorial completeness and the ‘equivalence’ between combinators and λ-calculus in an alternative, more familiar language for some of us beginners. Also it’s nice not to need to be continually reminding readers that “=” is used for identicalness of objects, with the major exception of the objects most discussed (the terms in the λ-calculus), where a different symbol must be used, because “=” has been appropriated to denote every equivalence relation in sight! At risk of boring the experts who might stumble upon this paper, it will include all but the most straightforward of details, and plenty of the latter as well (and stick largely to the extensional case).
Contents.

1. The Formal System Λ.
2. Examples and Calculations in Λ.
3. So—what’s going on?
4. Non-examples and Non-calculability in Λ—undecidability.
5. Solving equations, and proving RC ⇐⇒ λ-definable.
6. Combinatorial Completeness; the Invasion of the Combinators.
7. λ-Calculus Models and Denotational Semantics.
8. Scott’s Original Models.
9. Two (not entirely typical) Examples of Denotational Semantics.
VII-1 The Formal System Λ.

The symbols are

λ || • || ) || ( || x0 || x1 || x2 || x3 · · ·
All but the first four are called variables. The inductive definition
gives Λ as the smallest set of strings of symbols for which (i)
and (ii) below hold:
(i) Each xi ∈ Λ (atomic terms).
(ii) If A and B are in Λ, and x is a variable, then

(AB) ∈ Λ and (λx • A) ∈ Λ .
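For concreteness, the grammar of Λ can be mirrored in a scrap of Python; this tagged-tuple representation and the names Var, App, Lam are our own devices, not part of the formal system—a sketch only, reused in later sketches.

    # Terms of Λ as tagged tuples; Var/App/Lam are our own names.
    def Var(i):    return ('var', i)       # the atomic term x_i
    def App(a, b): return ('app', a, b)    # the application (AB)
    def Lam(i, a): return ('lam', i, a)    # the abstraction (λx_i • A)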
Definition of free and bound occurrences of variables.
Notice that the definition below is exactly parallel to the corresponding one in 1st order logic, [LM], Section 5.1. That is, ‘λx•’ here behaves the same as the quantifiers ‘∀x’ and ‘∃x’ in logic.
(i) The occurrence of xi in the term xi is free.
(ii) The free occurrences of a variable x in (AB) are its free
occurrences in A plus its free occurrences in B.
(iii) There are no free occurrences of x in (λx • A) .

(iv) If y is a variable different than x, then the free occurrences of y in (λx • A) are its free occurrences in A.

(v) A bound occurrence is a non-free occurrence.
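The clauses above translate directly into a small recursive procedure; a sketch in Python, on the tagged-tuple terms from the earlier sketch.

    def free_vars(t):
        # Free variables of a term, following clauses (i)-(iv) above.
        if t[0] == 'var':                    # (i): x_i is free in x_i
            return {t[1]}
        if t[0] == 'app':                    # (ii): those in A plus those in B
            return free_vars(t[1]) | free_vars(t[2])
        return free_vars(t[2]) - {t[1]}      # (iii),(iv): λx binds exactly x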
from which ‘okayness’ [Or should it be ‘okayity’?] follows automatically.)
course; one must restore the outside brackets on a term when
it is used as a subterm of a larger term.
We should comment about the definition of the occurrences of
subterms. The implicit definition above is perfectly good to
begin with: a subterm occurrence is simply a connected
substring inside a term, as long as the substring is also a term
when considered on its own.
However, there is a much more useful inductive definition
just below. Some very dry stuff, as in Appendix B of [LM], shows
that the two definitions agree. The inductive definition is:
(0) Every term is a subterm occurrence in itself.
(i) The atomic term xi has no proper subterm occurrences.
(ii) The proper subterm occurrences in (AB) are all the subterm occurrences in A together with all the subterm occurrences in B .

(iii) The proper subterm occurrences in (λx • A) are all the subterm occurrences in A .
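Again the inductive definition is essentially a program; a Python sketch on the tagged-tuple terms (occurrences at distinct positions simply show up as repeats in the list).

    def subterms(t):
        # Subterm occurrences, by clauses (0)-(iii) above.
        if t[0] == 'var':                        # (0),(i): just t itself
            return [t]
        if t[0] == 'app':                        # (0),(ii)
            return [t] + subterms(t[1]) + subterms(t[2])
        return [t] + subterms(t[2])              # (0),(iii)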
Finally we come to an interesting definition! But why this is so
will remain a secret for a few more pages.
Definition of the equivalence relation ≈ .
This is defined to be the smallest equivalence relation on Λ for
which the following four conditions hold for all A, B, A', B', x and y:
(α) : λx • A ≈ λy • A[x→y]
if A[x→y] is okay and y has no free occurrence in A.
(β) : (λx • A)B ≈ A[x→B] if A[x→B] is okay.
(η) : λx • (Ax) ≈ A if A has no free occurrence of x .

(Congruence Condition) : A ≈ A' and B ≈ B' =⇒ AB ≈ A'B' and λx • A ≈ λx • A' .
Exercise. Show that
λx • A ≈ λx • A' =⇒ A ≈ A' .
Remarks. The names (α), (β) and (η) have no significance as far as I know, but are traditional. The brackets on the left-hand side of (η) are not necessary. Most authors in this subject seem to use A ≡ B for our A = B, and then use A = B for our A ≈ B . I strongly prefer to reserve ‘=’ to mean ‘is the same string as’, or more generally, ‘is the same thing as’, for things from Plato’s world. Think about the philosophy, not the notation, but do get the notation clear.
And most importantly, this definition more specifically says that two terms are related by ‘≈’ if and only if there is a finite sequence of terms, starting with one and ending with the other, such that each step in the sequence (that is, successive terms) comes from (α), (β) or (η) being applied to a subterm of some term, where those three ‘rules’ may be applied in either direction.
The last remark contains a principle, which again seems to
go without saying in most expositions of the λ-calculus:
Replacement Theorem VII-1.1. If A, B are terms, and S, T are (possibly empty) strings of symbols such that SAT is a term, then
(i) SBT is also a term;
(ii) if A ≈ B then SAT ≈ SBT .
This is not completely trivial, but is very easy by induction on the term
SAT . See Theorem 2.1/2.1∗ in [LM] for an analogue.
Here are a few elementary results.
Proposition VII-1.2. For all terms E and Ei for 1 ≤ i ≤ n such that no variable occurs in two of E, E1, · · · , En, we have

(λy1y2 · · · yn • E)E1E2 · · · En ≈ E[y1→E1][y2→E2] · · · [yn→En] .
Proof. Proceed by induction on n, using (β) liberally. (The
condition on non-common variable occurrences is stronger than
really needed.)
Proposition VII-1.3. If A and B have no free occurrences of x, and if
Ax ≈ Bx, then A ≈ B .
Proof. Using the ‘rule’ (η) for the first and last steps, and the replacement theorem (or one of the congruence conditions) for the middle step,

A ≈ λx • Ax ≈ λx • Bx ≈ B .
Now we want to consider the process of moving from a left-hand side for one of (α), (β) or (η) to the corresponding right-hand side, but applied to a subterm, often proper. (However, for rule (α), the distinction between left and right is irrelevant.) Such a step is called a reduction.
If one such step gets from term C to term D, say that C reduces to D.
Now, for terms A and B, say that A ≥ B if and only if there is a finite sequence, starting with A and ending with B, such that each term in the sequence (except the last) reduces to the next term in the sequence.
Of course, A ≥ B implies that A ≈ B , but the converse is false. To show that not all terms are equivalent in the ≈–sense, one seems to need a rather non-trivial result, namely the following theorem.
Theorem VII-1.4.(Church-Rosser) For all terms A, B and C, if A ≥ B
and A ≥ C, then there is a term D such that B ≥ D and C ≥ D.
We shall postpone the proof (maybe forever); readable ones are given in
[Kl], and in [HS], Appendix 1.
Say that a term B is normal or in normal form if and only if no
sequence of reductions starting from B has any step which is an
application of (β) or (η) (possibly to a proper subterm). An
equivalent statement is that B contains no subterm of the form
of the left-hand sides of (β) or (η), i.e.
(λx • A)D ,

or

λx • (Ax) if A has no free occurrence of x.
Note that we don’t require “A[x→D] is okay” in the first one. Effectively, we’re saying that neither reduction (β) nor (η) can ever be applied to the result of making changes of bound variables in B.
The claims just above and below do constitute little propositions, needed in a few places below, particularly establishing the following important corollary of the Church-Rosser theorem :

For all A, if A ≥ B and A ≥ C where B and C are both normal, then B and C can be obtained from each other by applications of rule (α), that is, by a change of bound variables.
So a given term has at most one normal form, up to renaming bound variables.
In particular, no two individual variables, regarded as terms, are related by ≈ , so there are many distinct equivalence classes. As terms, variables are normal, and are not related by ‘≈’ to any other normal term, since there aren’t any bound variables to rename! But also, of course, you can find infinitely many closed terms (ones without any free variables) which are normal, no two of which are related by ‘≈’.
Note also that if B is normal and B ≥ C, then C ≥ B . But a term B with this property is not necessarily normal, as the example just below shows.
exists (possibly after an (α)-‘reduction’ is applied to change bound variables, and doing the (β)-reduction when there is a choice between (β) and (η) for the leftmost λ), we get an algorithm which produces the normal form, if one exists for the start-term. It turns out that having a normal form is undecidable, though it is semi-decidable, as the ‘leftmost algorithm’ just above shows.
Now comes a long list of specific elements of Λ . Actually they are mostly only well defined up to reductions using only (α), that is, up to change of bound variables. It is important only that they be well defined as equivalence classes under ≈ . We give them as closed terms, that is, terms with no free variables, but leave somewhat vague which particular bound variables are to be used. Also we write them as normal forms, all except Y . For example,
Definitions of T and F .
T := λxy • x := (λx • (λy • x)) ; F := λxy • y := (λx • (λy • y)) ,
where x and y are a pair of distinct variables. The second
equality each time is just to remind you of the abbreviations.
Since terms obtained by altering x and y are evidently equivalent to those above, using (α), the classes of T and F under ≈ are independent of the choice of x and y. Since all the propositions below concerning T and F are ‘equations’ using ‘≈’, not ‘=’, the choices above for x and y are irrelevant.
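As a preview of what T and F compute, here they are as native Python closures; this mirrors the behaviour T B C ≈ B and F B C ≈ C established in VII-2.1 below, and is only a sketch, not the terms themselves.

    T = lambda x: lambda y: x     # T := λxy•x
    F = lambda x: lambda y: y     # F := λxy•y

    assert T('B')('C') == 'B'     # T B C ≈ B
    assert F('B')('C') == 'C'     # F B C ≈ C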
We shall continue for this one section to work in an
unmotivated way, just grinding away in the formal system, by
which I mean roughly the processing of strings by making
reductions, as defined a few paragraphs above. But the
notation for some of the elements defined below is quite
suggestive as to what is going on. As Penrose [Pe], p. XXXX
has said, some of this, as we get towards the end of this section,
is “magical”. . . and . . . “astonishing”! See also the exercise in
the next subsection.
Proofs. For the latter, note that

F B = (λx • (λy • y))B ≈ (λy • y)[x→B] = λy • y .

Thus

F BC = (F B)C ≈ (λy • y)C ≈ y[y→C] = C .
For VII-2.1(a), choose some variable z not occurring in B, and also different from x. Then, changing the bound variable y to z using (α), we get T B ≈ λz • B . Thus

T BC = (T B)C ≈ (λz • B)C ≈ B[z→C] = B .
Definition of ¬ . Now define
¬ := λz • z F T = λz • ((z F ) T ) ≠ (λz • z) F T ,
for some variable z. Once again, using rule (α), this is independent, up to
≈, of the choice of z.
VII-2.2 We have ¬ T ≈ F and ¬ F ≈ T .
Proof. Using VII-2.1(a) for the last step,
¬ T = (λz • z F T )T ≈ (z F T )[z→T ] = T F T ≈ F .
¬ F = (λz • z F T )F ≈ (z F T )[z→F ] = F F T ≈ T .
Definitions of ∧ and ∨ . Let
∧ := λxy • (x y F ) and ∨ := λxy • (x T y) ,
for a pair of distinct variables x and y.
VII-2.3 We have

∧ T T ≈ T ; ∧ T F ≈ ∧ F T ≈ ∧ F F ≈ F ,

and

∨ T T ≈ ∨ T F ≈ ∨ F T ≈ T ; ∨ F F ≈ F .
Proof. This is a good exercise for the reader to start becoming a λ-phile.
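In the same Python dress as the boolean sketch above, the three connectives and the ‘equations’ of VII-2.2 and VII-2.3 can be checked mechanically; again only a sketch.

    NOT = lambda z: z(F)(T)                   # ¬ := λz•zFT
    AND = lambda x: lambda y: x(y)(F)         # ∧ := λxy•xyF
    OR  = lambda x: lambda y: x(T)(y)         # ∨ := λxy•xTy

    assert NOT(T) is F and NOT(F) is T        # VII-2.2
    assert AND(T)(T) is T and AND(F)(T) is F
    assert OR(F)(F)  is F and OR(F)(T)  is T  # VII-2.3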
Definitions of 1st, rst and [A, B] .
1st := λx • x T ; rst := λx • x F ; [A, B] := λx • xAB ,
where x is any variable for the first two definitions, but, for the latter definition, must not occur in A or B, which are any terms. Note that [A, B] is very different than (AB).
VII-2.4 We have
1st[A, B] ≈ A and rst[A, B] ≈ B .
Proof. 1st[A, B] = (λx • x T )[A, B] ≈ [A, B] T = (λy • yAB)T ≈ T AB ≈ A ,

using VII-2.1(a). The other one is exactly parallel.
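The pairing term and projections compute just as VII-2.4 says; a Python sketch in the same style as the boolean sketch above.

    pair = lambda a: lambda b: (lambda x: x(a)(b))   # [A,B] := λx•xAB
    fst  = lambda p: p(T)                            # 1st := λx•xT
    rst  = lambda p: p(F)                            # rst := λx•xF

    assert fst(pair('A')('B')) == 'A'   # 1st[A,B] ≈ A
    assert rst(pair('A')('B')) == 'B'   # rst[A,B] ≈ B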
where

S := λxyz • (xz)(yz) and B := S (T S) T .

(Of course 1th is the same as 1st, and maybe we should have alternative names 2nd and 3rd for 2th and 3th !) This is all we need, by the second ‘equation’ below. We could instead have defined

B := λxyz • x(yz) ,

for our purposes here, and proved that BABC ≈ A(BC) directly.
Definitions of I and A^n . Define, for any term A,

A^0 := I := S T T ,

and inductively A^{n+1} := B A A^n .
VII-2.6 For all terms A and B, and all natural numbers i and j, we have
The latter ‘≈’ is from the remark just before the display.
Finally, for any closed term B and many others, the
definition of s and rule (β) immediately give
s B ≈ λyz • By(yz)
So now define

pp := λuw • [ F , 1stw(rstw)(u(rstw)) ] ,

and

p := λxyz • rst(x(pp y)[ T , z ]) .
VII-2.8 (Kleene) For all natural numbers n > 0, we have p n ≈ n − 1 .
Proof. First we show
pp x [ T , y ] ≈ [ F , y ] and pp x [ F , y ] ≈ [ F , xy ] .
Calculate :
pp x [ T , y ] ≈ (λuw • [ F , 1stw(rstw)(u(rstw)) ]) x [ T , y ]
≈ (λw • [ F , 1stw(rstw)(x(rstw)) ]) [ T , y ]
≈ [ F , 1st[ T , y ](rst[ T , y ])(x(rst[ T , y ])) ]
≈ [ F , T y(xy) ] ≈ [ F , y ] .
The last step uses VII-2.1(a).
Skipping the almost identical middle steps in the other one, we get
pp x [ F , y ] ≈ · · · ≈ [ F , F y(xy) ] ≈ [ F , xy ] .
Next we deduce
Exercise. Show that p 0 ≈ 0 .
Exercise. Do we have p s ≈ I or s p ≈ I ?
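The pair-shuffling idea behind VII-2.8 can be seen in miniature in Python: walk upward from 0, carrying a pair (flag, previous value). This is only a stand-in on ordinary integers, not the λ-terms pp and p themselves.

    def pred(n):
        # Mirrors pp: (True, y) -> (False, y), (False, y) -> (False, u(y)).
        def pp(u, w):
            flag, y = w
            return (False, y if flag else u(y))
        w = (True, 0)                        # the start pair [T, 0]
        for _ in range(n):                   # n applications of pp
            w = pp(lambda y: y + 1, w)
        return w[1]                          # rst of the final pair

    assert [pred(n) for n in (0, 1, 5)] == [0, 0, 4]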
Definition of Y .
Y := λx • (λy • x(yy))(λy • x(yy)) .
Of course, x and y are different from each
other.
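Under Python’s strict evaluation Curry’s Y as written loops forever, but its η-expanded variant (often called Z) computes the same fixed points of functions; a sketch we will reuse below, with factorial as a purely hypothetical illustration.

    Z = lambda f: (lambda y: f(lambda v: y(y)(v)))(
                   lambda y: f(lambda v: y(y)(v)))

    fact = Z(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
    assert fact(5) == 120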
It is a very powerful concept, one that in a sense has inspired many mathematical developments, such as recursion theory, indeed also some literary productions, such as Hofstadter’s popular book.
Erwin Engeler [En]
FB ≈ A[y→F ][x→B] .
The ‘okayness’ of the substitutions below follows easily from the
restrictions on A and B, which are stronger than strictly needed,
but are satisfied in all the applications.
Proof. Using the basic property of Y from VII-2.9 for the first step,
Then, for any term B, using VII-2.10 for the first step,
F n + 1 ≈ H[ F (p n + 1) , p n + 1] ≈ H[ F n , n ] ,
At first we saw some terms which seemed to be modelling objects in propositional logic. Here are two slightly curious aspects of this. Firstly, T and F are presumably truth values in a sense, so come from the semantic side, whereas ¬ , ∧ and ∨ are more like syntactic objects. Having them all on the same footing does alter one’s perceptions slightly, at least for non-CSers. Secondly, we’re not surprised to see the latter versions of the connectives acting like functions which take truth values as input, and produce truth values as output. But everything’s on a symmetric footing, so writing down a term like F ∧ now seems like having truth values as functions which can take connectives as input, not a standard thing to consider. And F F seems even odder, if interpreted as the truth value ‘substituted into itself’!
But later we had terms n which seemed to represent numbers, not functions. However, two quick applications of the β-rule yield

n A B ≈ A^n B ≈ A(A(A · · · (AB) · · ·)) .

So if A were thought of as representing a function, as explained a bit below, the term n may be thought of as representing that function which maps A to its n-fold iterate A^n.
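Here is that n-fold iteration mirrored with native Python closures; church and unchurch are our own hypothetical helpers for moving between ordinary integers and this representation, a sketch only.

    def church(n):
        # n := λxy•x^n y : iterate x exactly n times on y.
        def numeral(x):
            def body(y):
                for _ in range(n):
                    y = x(y)
                return y
            return body
        return numeral

    unchurch = lambda num: num(lambda k: k + 1)(0)
    succ = lambda n: lambda y: lambda z: n(y)(y(z))   # s := λxyz•(xy)(yz)

    assert unchurch(church(3)) == 3
    assert unchurch(succ(church(3))) == 4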
Now the n , along with s and p as successor and predecessor functions, or at least terms representing them, give us the beginnings of a numeral system sitting inside Λ. The other part of that numeral system is isz , the “Is it 0?”–predicate. Of course predicates may be regarded as functions. And the final result goes a long way towards showing that terms exist to represent all primitive recursive functions. To complete the job, we just need to amplify VII-2.11 to functions of more than one variable, and make some observations about composition. More generally, in the subsection after next, we show that all (possibly partial) recursive functions are definable in the λ-calculus. It is hardly surprising that no others are, thinking about Church’s thesis.
Exercise. Show that k l ≈ l^k . Deduce that 1 is alternatively interpretable as a combinator which defines the exponential function. (See the exercise before VII-2.8 for the addition and multiplication functions.)

Show that 9 9 9 9 ≈ n , where n > 2^m, where m is greater than the mass of the Milky Way galaxy, measured in milligrams !
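The first exercise can be checked mechanically with the helpers from the numeral sketch above: applying one Church numeral to another exponentiates.

    three, two = church(3), church(2)
    assert unchurch(three(two)) == 2 ** 3    # 3 2 ≈ 8, i.e. k l ≈ l^k
    assert unchurch(two(three)) == 3 ** 2    # 2 3 ≈ 9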
It is likely clear to the reader that [A , B ] was supposed to be a term which represents an ordered pair. And we then produced ordered n-tuple terms in Λ, as well as terms representing the projection functions.
So we can maybe settle on regarding each term in Λ as representing a function in some sense, and the construction (AB) as having B fed in as input to the function represented by A, producing an ‘answer’ which again represents a function. But there is a sense of unease (for some of us) in seeing what appears to be a completely self-contained system of functions, every one of which has the exact same set of all these functions apparently as both its domain and its codomain. (A caveat here—as mentioned below, the existence of terms without normal forms perhaps is interpretable as some of these functions being partial.) That begins to be worrisome, since except for the 1-element set, the set of all functions from X to itself has strictly larger cardinality than X. But nobody said it had to be all functions. However it still seems a bit offputting, if not inconsistent, to have a bunch of functions, every one of which is a member of its own domain! (However, the formula for Y reflects exactly that, containing ‘a function which is substituted into itself’—but Y was very useful in the last result, and that’s only the beginning of its usefulness, as explained in the subsection after next. The example of a term with no normal form at the beginning of Subsection VII-2 was precisely ‘a function which is substituted into itself’.) The “offputting” aspect above arises perhaps from the fact that, to re-interpret this stuff in a consistent way within axiomatic 1st order set theory, some modification (such as abandonment) of the axiom of foundation seems to be necessary. See however Scott’s constructions in Subsection VII-8 ahead.
In fact, this subject was carried on from about 1930 to 1970 in a very syntactic way, with little concern about whether there might exist mathematical models for the way Λ behaved, other than Λ itself, or better, the set of equivalence classes Λ/≈ . But one at least has the Church-Rosser theorem, showing that, after factoring out by the equivalence relation ≈ , the whole thing doesn’t reduce to the triviality of a 1-element set. Then around 1970 Dana Scott produced some interesting such models, and not just because of pure mathematical interest. His results, and later similar ones by others, are now regarded as being very fundamental in parts of computer science. See Subsections VII-7, VII-8 and VII-9 ahead.
Several things remain to be considered. It’s really not the terms themselves, but rather their corresponding equivalence classes under ≈ which might be thought of as functions. The terms themselves are more like recipes for calculating functions. Of what does the calculation consist? Presumably
its steps are the reductions, using the three rules, but particularly (β) and (η). When is a calculation finished? What is the output? Presumably the answer is that it terminates when the normal form of the start term is reached, and that’s the output. Note that the numerals n themselves are in normal form (with one exception). But what if the start term has no normal form? Aha, that’s your infinite loop, which of course must rear its ugly head, if this scheme is supposed to produce some kind of general theory of algorithms. So perhaps one should consider the start term as a combination of both input data and procedure. It is a fact that a normal form will eventually be produced, if the start term actually has a normal form, by always concentrating on the leftmost occurrence of λ for which an application of (β) or (η) can do a reduction (possibly preceded by an application of (α) to change bound variables and make the substitution okay). This algorithm no doubt qualifies intuitively as one, involving discrete mechanical steps and a clear ‘signal’, if and when it’s time to stop computing. The existence of more than one universal Turing machine, more than one ATEN-command for computing the universal function, etc. . . . presumably correspond here to the possibility of other definite reduction procedures, besides the one above, each of which will produce the normal form of the start term (input), if the start term has one. As mentioned earlier, such reduction schemes are algorithms showing the semidecidability of the existence of a normal form. Later we’ll see that this question is undecidable.
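As a concrete companion to the ‘leftmost algorithm’, here is a sketch in Python of leftmost (β-only) reduction on the tagged-tuple terms from the first sketch. It substitutes naively, so it is safe only on terms where no variable capture can occur (in general, rule (α) would have to rename bound variables first), and it ignores (η).

    def subst(t, i, b):                       # naive t[x_i -> b]
        if t[0] == 'var':
            return b if t[1] == i else t
        if t[0] == 'app':
            return ('app', subst(t[1], i, b), subst(t[2], i, b))
        return t if t[1] == i else ('lam', t[1], subst(t[2], i, b))

    def step(t):                              # one leftmost β-step, or None
        if t[0] == 'app' and t[1][0] == 'lam':    # (λx•A)B -> A[x->B]
            return subst(t[1][2], t[1][1], t[2])
        if t[0] == 'app':
            s = step(t[1])
            if s is not None:
                return ('app', s, t[2])
            s = step(t[2])
            return None if s is None else ('app', t[1], s)
        if t[0] == 'lam':
            s = step(t[2])
            return None if s is None else ('lam', t[1], s)
        return None

    def normalize(t, fuel=10000):             # iterate until normal (or give up)
        while fuel and (s := step(t)) is not None:
            t, fuel = s, fuel - 1
        return t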
We have probably left far too late a discussion of what the rules (β) and (η) are doing. And indeed, what is λ itself? It is often called the abstraction-operator. The symbol string ‘λx •’ is to alert the computer (human or otherwise) that a function of the variable x is about to be defined. And of course it’s the free occurrences of x in A which give the ‘formula’ for the function which λx • A is supposed to be defining.

So now the explanations of (β) and (η) are fairly clear :

The rule (β) just says that to evaluate the function above on B, i.e. (λx • A)B , you substitute B for the free occurrences of x in A (of course!). And rule (η) just says that, if x doesn’t occur freely in A, then the function defined by λx • (Ax) is just A itself (of course—‘the function obtained by evaluating A on its argument’!).
[As an aside, it seems to me to be not unreasonable to ask why one shouldn’t change the symbol ‘λ’ to a symbol ‘↦’. After all, that’s what we’re talking about, and it has already been noted that the second basic symbol ‘•’ is quite unneeded, except for one of the abbreviations. So the string (λx • A) would be replaced by (x ↦ A). Some things would initially be more readable for ordinary λ-inexperienced mathematicians. I didn’t want to do that in the last section, because that would have given the game away too early. And I won’t do it now, out of respect for tradition. Also we can think of ‘λ’ as taking place in a formal language, whereas ‘↦’ is a concept from the metalanguage, so maybe that distinction is a good one to maintain in the notation. Actually, λ-philes will often use a sort of ‘Bourbaki-λ’ in their metalanguages !]
This has become rather long-winded, but it seems a good place to preview Subsection VII-6, and then talk about the history of the subject, and its successful (as well as aborted) applications.

There is a small but infinite subset of Λ called the set of combinators (up to “≈”, just the set of closed terms), some of whose significance was discovered by Schönfinkel around 1920, without explicitly dealing with the λ-calculus. We shall treat it in some detail in Subsection VII-6, and hopefully explain more clearly than above about just which functions in some abstract sense are being dealt with by the subject, and what it was that Schönfinkel discovered.
Independently reinventing Schönfinkel’s work a few years later, Curry attempted to base an entire foundations of mathematics on the combinators, as did Church soon after, using the λ-calculus, which he invented. But the system(s) proposed (motivated by studying very closely the processes of substitution occurring in Russell & Whitehead’s Principia Mathematica) were soon discovered to be inconsistent, by Kleene and Rosser, students of Church. There will be nothing on this (largely abortive) application to foundations here (beyond the present paragraph). It is the case that, for at least 45 years after the mid-1930’s when the above inconsistency was discovered (and possibly to the present day), there continued to be further attempts in the same direction by a small group, a subject known as illative combinatory logic. [To get some feeling for this, take a look at [Fi], but start reading at Ch.1. Skip the Introduction, at least to begin, and also the Preface, or at least don’t take too seriously the claims there, before having read the entire book and about 77 others, including papers of Peter Aczel, Solomon Feferman and Dana Scott.] It’s not totally unreasonable to imagine
a foundational system in which “Everything is a function!” might be attractive. After all, our 1st order set theory version of foundations is a system in which “Everything is a set!” (and Gödel seems to tell us that, if it is consistent, we can never be very sure of that fact). On the other hand, it is also hardly surprising that there might be something akin to Russell’s paradox, which brought down Frege’s earlier higher order foundations, but in an attempted system with the combinators. The famous fixed point combinator Y (see VII-2.9), or an analogue, played a major role in the Kleene-Rosser construction showing inconsistency.
What I always found disturbing about combinatory logic was what seemed
to me to be a complete lack of conceptual continuity. There were no functions
known to anyone else that had the extensive properties of the combinators
and allowed self-application. I agree that people might wish to have such
functions, but very early on the contradiction found by Kleene and Rosser
showed there was trouble. What I cannot understand is why there was not
more discussion of the question of how the notion of function that was behind the theory was to be made even mildly harmonious with the “classical”
notion of function. The literature on combinatorial logic seems to me to
be somehow silent on this point. Perhaps the reason was that the hope of
“solving” the paradoxes remained alive for a long time—and may still be
alive.
D. Scott [BKK-ed]
in the λ-calculus. A real breakthrough occurred when he saw how to do it for the predecessor function, our VII-2.8. It was based on Kleene’s results that his supervisor, Church, first made the proposal that the intuitively computable functions be identified with the mathematically defined set of λ-definable functions. This is of course Church’s Thesis, later also called the Church-Turing Thesis, since Turing independently proposed the Turing computable functions as the appropriate set within a few months, and gave a strong argument for this being a sensible proposal. Then he proved the two sets to be the same, as soon as he saw Church’s paper. The latter paper proved the famous Church’s Theorem providing a negative solution to Hilbert’s Entscheidungsproblem, as had Turing also done independently. See the subsection after next for the definition of λ-definable and proof that all recursive functions are λ-definable.
Finally, I understand that McCarthy, in the late 1950’s, when putting forth several fundamental ideas and proposing what have come to be known as functional programming languages (a bit of which is the McSELF procedures from earlier), was directly inspired by the λ-calculus [Br-ed]. This led to his invention of LISP, the first such language (though it’s not purely functional, containing, as it does, imperative features). It is said that all such languages include, in some sense, the λ-calculus, or even that they are all equivalent to the λ-calculus. As a miniature example, look at McSELF in [CM]. A main feature differentiating functional from imperative programming languages (of which ATEN is a miniature example) is that each program, when implemented, produces steps which (instead of altering a ‘store’, or a sequence of ‘bins’ as we called it) are stages in the calculation of a function, rather like the reduction steps in reducing a λ-term to normal form, or at least attempting to do so. Clearly we should expect there to be a theorem saying that there can be no algorithm for deciding whether a λ-term has a normal form. See the next subsection for a version of this theorem entirely within the λ-calculus.
Another feature differentiating the two types of languages seems to be a far greater use of the so-called recursive (self-referential) programs in the functional languages. In the case of our earlier miniature languages, we see that leaving that feature out of McSELF would destroy it (certainly many computable functions could not be programmed), whereas one of the main points of Subsection IV-9 was to see that any recursive command could be replaced by an ordinary ATEN-command.
We shall spend much time in the subsection after next with explaining in detail how to use the Y-operator to easily produce terms which satisfy equations. This is a basic self-referential aspect of computing with the λ-calculus. In particular, it explains the formula which was just ‘pulled out of a hat’ in the proof of VII-2.11.
We shall begin the subsection after next with another, much
simpler, technical idea, which implicitly pervades the last
subsection. This is how the triple product ABC can be regarded
as containing the analogue of the if-then-else-construction which
is so fundamental in McSELF.
First we present some crisp and edifying versions within the λ-
calculus of familiar material from earlier in this work.
VII-4 Non-examples and Non-calculability in Λ—undecidability.
Thinking of terms in Λ as being, in some sense, algorithms, here are a few analogues of previous results related to the non-existence of algorithms/commands/Turing machines/recursive functions. All this clearly depends on the Church-Rosser theorem, since the results below would be manifestly false if, for example, all elements of Λ had turned out to be related under “≈” . Nothing later depends on this subsection.
VII-4.1 (Undecidability of ≈ .) There is no term E ∈ Λ such that, for all A and B in Λ,

E A B ≈ T if A ≈ B ;   E A B ≈ F if A ≉ B .
First Proof. Suppose, for a contradiction, that E did exist, and define

u := λy • E(yy) 0 1 0 .

By the (β)-rule, we get uu ≈ E(uu) 0 1 0 . Now either uu ≈ 0 or uu ≉ 0 . But in the latter case, we get uu ≈ F 1 0 ≈ 0 , a contradiction. And in the former case, we get uu ≈ T 1 0 ≈ 1 , contradicting uniqueness of normal form, which tells us that 1 ≉ 0 .
VII-4.2 (Analogue of Rice’s Theorem.) If R ∈ Λ is such that

∀A ∈ Λ either R A ≈ T or R A ≈ F ,

then either ∀A ∈ Λ , R A ≈ T or ∀A ∈ Λ , R A ≈ F .
Proof. For a contradiction, suppose we can choose terms B and
C such that RB ≈ T and RC ≈ F . Then define
M := λy • R y C B and N := Y M.
By VII-2.7, we get N ≈ MN . Using the (β)-rule for the second ≈ ,
RN ≈ R(MN ) ≈ R(RNCB) .
Since RN ≈ either T or F , we get either
T ≈ R (T CB) ≈ R C ≈ F ,
or
F ≈ R (F CB) ≈ R B ≈ T ,
both of which are rather resounding contradictions to uniqueness
of normal form.
Second Proof of VII-4.1. Again assuming E exists, fix any B in Λ and define R := EB. This immediately contradicts VII-4.2, by choosing any C with C ≉ B , and calculating RB and RC .
Second Corollary to VII-4.2. (Undecidability of the existence of normal form.) There is no N ∈ Λ such that, for all A ∈ Λ ,

N A ≈ T if A has a normal form ;   N A ≈ F if not.
VII-5 Solving equations, and proving RC ⇐⇒ λ-definable.
Looking back at the first two results in Subsection VII-2, namely
T AB ≈ A and F A B ≈ B ,
we are immediately reminded of if-then-else, roughly : ‘a true
condition says to go to A, but if false, go to B.’
Now turn to the displayed formula defining A in the proof of VII-2.11,
i.e.
A := (isz x) G (H[ y(px) , px ]) .
This is a triple product, which is more-or-less saying
‘if x is zero, use G, but if it is non-zero, use H[ y(px) , px ]’ .
This perhaps begins to explain why that proof works, since we are
trying to get a term F which, in a sense, reduces to G when x is
zero, and to something involving H otherwise. What still needs
more explanation is the use of Y , and the form of the term H[
y(px) , px ] . That explanation in general follows below; both it
and the remark above are illustrated several times in the
constructions further down.
The following is an obvious extension of VII-2.10.
Theorem VII-5.1 Let z, y1, · · · , yk be mutually distinct variables and let A be any term in which no other variables occur freely. Define F to be the closed term Y (λzy1 · · · yk • A) . Then, for all closed terms V1, · · · , Vk , we have

F V1 · · · Vk ≈ A[z→F ][y1→V1] · · · [yk→Vk] .
F n1 · · · nk ≈ − − − F − − − − − −F − − − F − − − ,
where the right-hand side is something constructed using the
λ-calculus, except that F is an unknown term which we wish to
find, and it may occur more than once on the right-hand side.
We need to construct a term F which satisfies the equation.
The right-hand side will probably also have the numerals n1 , · · · , nk appearing. Well, the above theorem tells us how to solve it.
Just produce the term A by inventing “k + 1” distinct variables
not occurring bound in the right-hand side above, and use
them to replace the occurrences of F and of the numerals in
that right-hand side. That will produce a term, since that’s
what we meant by saying that the right-hand side was
“constructed using the λ-calculus”. Now the theorem tells us
the formula for F in terms of A.
Of course the theorem is more general, in that we can use
any closed terms Vi in place of numerals. But very often it’s
something related to a function of k-tuples of numbers where
the technique is used. In the specific example of the proof of
VII-2.11, the function f to be represented can be written in the
if-then-else-form as
“f (x) , if x = 0, is g, but otherwise is h(f (pred(x)), pred(x)) .”
Here “pred” is the predecessor function. Also g is a number
represented by the term G (which is presumably a numeral),
and h is a function of two variables represented by H . This immediately motivates the formula for A in the proof of VII-2.11 .
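In Python dress, with the Z combinator from the earlier sketch standing in for Y, the recipe is one line; g and h here are hypothetical stand-ins for the functions represented by G and H.

    g = 7                                  # hypothetical base value
    h = lambda prev, m: prev + m           # hypothetical step function
    f = Z(lambda rec: lambda x: g if x == 0 else h(rec(x - 1), x - 1))

    assert f(0) == 7 and f(3) == 7 + 0 + 1 + 2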
What we called the “basic property of Y from VII-2.9”
says that it is a so-called fixed point operator. There are
many other possibilities for such an operator. But the detailed
form of Y is always irrelevant to these constructions in
existence proofs. However, when getting right down to the
details of compiling a functional programming language, those
details are undoubtedly essential, and there, presumably, some
Y ’s are better than others. We gave another one, due to
Turing, soon after the introduction of Curry’s Y in Subsection
VII-2.
Definition of the numeral equality predicate term, eq .

Here is a relatively simple example, producing a useful little term for testing equality of numerals (but only numerals—testing for ≈ in general is undecidable, as we saw in the previous subsection!); that is, we require

eq n k ≈ T if n = k ;   eq n k ≈ F if n ≠ k .
An informal McSELF-like procedure for the actual predicate would be
EQ(n, k) ⇐ if n = 0
               then if k = 0
                    then true
                    else false
               else if k = 0
                    then false
                    else EQ(n − 1, k − 1)
(This is not really a McSELF procedure for two reasons—
the equality predicate is a primitive in McSELF, and the values
should be 1 and 0, not true and false. But it does show that we
could get ‘more primitive’ by beginning McSELF only with
‘equality to zero’ as a unary predicate in place of the binary
equality predicate.)
What the last display does is to tell us a defining equation which eq must satisfy:

eq n k ≈ (isz n)((isz k) T F )((isz k) F (eq(p n)(p k))) .
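The same equation succumbs to the fixed-point recipe; a Python sketch on ordinary integers, with the Z combinator from the earlier sketch again standing in for Y.

    eq = Z(lambda rec: lambda n: lambda k:
           (k == 0) if n == 0 else (False if k == 0 else rec(n - 1)(k - 1)))

    assert eq(3)(3) and not eq(3)(2)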
λ-definability
First here is the definition. In the spirit of this work, we go directly to the case of (possibly) partial functions, rather than fooling around with totals first.
Definition. Let D ⊂ N^k and let f : D → N . Say that the function f is λ-definable if and only if there is a term F in Λ such that the term F n1 · · · nk has a normal form for exactly those (n1, · · · , nk) which are in D, the domain of f , and, in this case, we have

F n1 · · · nk ≈ f (n1, · · · , nk) .
Thus we merely need to take

H := Y (λzy1 · · · yk • A) ,

where

A := (isz (Gy1 · · · yk))y1(z(s y1)y2 · · · yk) .

It’s surprising how easy this is, but ‘that’s the power of Y ’ !
Now there is a term Da,b such that
Theorem VII-5.4 Any total recursive function is λ-definable.
Proof. We use the fact that any such function can be
obtained from the starters discussed below, by a sequence of
compositions and minimizations which use as ingredients (and
produce) only total functions. See [CM]. The starters are shown
to be λ-definable below, and VII-5.3 deals with composition.
So it remains only to use VII-5.2 to deal with minimization.
Suppose that p : N^{n+1} → N is a total function which is λ-defined by H. And assume that for all (k1, · · · , kn), there is a k with p(k, k1, · · · , kn) = 0. Define the total function mp : N^n → N by

mp(k1, · · · , kn) := min{k | p(k, k1, · · · , kn) = 0} .

Then, if h(g) is the h produced from g in the statement of VII-5.2, it is straightforward to see that mp = h(p) ◦ (zero_n, π1, · · · , πn), where the last tuple of “n”-variable functions consists of the zero constant function and the projections, all λ-definable. But h(p) is λ-definable by VII-5.2. So, by closure under composition, the proof is complete.
At this point, if we wish to use the starter functions just
above, the proof that every total recursive function is λ-definable
would be completed by showing that the set of λ-definable
functions is closed under primitive recursion. The latter is
simply the many-variable case of the curried version of VII-2.11,
and can be safely left to the reader as yet another (by now
mechanical) exercise with Y .
On the other hand, we don’t need to bother with primitive recursion if we go back to our original definition of the recursive functions, and also accept the theorem that a total recursive function can be built from starters using only total functions at all intermediate steps (as we did in this last proof). But we do need to check that the addition and multiplication functions, and the ‘≥’-predicate, all of two variables, are λ-definable, since they were the original starters along with the projections. These are rather basic functions which everybody should see λ-defined. So we’ll now do these examples of using the fixed point operator, plus one showing how we could also have gotten a different predecessor function this way. In each case we’ll write down a suitable (informal) McSELF-procedure for the function (copying from earlier), then use it to write down the λ-calculus equation, convert that into a formula for what we’ve been calling A, and the job is then done as usual with an application of the fixed-point operator Y .
Here is that list of terms for basic functions mentioned just
above, where we’ve just stolen the informal McSELF procedures
from an earlier section.
The addition function. (See also the exercise before VII-2.8.)

ADD(m, n) ⇐ if n = 0
               then m
               else ADD(m + 1, n − 1)

So the term add must satisfy

add m n ≈ (isz n) m (add(s m)(p n)) .
The ‘greater than or equal to’ predicate.

GEQ(m, n) ⇐ if n = 0
               then 1
               else if m = 0
                    then 0
                    else GEQ(m − 1, n − 1)
So the term geq must satisfy
geq m n ≈ (isz n) 1 ((isz m) 0 (geq(p m)(p n))) ,

handled as usual with Y . And here is the promised different predecessor function: the term qq must satisfy

qq m n ≈ (eq(s m)n) m (qq(s m)n) ,

so qq := Y (λzxy • A) , where

A := (eq(s x)y)x(z(s x) y) .
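Both equations succumb to the same fixed-point recipe; a Python sketch on ordinary integers (and, as with the λ-term, the search in qq runs forever if no m with m + 1 = n is ever reached).

    geq = Z(lambda rec: lambda m: lambda n:
            True if n == 0 else (False if m == 0 else rec(m - 1)(n - 1)))
    qq  = Z(lambda rec: lambda m: lambda n:      # search upward for n - 1
            m if m + 1 == n else rec(m + 1)(n))
    pred2 = lambda n: qq(0)(n)                   # hypothetical wrapper; n > 0

    assert geq(4)(2) and not geq(1)(3) and pred2(5) == 4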
Theorem VII-5.5. For any choice of numeral system, that is, of terms
s, p, isz, and n for each n ≥ 0, which satisfy
s n ≈ n + 1 ; isz 0 ≈ T ; if n > 0 , then isz n ≈ F and p n ≈ n − 1 ,
every (partial) recursive function is λ-definable.
But we’ll just prove it for the earlier Church numeral system, reviewed below. Were it not for needing a term which ‘does a loop’, i.e. has no normal form, in ‘all the right places’, we would at this point have done what was necessary to prove this theorem. For obtaining the needed result about the non-existence of normal forms, it is surprising how many technicalities (just below), and particularly what difficult syntactic theorems (see the proof of the theorem a few pages ahead), seem to be needed.
We begin with a list of new and old definitions, and then
four lemmas, before completing the proof.
Confession. Just below there is a slight change from an earlier definition, and also a ‘mistake’—a small fib, if you like. We come clean on this right at the end of this subsection, using print. This seems the best way, and it doubles the profit, as we explain there. So most readers, if they discover one or both of these two minor anomalies, should just press on in good humour. But if that is a psychological impossibility, go to the end of the subsection to get relief.
Define inductively, for Λ-terms X and Y :

X^m Y := Y if m = 0 ;   X^m Y := X(X^{m−1}Y ) if m > 0 .

Thus X^m Y = X(X(· · · (X(XY )) · · ·)) = XC for C = X^{m−1}Y . Recall the definition m := λxy • x^m y .
Note that x^m y, despite appearances, doesn’t have the form Ay for any Λ-term A, so we cannot apply (η)-reduction to reduce this to λx • x^m . In fact, m is in normal form. (In both cases, m = 1 is an exception.) We have

m A B ≥ A^m B ,

as is easily checked for any terms A and B, though you do have to check that (x^m y)[x→A][y→B] = A^m B , which is not entirely, though almost, a no-brainer.
Recall that s := λxyz • (xy)(yz) . One can easily directly show that s m ≥ m + 1 , though it follows from the earlier fact that s m ≈ m + 1 , since the right-hand side is in normal form. We have also

IX ≥ X and KXY ≥ X ,

where now K := λxy • x is denoted just K .
Define D := λxyz • z(Ky)x . Then we have

DAB 0 ≥ 0(KB)A ≥ (KB)^0 A = A ,

and, for i > 0 ,
Define
T := λx • D0(λuv • u(x(sv))u(sv)) ,
and then define, for any Λ-terms X and Y ,
Lemma A. For X and Y in Λ with XY ≥ i , we have

PXY ≥ Y if i = 0 ;   PXY ≥ PX(sY ) if i > 0 .

Also, in each case, the “≥” may be realized by a sequence of (β)-reductions, the first of which obliterates the leftmost occurrence of λ .
Proof. Using, in the first step, the definitions of PXY and of
T , and a leftmost (β)-reduction,
so
Lemma B. Given m such that h(k1, k2, · · · , kn, l) > 0 for 0 ≤ l < m , we have, for these l,

(i) P (Hk1k2 · · · kn)l ≥ P (Hk1k2 · · · kn)(l + 1) ,

where each such “≥” may be realized by a sequence of (β)-reductions at least one of which obliterates the leftmost occurrence of λ ; and

(ii) P (Hk1k2 · · · kn)0 ≥ P (Hk1k2 · · · kn)m .
Proof. (ii) is clearly immediate from (i). As for the latter, with X = Hk1k2 · · · kn and Y = l, we have

XY = Hk1k2 · · · kn l ≥ h(k1, k2, · · · , kn, l) = (say) i ,

where i > 0 . And so we get

P (Hk1k2 · · · kn)0 ≥ P (Hk1k2 · · · kn)m ≥ m .
Main Lemma D. (a) Let
Then
(b)(ii) We have
for primitive recursive functions g and h. These two are, in particular, total recursive functions. So we can, by VII-5.4, find G and H in Λ such that

for all j, k1, k2, · · · , kn, k in N. But since the right-hand sides are already in normal form, by the normal form theorem we can change “≈” to “≥” in both places above. But now, using parts of Lemma D, first get an F as in part (a), and then use it to get an E as in part (b). Part (b)(i) gives the “≈” equation required for (k1, · · · , kn) ∈ dom(f ) (and more, since it even gives a specific way to reduce Ek1k2 · · · kn to f (k1, k2, · · · , kn), dependent on having similar reductions for G and H—see the exercises below as well). For (k1, · · · , kn) ∉ dom(f ), part (b)(ii) gives an infinite reduction sequence starting with Ek1k2 · · · kn , and such that, infinitely often, it is the leftmost λ which is obliterated. Thus it is a “quasi-leftmost reduction” in the sense of [Kl], because of leftmost (β)-reductions occurring arbitrarily far out. Thus, by Cor 5.13, p.293 of that thesis, Ek1k2 · · · kn has no normal form (in the extensional λ-calculus), as required.
“leftmost” to “quasi-leftmost” in the preceding statement about
those M ’s which do have normal forms).
holds using the new definition. Since they are defined using x^m y, the numerals m have now been changed slightly, so there are actually quite a few earlier results to which that comment applies. It just seemed to the author that in VII-2 it would avoid fuss if X^m actually had a meaning on its own as a term. From the “Confession” onwards, that definition should be ignored—X^m never occurs without being followed by a term Y .
The mistake referred to in the “Confession” is that, in the extensional λ-calculus, though all the other numerals are, the numeral 1 is NOT in normal form, despite the claim. It can be (η)-reduced to the term we are denoting as I. There is only one place where this affects the proofs above, as indicated in the paragraph after next.
But first let’s double our money. The reader may have noticed that (η)-reduction never comes in anywhere in these proofs—it’s always (β)-reduction. Furthermore, the numeral 1, in the intensional λ-calculus, IS in normal form. So, using the corresponding three syntactic theorems from just a few paragraphs above (whose proofs are actually somewhat easier in the intensional case), we have a perfectly good and standard proof that every recursive function is λ-definable in the intensional λ-calculus.
The only place where the non-normality of the numeral 1 is problematic, in the proofs above for the extensional λ-calculus, is in the proof of VII-5.5, where we go from asserting “≈” to asserting “≥”, when it happens that the right-hand side is 1. But just use the above result for the intensional λ-calculus to see that we can get defining terms for which these reductions exist. So that takes care of the small anomaly (and it did seem simplest to ignore it and to be a bit intellectually dishonest until now).
Every λ-definable function has the form fn,A for some (n, A) [though the function fn,A itself might not actually be definable using the term A, if A is not chosen so that Aν1 · · · νn “loops” for all ν⃗ ∉ Dn,A ] . However, we prove that all fn,A are recursive. In fact, as in the case of B-computable functions (but using lower-case names so as not to confuse things), one is able to write

Göd(xi) = <1, i> ; Göd(AB) = <2, GödA, GödB> ; Göd(λxi • A) = <3, i, GödA> .
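The three clauses can be made concrete by letting Python tuples play the role of the pairing < , >; a sketch on the tagged-tuple terms from the first sketch.

    def goedel(t):
        if t[0] == 'var':
            return (1, t[1])                         # Göd(x_i) = <1, i>
        if t[0] == 'app':
            return (2, goedel(t[1]), goedel(t[2]))   # <2, GödA, GödB>
        return (3, t[1], goedel(t[2]))               # <3, i, GödA>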
Now define kln to be the relation for which
(iii) h is the code of the history of that leftmost reduction : that is, if the reduction is

Aν1 · · · νn = B1 ≥ B2 ≥ B3 ≥ · · · ≥ Bk = l ,

then h = <b1, · · · , bk>, where bi = Göd(Bi) .

Finally print : N → N may be suitably defined (see the next exercise) so that

print(<b1, · · · , bk>) = l if bk = Göd(l) .
Göd(3) = <3, 2, <3, 1, <2, <1, 2>, <2, <1, 2>, <2, <1, 2>, <1, 1>>>>>> .
Both the λ-calculus and the theory of combinators were originally developed as foundations for mathematics before digital computers were invented. They languished as obscure branches of mathematical logic until rediscovered by computer scientists. It is remarkable that a theory developed by logicians has inspired the design of both the hardware and software for a new generation of computers. There is an important lesson here for people who advocate reducing support for ‘pure’ research: the pure research of today defines the applied research of tomorrow.
· · ·
David Turner proposed that Schönfinkel and Curry’s combinators could be used as machine code for computers for executing functional programming languages. Such computers could exploit mathematical properties of the λ-calculus · · ·
· · ·
We thus see that an obscure branch of mathematical logic underlies important developments in programming language theory, such as:
(i) The study of fundamental questions of computation.
(ii) The design of programming languages.
(iii) The semantics of programming languages.
(iv) The architecture of computers.
VII-6 Combinatorial Completeness
and the Invasion of the Combinators.
Let Ω be a set with a binary operation, written as juxtaposition. We shall be discussing such objects a lot, and that’s what we’ll call them, rather than some subname of hemi-demi-semi-quasiloopoid/groupoid/algebraoid, or indeed applicative set, though the latter is suggestive of what we are trying to understand here. The set Ω^{Ω^n} consists of all functions of “n” variables from Ω, taking values in Ω, i.e. functions Ω^n → Ω . It has a rather small subset consisting of those functions ‘algebraically definable just using the operation’—for example,

(ω1, ω2, ω3) ↦ (ω2((ω3ω3)(νω1)))ω2 ,
where ν is some fixed element of Ω. [We’ll express this precisely
below.] The binary operation Ω is said to be combinatorially
complete if and only if every such “algebraically definable”
function can be given as left multiplication by at least one
element of Ω (i.e. ‘is representable’). For example, there would
have to be an element ζ such that, for all (ω1, ω2, ω3) we have
ζω1ω2ω3 = (ω2((ω3ω3)(νω1)))ω2 .
We are using the usual convention here on the left, that is,
ζω1ω2ω3 := ((ζω1)ω2)ω3 .
Note that the cardinality of Ω^{Ω^n} is strictly bigger than that of Ω except in the trivial case that Ω has only one element, so we cannot expect this definition to lead anywhere without some restriction on which functions are supposed to be realizable by elements ζ .
We shall show quite easily in the next section that Λ/≈ is combinatorially complete, where the binary operation is the one inherited from Λ. Then we introduce an ostensibly simpler set Γ, basically the combinators extended with variables. This produces the original combinatorially complete binary operation, namely Γ/∼ , due to Schönfinkel, who invented the idea. He didn’t prove it this way, since the λ-calculus hadn’t been invented yet, but we do it in the second section below by showing that the two binary operations Λ/≈ and Γ/∼ are isomorphic. So we don’t really have two different examples here, but the new description using Γ is simpler in many respects.
Reduction algorithms for “∼” in Γ have actually been used in the design of computers, when the desire is for a machine well adapted to functional programming languages [Go1].
Combinatorial Completeness of Λ/≈ .
First let’s give a proper treatment of which functions we are talking about, that is, which are algebraically definable. The definition below is rather formal. It makes proving facts about these functions easy by induction on that inductive definition. However, for those not hidebound by some constructivist philosophy, there are simpler ways of giving the definition—“. . . the smallest set closed under pointwise multiplication of functions, and containing all the projections and all constant functions. . . ”. See the axiomatic construction of combinatorially complete binary operations near the beginning of Subsection VII-8 for more on this.
If Θ is any set, then the free binary operation generated by Θ, where we use ∗ to denote the operation, is the smallest set, FREE(Θ), of non-empty finite strings of symbols from Θ ∪ {∗, (, )} (a disjoint union of course!) such that each element of Θ is such a (length 1) string, and the set is closed under (g, h) ↦ (g ∗ h) . So it consists of strings such as ((θ3 ∗ ((θ1 ∗ θ2) ∗ θ4)) ∗ θ3). Now suppose given a binary operation on Ω as in the introduction, so here the operation is denoted by juxtaposing. Let {ν1, ν2, · · ·} be a sequence of ‘variables’, all distinct, and disjoint from Ω, and, for each n ≥ 1, take Θ to be the set Ω ∪ {ν1, ν2, · · · , νn} . Now define a function

FUN = FUN_n : FREE(Ω ∪ {ν1, ν2, · · · , νn}) → Ω^{Ω^n}
defined to be those which are in the image of the FUN_n for some n ≥ 1 . In the example beginning this section, the function given is

FUN_3 [ (ν2 ∗ ((ν3 ∗ ν3) ∗ (ν ∗ ν1))) ∗ ν2 ] .
So let’s formalize the earlier definition:
Definition. The binary operation Ω is combinatorially complete if
and only if, for all n and for all f ∈ FREE(Ω ∪ { ν1 , ν2 , · · · , νn
}) , there is a ζ ∈ Ω such that, for all (ω1, ω2, · · · , ωn) we have
ζω1ω2 · · · ωn = FUN n (f )(ω1, ω2, · · · , ωn) .
Combinators.
Let S and K be two distinct symbols, disjoint from our
original sequence x1, x2, · · · of distinct variables (which, though it
won’t come up directly, are quite distinct from the variables νi of
the last section).
Definition. The binary operation Γ is written as juxtaposition, and is defined to be the free binary operation generated by the set {S, K, x1, x2, · · ·}. Its elements are called by various names : combinatory terms [Sö][Ko], combinations [Sö], CL-terms [Ba], combinatorial expressions [Go]. We won’t need a name for them as individuals. The combinators are the elements of the subset of Γ which is the free binary operation generated by {S, K} . Thus combinators are just suitable strings of S’s and K’s and lots of brackets. We shall again use the convention that in Γ, the string P1P2P3 · · · Pn really means ((· · · ((P1P2)P3) · · ·)Pn) .
Finally, let ∼ be the equivalence relation on Γ generated by the following
conditions, where we are requiring them to hold for all elements A, B, A1,
etc. in Γ :
SABC ∼ (AC)(BC) ; KAB ∼ A ;
A1 ∼ B1 and A2 ∼ B2 =⇒ A1A2 ∼ B1B2 ;
and

Ax ∼ Bx =⇒ A ∼ B , if the variable x appears in neither A nor B .
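The two defining identities are easy to internalize by writing S and K as closures; a sketch of what the combinators compute, not of the strings in Γ themselves.

    S = lambda a: lambda b: lambda c: a(c)(b(c))   # SABC ∼ (AC)(BC)
    K = lambda a: lambda b: a                      # KAB ∼ A
    I = S(K)(K)                                    # I := SKK, so IX ∼ X

    assert K('A')('B') == 'A'
    assert I('X') == 'X'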
Remarks. (i) See the proof of VII-6.12 below for a more explicit description of ∼ .
(ii) Don’t be fooled by the ‘argument’, that the last (extensionality) condition is redundant, which tells you to just multiply on the left by K and apply the second and third conditions : the point is that K(Ax) is a lot different than KAx.
(iii) The requirement that x isn’t in A or B cannot be removed from the last condition defining ∼ . For we have K(xx)x ∼ xx, but K(xx) ≁ x. If the latter was false, it would follow from the results below in several ways that K(BB) ∼ B for any B. Taking B = I = SKK, it would follow that KI ∼ I, and then multiplying on the right by A, that I ∼ A for any A. This would show that ∼ relates any two elements. That contradicts the main result below that the ∼-classes are in 1-1 correspondence with the ≈-classes from Λ : the Church-Rosser theorem tells us that there are tons of ≈-classes.
(iv) Apologies to the λ-experts for avoiding a number of arcane subtleties, both here and previously. In particular, whether to assume that last condition, or something weaker such as nothing at all, occupies a good deal of the literature. This is the great intensionality versus extensionality debate.
To make the ascent from mathematics to logic is to pass from the
object language to the metalanguage, or, as it might be said without
jargon, to stop toying around and start believing in something . . . A
function could be a scheme for a type of process which would become
definite when presented with an argument . . . Two functions that are
extensionally the same might be ‘computed’, however, by quite different
processes . . . The mixture of abstract objects needed would obviously have
to be very rich, and I worry that it is a quicksand for foundational studies
. . . Maybe after sufficient trial and error we can come to agree that
intensions have to be believed in, not just reconstructed, but I have not
yet been able to reach that higher state of mind.
Dana Scott ([SAJM-ed], pp. 157-162)
But, for the intended audience here, that debate is simply an unnecessary
complication on first acquaintance with the subject, I believe.
Theorem VII-6.2. The binary operation on Γ/∼ , induced by the juxtaposition operation on Γ, is combinatorially complete.
More examples arising from combinatorial completeness.
The top half of the following table is already familiar. On all lines, that the λ-version gives the effect is immediate. The reader might enjoy the following exercises :
(1) Show that the definition gives the effect; that is, work directly with combinator identities rather than the λ-calculus.
(2) Use the λ-versions of the ingredients in the definition column, and reduce that λ-term to normal form, which should be the entry in the 3rd column, up to re-naming bound variables.
It should be understood that the variables in each entry of the third column are distinct from each other.
We shall construct morphisms Ψ : Γ → Λ and Φ : Λ → Γ such that the following five results hold :
VII-6.3. If P ∼ Q then Ψ(P ) ≈ Ψ(Q).
VII-6.4. If A ≈ B then Φ(A) ∼ Φ(B).
VII-6.5. For all P ∈ Γ, we have ΦΨ(P ) ∼ P .
VII-6.6. For all A ∈ Λ, we have ΨΦ(A) ≈ A.
VII-6.7. For all P and Q in Γ, we have Ψ(PQ) ≈ Ψ(P )Ψ(Q).
That this suffices is elementary general mathematics which
the reader can work out if necessary. The first two give maps
back and forth between the sets of equivalence classes, and the
second two show that those maps are inverse to each other. The
last one assures us that the maps are morphisms.
Definition of Ψ. Define it to be the unique morphism of binary operations which maps generators as follows : all the variables go to themselves; K goes to K := T := λxy • x; and S goes to S := λxyz • (xz)(yz).
Remarks. (i) Since Ψ, by definition, preserves the operations, there is nothing more needed to prove VII-6.7 (which is understated—it holds with “=”, not just “≈”).
(ii) The sub-binary operation of Λ generated by S , K and all the variables is in fact freely generated by them. This is equivalent to the fact that Ψ is injective. But we won't dwell on this or prove it, since it seems not to be useful in establishing that the map induced by Ψ on equivalence classes is injective. But for concreteness, the reader may prefer to identify Γ with that subset of Λ, namely the image of Ψ . So you can think of the combinators as certain kinds of closed λ-expressions, closed in the sense of having no free variables.
The main job is to figure out how to simulate, in Γ, the
abstraction operator in Λ.
Definition of µx • in Γ . For each variable x and each P ∈ Γ, define
µx • P as follows, by induction on the structure of P :
If P is an atom other than x, define µx • P := KP .
Define µx • x := SKK := I .
Define µx • (QR) := µx • QR := S(µx • Q)(µx • R) .
Definition of Φ . This is again inductive, beginning with the atoms,
i.e. variables x :
Φ(x) := x ; Φ(AB) := Φ(A)Φ(B) ; Φ(λx • A) := µx • Φ(A) .
It should be (but seldom is) pointed out that the correctness of
this definition depends on unique readability of the strings
which make up the set Λ .
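Since Ψ, Φ and µx • are all defined by structural induction, they are easy to realize as programs. Here is a minimal executable sketch in Haskell; the type and function names (Lam, CL, mu, phi, psi) are mine, not the text's, and variables are represented naively by strings.

    data Lam = Var String | App Lam Lam | Abs String Lam   deriving Show
    data CL  = V String | S | K | CL :@ CL                 deriving Show
    infixl 9 :@

    -- µx•P , by induction on the structure of P , exactly as in the definition:
    mu :: String -> CL -> CL
    mu x (V y) | x == y = S :@ K :@ K            -- µx•x := SKK =: I
    mu x (p :@ q)       = S :@ mu x p :@ mu x q  -- µx•(QR) := S(µx•Q)(µx•R)
    mu _ p              = K :@ p                 -- any atom other than x

    -- Φ : Λ → Γ , again by induction on structure:
    phi :: Lam -> CL
    phi (Var x)   = V x
    phi (App a b) = phi a :@ phi b
    phi (Abs x a) = mu x (phi a)

    -- Ψ : Γ → Λ , the unique morphism sending K and S to the λ-terms above:
    psi :: CL -> Lam
    psi (V x)    = Var x
    psi K        = Abs "x" (Abs "y" (Var "x"))
    psi S        = Abs "x" (Abs "y" (Abs "z"
                     (App (App (Var "x") (Var "z")) (App (Var "y") (Var "z")))))
    psi (p :@ q) = App (psi p) (psi q)

For instance, phi (Abs "x" (Var "x")) evaluates to S :@ K :@ K, i.e. µx • x = I, as in the definition above.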
The first result is the mirror image of the last part of that definition.
Any λ-expert reading the previous proof and next result will possibly find them irritating. We have made extensive use of extensionality in the last proof. But the results actually hold (and are very important in more encyclopaedic versions of this subject) with “≈” replaced by “≈in”, where the latter is defined using only rules (α) and (β), and congruence [that is, drop rule (η)]. So an exercise for the reader is to find proofs of these slightly more delicate facts.
present result, VII-6.5. [The method there also gives an alternative to the above calculation for showing ΦΨ(K) ∼ K .] The latter two are almost trivial, and the first two are simple consequences of earlier identities : using VII-2.1(a),
VII-6.9. If x isn't in P, then µx • P ∼ KP .
Proof. Proceeding by induction on P , the atomic case doesn't include P = x, so we get equality, not just ∼ . The inductive step goes as follows : for any T ,
VII-6.10. If P ∼ Q then µx • P ∼ µx • Q.
Proof. By the definition of ∼ , and basics on equivalence relations, it suffices to prove one fact for each of the four conditions generating the relation ∼ , those facts being

µx • SABC ∼ µx • AC(BC) ; µx • KAB ∼ µx • A ;

µx • A1 ∼ µx • B1 and µx • A2 ∼ µx • B2 =⇒ µx • A1A2 ∼ µx • B1B2 ;

and finally, if the variable z appears in neither A nor B,

µx • Az ∼ µx • Bz =⇒ µx • A ∼ µx • B .

However we first eliminate the need to deal with the last of these when z = x, by proving the case of the entire result where x is not in P or Q. In that case, using VII-6.9,

µx • P ∼ KP ∼ KQ ∼ µx • Q .

As for the last fact when z ≠ x, for any R we have

(µx • Az)R = S(µx • A)(µx • z)R ∼ (µx • A)R((µx • z)R) ∼ (µx • A)R(KzR) ∼ (µx • A)Rz .

The same goes with B replacing A, so taking R as a variable different from z and which is not in A or B, we cancel twice to get the result.
For the second fact above, we have, for any C,

(µx • KAB)C = S(µx • KA)(µx • B)C ∼ (µx • KA)C((µx • B)C) ∼
S(µx • K)(µx • A)C((µx • B)C) ∼ (µx • K)C((µx • A)C)((µx • B)C) ∼
KKC((µx • A)C)((µx • B)C) ∼ K((µx • A)C)((µx • B)C) ∼ (µx • A)C ,

as suffices. The first one is similar but messier. The third fact is quick :
Now proceed by contradiction, assuming that R ∼ P has a shortest possible verification among all pairs for which the result fails for some x and some Q—shortest in the sense of sequence length.
Then the pair (R, P ) cannot have any of the forms as in (1) to (5) with respect to its “shortest verification”, because (R[x→Q], P [x→Q]) would have the same form (with respect to another verification in the case of (2) and (3), by “shortest”).
It also cannot have the form in (6), since (again by “shortest”) we could concatenate verifications for A1[x→Q] ∼ B1[x→Q] and A2[x→Q] ∼ B2[x→Q] to prove that (A1A2)[x→Q] ∼ (B1B2)[x→Q] .
So (R, P ) must have the form in (7), i.e. we have (R, P ) = (A, B) where (Az, Bz) occurs earlier in that shortest sequence, for some z not in A or B; and yet A[x→Q] ≁ B[x→Q] .
Now necessarily z ≠ x, since otherwise x does not occur in A or B, so

A[x→Q] = A ∼ B = B[x→Q] .
(µx • P )Q ∼ P [x→Q] .
[Note that the first part plus ‘extensionality’ quickly give the direct analogue in Γ of the (η)-rule from Λ, namely

µx • Rx ∼ R if x is not in R .

To see this, just apply the left side to x.]
Proof. Proceeding by induction on P for the first identity, the atomic case when P = x just uses Ix ∼ x . When P is an atom other than x,

(µx • P )x = KPx ∼ P .

Then the second identity follows using the first one, using VII-6.12, and using the previous fact in VII-6.11 about x not occurring in µx • P :

(µx • P )Q = ((µx • P )x)[x→Q] ∼ P [x→Q] .
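The identity (µx • P )Q ∼ P [x→Q] can also be watched concretely, by orienting the first two conditions generating ∼ into left-to-right rewrite rules. A small sketch extending the Haskell types above (step and nf are my names; nf need not terminate on arbitrary input, but does on the small terms here):

    step :: CL -> Maybe CL
    step (K :@ p :@ _)      = Just p                      -- KAB ∼ A
    step (S :@ p :@ q :@ r) = Just (p :@ r :@ (q :@ r))   -- SABC ∼ AC(BC)
    step (p :@ q)           = case step p of
                                Just p' -> Just (p' :@ q)
                                Nothing -> fmap (p :@) (step q)
    step _                  = Nothing

    nf :: CL -> CL
    nf t = maybe t nf (step t)

For example, with P = xy and Q = q, evaluating nf (mu "x" (V "x" :@ V "y") :@ V "q") returns V "q" :@ V "y", which is indeed P [x→Q].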
P = y : Both sides give I .
P = x : Both sides give KQ, up to ∼ . (Use VII-6.9 .)
P is any other atom : Both sides give KP .
The inductive step uses nothing but definitions :

(µy • TR)[x→Q] = (S(µy • T )(µy • R))[x→Q] = S(µy • T )[x→Q](µy • R)[x→Q]
∼ S(µy • T [x→Q])(µy • R[x→Q]) = µy • (T [x→Q]R[x→Q]) = µy • ((TR)[x→Q]) .
(Φ(C)Φ(D))[x→Φ(B)] = Φ(CD)[x→Φ(B)] .
(II) A = λx • C : We have

Φ((λx • C)[x→B]) = Φ(λx • C) = µx • Φ(C) ,

whereas, using VII-6.11,

(Φ(λx • C))[x→Φ(B)] = (µx • Φ(C))[x→Φ(B)] = µx • Φ(C) .
observing that, by VII-6.15, the variable y doesn't occur in Φ(B) because it is not free in B, the latter being the case because (λy • C)[x→B] is okay. Then

Φ((λy • C)[x→B]) = Φ(λy • (C[x→B])) = µy • (Φ(C[x→B])) ∼
µy • (Φ(C)[x→Φ(B)]) ∼ (µy • Φ(C))[x→Φ(B)] = (Φ(λy • C))[x→Φ(B)] .
To prove (α)′, the left-hand side is

µx • Φ(A) ∼ µy • (Φ(A)[x→y]) ∼ µy • Φ(A[x→y]) ,
For example, as we saw above, when Ω = Γ/ ∼ , we could take κ and σ
to be the equivalence classes [K]∼ and [S]∼ , respectively.
Proof. Assume it is combinatorially complete, and with F = FUN2(ν1) and G = FUN3((ν1 ∗ ν3) ∗ (ν2 ∗ ν3)), let κ and σ respectively be corresponding elements ζ from the definition of combinatorial completeness. Thus they behave as required, where

Ω(n) := FREE(Ω ∪ {ν1, ν2, · · · , νn}) .
a ∼ b ⇐⇒ ∀χ , χ(a) = χ(b)

(referring to χ which map elements of Ω by the identity map).
Then we have, proceeding by induction on P as in VII-6.13,

(µy • P ) ∗ y ∼ P ,

and, substituting Q for y,

(µy • P ) ∗ Q ∼ P [y→Q] ,

and finally, by induction on n,
And that's only the beginning, as further along we have “categorical lambda models”, etc., though here the first adjective refers to the method of construction, rather than an axiomatic definition. In view of all this intricacy (and relating well to the simplifications of the last subsection), we shall consider in detail only what are called extensional lambda models . The definition is quite simple : such an object is any structure (D , · , κ , σ) , where “·” is a binary operation on the set D, which set has specified distinct elements κ and σ causing it to be combinatorially complete as in VII-6.18, and the operation must satisfy
am aware, but easily seen to be mathematically equivalent to the
definitions usually given. In particular, this will indicate how to
get maps from the λ-calculus, Λ, into any model, and quite
explicitly into extensional ones as above, one such map for
each assignment of variables.
A (not entirely) Optional Digression on λ-models.
To remind you of a very basic point about sets and functions, there is a 1-1 adjointness correspondence as follows, where AB denotes the set of all functions with (domain, codomain) the pair of sets (B, A) :

AB×C ←→ (AC)B
f ↦ [b ↦ [c ↦ f (b, c)]]
[(b, c) ↦ g(b)(c)] ↤ g

[Perhaps the λ-phile would prefer λbc • f (b, c) and λ(b, c) • gbc .] Taking B = C = A, this specializes to a bijection between the set of all binary operations on A and the set (AA)A. In particular, as is easily checked, the extensional binary operations on A correspond to the injective functions from A to AA .
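In a typed functional language this adjointness is the familiar currying; a two-line Haskell illustration (the Prelude's curry and uncurry are exactly these, up to argument order):

    curry'   :: ((b, c) -> a) -> b -> c -> a    -- AB×C → (AC)B
    curry' f b c = f (b, c)

    uncurry' :: (b -> c -> a) -> (b, c) -> a    -- (AC)B → AB×C
    uncurry' g (b, c) = g b c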
Let us go back to some of the vague remarks of Subsection VII-3, and try to puzzle out a rough idea of what the phrase “model for λ-calculus” ought to mean, as an object in ordinary naive set-theoretic mathematics, avoiding egregious violations of the axiom of foundation. We'll do this without requiring the reader to master textbooks on model theory in order to follow along. It seems to the author that the approach below, certainly not very original, is more natural than trying to force the round peg of λ-calculus into the square hole of something closely similar to models for 1st order theories.
We want some kinds of sets, D, for which each element in D can play a second role as also somehow representing a function from D to itself. As noted, the functions so represented will necessarily form a very small subset, say [D −~ D], of DD, very small relative to the cardinality of DD.
The simplest way to begin to do this is to postulate a surjective function φ, as below :

φ : D → [D −~ D] ⊂ DD .

So we are given a function from D to the set of all its self-functions, we name that function's image [D −~ D], and use φ as the generic name for this surjective adjoint. It is the adjoint of a multiplication D × D → D, by the remarks in the first paragraph of this digression.
Now let Λ(0) be the set of all closed terms in Λ, those without free variables. A model, D, such as we seek, will surely involve at least a function Λ(0) → D with good properties with respect to the structure of Λ(0) and to our intuitive notion of what that structure is supposed to be modelling; namely, function application and function abstraction. As to the former, the obvious thing is to have the map be a morphism between the binary operations on the two sets.
Because Λ is built by structural induction, it is hard to define anything related to Λ without using induction on structure. Furthermore, by first fixing, for each variable x, an element ρ(x) ∈ D, we'd expect there to be maps (one for each “assignment” ρ)

ρ+ : Λ → D ,

which all agree on Λ(0) with the one above. (So their restrictions to Λ(0) are all the same map.) These ρ+ 's should map each variable x to ρ(x) . Thinking about application and abstraction (and their formal versions in Λ), we are led to the following requirements.

ρ+ : x ↦ ρ(x) ;
ρ+ : MN ↦ φ(ρ+ (M ))(ρ+ (N )) ;
ρ+ : λx • M ↦ ψ(d ↦ ρ[x↦d]+ (M )) .
The middle display is the obvious thing to do for a ‘semantic’ version of application. It's just another way of saying that ρ+ is a morphism of binary operations.
But the bottom display needs plenty of discussion. Firstly, the assignment ρ[x↦d] is the assignment of variables which agrees with ρ, except that it assigns d ∈ D to the variable x. (This is a bit like substitution, so our notation [x↦d] reflects that, but is deliberately different from [x→d] .) For the moment, let ψ(f ) vaguely mean “some element of D which maps under φ to f ”. Thus, the bottom display says that λx • M , which we intuit as “the function of x which M gives”, should be mapped by ρ+ to the element of D which is “somehow represented” (to quote our earlier phrase) by the function D → D which sends d to ρ[x↦d]+ (M ) .
Inductively on the structure of M , it is clear from the three displays that, for all ρ, the values ρ+ (M ) are completely determined, at least once ψ is specified (if the displays really make sense).
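The three displays translate directly into a little interpreter once a model is given abstractly. A sketch reusing the Lam type from earlier (Model, app, psi', update, interp are my names, and psi' pretends ψ is total on all of DD, which the surrounding discussion shows is exactly the delicate point):

    data Model d = Model { app  :: d -> d -> d        -- application, φ's adjoint
                         , psi' :: (d -> d) -> d }    -- ψ (really only on [D −~ D])

    type Env d = String -> d

    update :: String -> d -> Env d -> Env d           -- ρ[x↦d]
    update x d rho y = if y == x then d else rho y

    interp :: Model d -> Env d -> Lam -> d             -- ρ ↦ ρ+
    interp m rho (Var x)   = rho x
    interp m rho (App a b) = app m (interp m rho a) (interp m rho b)
    interp m rho (Abs x a) = psi' m (\d -> interp m (update x d rho) a)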
And it is believable that ρ+ (M ) will be independent of ρ for closed terms M . In
fact, one would expect to be able to prove the following :
So now, after all this motivation, here is the definition of λ-model or model for the
λ-calculus which seems the most natural to me :
Example. This illustrates how to reduce calculation of the ρ+ to statement (1) of the theorem, using (2) and (3)(ii), and then why the three needed cases of (3)(i) hold.

ρ+ (λxyz • yzx) = ψ(d ↦ ρ[x↦d]+ (λyz • yzx)) = ψ(d ↦ ψ(c ↦ ρ[x↦d][y↦c]+ (λz • yzx)))
= ψ(d ↦ ψ(c ↦ ψ(b ↦ ρ[x↦d][y↦c][z↦b]+ (yzx)))) = ψ(d ↦ ψ(c ↦ ψ(b ↦ cbd))) .

But we want to see, as follows, why each application of ψ makes sense, in that it applies to a function which is actually in [D −~ D] . Using combinatorial completeness, choose elements ϵ, ζ1, ζ2 and ζ3 in D such that, for all p, q, r, s and t in D, we have

ϵp = (ψ ◦ φ)(p) ; ζ1pqr = prq ; ζ2pqrs = p(qsr) ; ζ3pqrst = p(qrst) .

[Note that a neat choice for ϵ is ψ(ψ ◦ φ) .]
Now

b ↦ cbd = ζ1cdb = φ(ζ1cd)(b) ,

so the innermost one is okay; it is φ(ζ1cd) ∈ [D −~ D] . But then

c ↦ ψ(b ↦ cbd) = ψ(φ(ζ1cd)) = ϵ(ζ1cd) = ζ2ϵζ1dc = φ(ζ2ϵζ1d)(c) ,

so the middle one is okay; it is φ(ζ2ϵζ1d) ∈ [D −~ D] . But then

d ↦ ψ(c ↦ ψ(b ↦ cbd)) = ψ(φ(ζ2ϵζ1d)) = ϵ(ζ2ϵζ1d) = ζ3ϵζ2ϵζ1d = φ(ζ3ϵζ2ϵζ1)(d) ,

so the outer one is okay; it is φ(ζ3ϵζ2ϵζ1) ∈ [D −~ D] .
Meyer-Scott approach.
The element ϵ from the last example determines ψ ◦ φ, and therefore ψ, since φ is surjective. It is easy to show that, for all a and b in D ,

ϵab = ab and [∀c, ac = bc] =⇒ ϵa = ϵb :

ϵ · a · b = (ψ ◦ φ)(a) · b = φ((ψ ◦ φ)(a))(b) = (φ ◦ ψ ◦ φ)(a)(b) = φ(a)(b) = a · b .
[∀c, a · c = b · c] i.e. φ(a) = φ(b) =⇒ ψ(φ(a)) = ψ(φ(b)) i.e. ϵ · a = ϵ · b .

Using these two facts, one can quickly recover the properties of ψ. Thus our definition of λ-model can be redone as a triple (D ; · ; ϵ) with (ii) replaced by the two properties just above. A minor irritant is that ϵ is not unique. That can be fixed by adding the condition ϵϵ = ϵ. So this new definition is arguably simpler than the one given, but I think less motivatable; but the difference between them is rather slight.
Scholium on the common approach.
The notation ρ+ (M ) is very non-standard. Almost every other source uses the notation [[M ]]Mρ where M = (D ; ·) . Both in logic and in denotational semantics (and perhaps elsewhere in computer science), the heavy square brackets [[ ]] seem to be ubiquitous for indicating ‘a semantic version of the syntactic object inside the [[ ]]'s’. As mentioned earlier, the vast majority of sources for λ-model (and for weaker (more general) notions such as λ-algebra and proto-λ-algebra) express the concepts' definitions by lists of properties of the semantic function [[ ]]. And, of course, you'll experience a heavy dosage of “|=” and “▶”. Again [Hind-Seld], Ch. 11 (and Ch. 10 for models of combinatory logic), is the best place to start, esp. pp. 112-122, for the list of properties, first the definition then derived laws.
Given such a definition, however weakly motivated, one can get to our definition—more quickly than the other way round (the theorem above embellished)—by defining ψ using

ψ(φ(a)) := ρ[y↦a]+ (λx • yx) .

Note that λx • yx ≈ y, but λx • yx ≉in y, as expected from this. The properties of ρ+ vaguely alluded to above assure one that

φ(a) = φ(b) [i.e. ∀d, ad = bd] =⇒ ρ[y↦a]+ (λx • yx) = ρ[y↦b]+ (λx • yx) ,

whereas ρ[y↦a]+ (y) ≠ ρ[y↦b]+ (y) clearly, if a ≠ b !
The combinatorial completeness in our definition will then follow from Schönfinkel's theorem, noting that σ := ρ+ (λxyz • (xz)(yz)) and κ := ρ+ (T ) (for any ρ) have the needed properties.
Again, from the viewpoint which regards [[ ]], or ρ+ , as primary, the element ϵ above would be defined by ϵ := ρ+ (λxy • xy) for any ρ—hardly surprising in view of the law ϵab = ab .
Finally, going the other way, a quick definition of ρ+ for an extensional D is just the composite

ρ+ = (Λ −Φ→ Γ −ρ∗→ D) ,

where we recall from VII-6 that Γ is the free binary operation on S, K, x1, x2, · · ·, and the map Φ is the one inducing the isomorphism between Λ/≈ and Γ/∼ , and here, ρ∗ is defined to be the unique morphism of binary operations taking each xi to ρ(xi), and taking S and K to the elements σ and κ from Schönfinkel's theorem (unique by extensionality).
Term model.
The set of equivalence classes Λ/≈in is a combinatorially complete non-trivial binary operation. It is not extensional, but would be if we used “≈” in place of “≈in”. The canonical way to make it into a λ-model, using our definition here, is to define

(ψ ◦ φ)([A]) := [λx • Ax] (for any variable x not free in A) .

It will be an exercise for you to check that this is well-defined, and satisfies (ii) in the definition of λ-model. Notice that, as it should, if we used “≈” in place of “≈in”, the displayed formula would just say that ψ ◦ φ is the identity map.
What does a λ-model do?
(1) For some, using the Meyer-Scott or ‘our’ definition, it is sufficient that it provides an interesting genus of mathematical structure, one for which finding the ‘correct’ definition had been a decades-long effort, and for which finding examples more interesting than the term models was both difficult and important. See the next subsection.
(2) For computer scientists, regarding the λ-calculus as a super-pure functional programming language, the functions [[ ]] are the denotational semantics of that language. See the rest of this subsection, the table on p. 155 of [St], and the second half of Subsection VII-9.
(3) For logicians, [[ ]] provides the semantics for a putative form of proto-logic. Or, coming down from that cloud, one may think of [[ ]] as analogous to the “Tarski definition of truth” in standard 1st order logic. Again, read on !
corrections/criticisms from any knowledgeable source. See
Subsection VII-9 for some very specific such semantics of the
language ATEN from the main part of this paper, and for the
semantics of the λ-calculus itself.
We did give what we called the semantics of BTEN right after the syntactic definition of that language. As far as I can make out, that was more like what the CSers would call an “operational semantics”, rather than a denotational semantics. The reason is that we referred to a ‘machine’ of sorts, by talking about “bins” and “placing numbers in bins after erasing the numbers already there”. That is a pretty weak notion of a machine. But to make the “meaning” of a language completely “machine independent” is one of the main criteria for an adequate denotational semantics.
See also the following large section on Floyd-Hoare logic. In it, we use a kind of operational semantics for several command languages. These assign, to a (command, input)-pair, the entire sequence of states which the pair generates. So that's an even more ‘machine-oriented’ form of operational semantics than the above input/output form. On the other hand, Floyd-Hoare logic itself is sometimes regarded as a form of semantic language specification, much more abstract than the above forms. There seems to have been (and still is?) quite a lot of controversy as to whether the denotational viewpoint is preferable to the above forms, especially in giving the definitions needed to make sense of questions of soundness and completeness of the proof systems in F-H logic. But we are getting ahead of ourselves.
Along with some of the literature on F-H logic, as well as
some of Dana Scott’s more explanatory productions, the two CS
textbooks [Go] and [Al] have been my main sources. The
latter book goes into the mathematical details (see the next
subsection) somewhat more than the former.
In these books, the authors start with very small imperative programming languages : TINY in [Go], and “simple language” in Ch. 5 of [Al]. These are clearly very close to ATEN/BTEN from early in this work. They contain some extras which look more like features in, say, PASCAL. Examples of “extras” are :
(1) the command “OUTPUT E” in [Go] ;
(2) the use of “BEGIN. . . END” in [Al], this being simply a more readable replacement for brackets ;
(3) the SKIP-command in [Al], which could be just whdo(0 ≈ 1)(any C) or x0 ←: x0 from ATEN .
So we may ask : which purely mathematical object corresponds to our mental image of a bunch of stored natural numbers, with the number called νi stored in bin number i ? That would be a function

σ : IDE → VAL ,

where σ is called a state, IDE is the (syntactic) set of identifiers, and VAL is the (semantic) ‘set’ of values.
Now IDE for an actual command language might, for example, be the set of all finite strings of symbols from the 62-element symbol set consisting of the 10 decimal digits and the (26 × 2) upper- and lower-case letters of the Roman alphabet, with the proviso that the string begin with a letter (and with a few exclusions, to avoid words such as “while” and “do” being used as identifiers). For us in BTEN, the set IDE was simply the set {x0, x1, x2, · · ·} of variables. (In each case we of course have a countably infinite set. And in the latter case, there would be no problem in changing to a set of finite strings from a finite alphabet, for example the strings x|||···|| . Actually, as long as the subscripts are interpreted as strings of [meaningless?] digits, as opposed to Platonic integers, we already have a set of finite strings from a finite alphabet.)
In referring above to VAL, we used “ ‘set’ ” rather than “set”, because this is where λ-calculus models begin to come in, or what are called domains in this context. For us, VAL was just the set N = {0, 1, 2, · · ·} of all natural numbers. And σ was the function mapping xi ∈ IDE to νi ∈ VAL. But for doing denotational semantics with a real, practical language, it seems to be essential that several changes including the following are made :
Firstly, the numbers might be more general, such as allowing all integers (negatives as well).
Secondly, VAL should contain something else, which is often called ⊥ in this subject. It corresponds more-or-less to our use of “err” much earlier.
Thirdly, we should have a (rather weak) partial order ⊑ on the objects above (much more on this in the next subsection). As to the actual ordering here, other than requiring b ⊑ b for all b (part of the definition of partial order), we want ⊥ ⊑ c for all numbers c ∈ VAL (but no other relation between elements as above). So ⊑ is a kind of ‘how much information?’ partial order.
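For a flat domain like N ∪ {⊥}, this order is two lines of code; a minimal sketch (Flat and leq are my names):

    -- the 'how much information?' order on a flat domain
    data Flat a = Bot | Val a   deriving (Eq, Show)   -- Bot plays the role of ⊥

    leq :: Eq a => Flat a -> Flat a -> Bool           -- the partial order ⊑
    leq Bot     _       = True                        -- ⊥ ⊑ c for every c
    leq (Val a) (Val b) = a == b                      -- distinct values incomparable
    leq _       _       = False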
Finally, for a real command language, VAL would contain other semantic objects such as the truth values Tr and Fs. So, for example, in [Al], p. 34, we see definitions looking like

Bν = INT + BOOL + Char + · · · ,

and

VAL = Bν + [VAL + VAL] + [VAL × VAL] + [VAL −~ VAL] .

Should the summands then keep their own bottom elements, joined below by a global one, or should there merely be a single ⊥ ?

Tr  Fs      0  1  −1  2  −2 · · ·            Tr  Fs  0  1  −1  2 · · ·
  \  /        \  \  |  /  /                    \   \  \ | /  /
 ⊥BOOL           ⊥INT         , or merely           ⊥             ?
      \           /
       \         /
            ⊥
In any case, it is crucial that we have at least the following :
(1) Some syntactically defined sets, often IDE, EXP and COM , where the latter two respectively are the sets of expressions and commands.
(2) Semantically defined ‘domains’ such as VAL and perhaps

STATE = [IDE −~ VAL] ,

where the right-hand side is a subposet of the poset of all functions from IDE to the underlying set of the structure VAL.
(3) Semantically defined functions

E : EXP → [STATE −~ VAL] and C : COM → [STATE −~ STATE] .
The situation in [Go] is a bit messier (presumably of necessity) because ⊥ is not used as a value, but separated out, called sometimes “unbound” and sometimes “error”, but systematically.
Be that as it may, let us try to explain E and C in the case of our simple language BTEN/ATEN. Much more detail on this appears two subsections ahead.
Here EXP is the set of terms and quantifier-free formulae from the language of 1st order number theory. Then E is that part of the Tarski definition of truth as applied to the particular interpretation, N, of that 1st order language which
(1) assigns to a term t, in the presence of a state ν, the natural number tν . That is, adding in the totally uninformative ‘truth value’ ⊥,

E[t](ν) := tν .

(But here we should add that tν = ⊥ if any νi, with xi occurring in t, is ⊥ .)
(2) and assigns to a formula G, in the presence of a state ν, one of the truth values. That is,

E[G](ν) := Tr if G is true at ν ; Fs if G is false at ν ; ⊥ if any νi, with xi free in G, is ⊥ .

Also, COM is the set of commands in ATEN or BTEN. And C assigns to a command C, in the presence of a state ν, the new state ||C||(ν) . See the beginning of Subsection VII-9 for more details.
So it appears that there is no great difference between the ‘operational’ semantics already given and this denotational semantics, other than allowing ⊥ as a ‘totally undefined number’ or as a ‘totally uninformative truth value’. But such a remark unfairly trivializes denotational semantics for several reasons. Before expanding on that, here is a question that experts presumably have answered elsewhere.
What are the main impracticalities of trying to bypass the whole enterprise of denotational semantics as follows ?
(1) Give, at the syntactic level, a translation algorithm of the relevant practical imperative programming language back into ATEN. For example, in the case of a self-referential command (i.e. recursive program), I have already done that in IV-9 of this work. I can imagine that GOTO-commands would present some problems, but presumably not insuperable ones.
(2) Then use the simple semantics of ATEN to do whatever needs to be done, such as trying to prove that a program never loops, or that it is correct according to its specifications, or that an F-H proof system is complete, or, indeed, such as implementing the language.
This is the sort of thing mentioned by Schwartz [Ru-ed] pp. 4-5, but reasons why it is not pursued seem not to be given there. Certainly one relevant remark is that, when one (as in the next section on F-H logic) generalizes ATEN by basing it on an arbitrary 1st order language and its semantics on an arbitrary interpretation of that language, the translation as in (1) may no longer be possible. And so, even when the interpretation contains the natural numbers in a form which permits Gödel numbering for coding all the syntax, the translation might lack some kind of desirable naturality. And its efficiency would be a problem for sure.
This is becoming verbose, but there are still several matters needing explanation, with reference to why denotational semantics is of interest.
First of all, it seems clear that the richer ‘extras’ in real languages such as PASCAL and ALGOL60, as opposed to the really basic aspects already seen in these simpler languages, are where the indispensibility of denotational semantics really resides. (As we've emphasized, though these extras make programming a tolerable occupation, they don't add anything to what is programmable in principle.) Examples here would be those referred to in (1) just above—self-referential or recursive commands, and GOTO-statements (which are also self-referential in a different sense), along with declarations and calls to procedures with parameters, declarations of variables (related to our B-command in BTEN), declarations of functions, etc. Here we are completely ignoring parallelism and concurrent programming, which have become very big topics in recent years.
But I also get the impression that mechanizing the projects which use denotational semantics is a very central aspect here. See the last chapter of [Al], where some of it is made executable, in his phraseology. The mathematical way in which we had originally given the semantics of BTEN is inadequate for this. It's not recursive enough. It appears from [Go] and [Al], without being said all that explicitly, that one aspect of what is really being done in denotational semantics is to translate the language into a form of the λ-calculus, followed by perhaps some standard maps (like the ρ+ earlier) of the latter into one or another “domain”. So the metalanguage of the first half of this semantics becomes quite formalized, and (it appears to me) is a pure functional (programming?) language. (Perhaps other pure functional programming languages don't need so much denotational semantics (beyond that given for the λ-calculus itself) since they are already in that form?)
For example, lines -5 and -6 of p. 53 of [Al] in our notation above and applied to BTEN become

C[ite(F )(C)(D)](σ) := (if E[F ](σ) = Tr, then C[C], else C[D])(σ) ,

for all formulas F , commands C and D, and states σ . At first, this appears to be almost saying nothing, with the if-then-else being semanticized in terms of if-then-else. But note how the right-hand side is a statement from the informal version of McSELF, in that what follows the “then” and “else” are not commands, but rather function values. In a more formal λ-calculus version of that right-hand side, we would just write a triple product, as was explained in detail at the beginning of Subsection VII-5.
Even more to the point here, lines -3 and -4 of p. 53 of [Al] in our notation above and applied to BTEN become

C[whdo(F )(C)](σ) := (if E[F ](σ) = Tr, then C[whdo(F )(C)] ◦ C[C], else Id)(σ) .

First notice how much more compact this is than the early definition in the semantics of BTEN. And again, the “if-then-else” on the right-hand side would be formalized as a triple product. But much more interestingly here, we have a self-reference, with the left-hand side appearing buried inside the right-hand side. So here we need to think about solving equations. The Y-operator does that for us systematically in the λ-calculus, which is where the right-hand side resides, in one sense. The discussion of least fixed points ending the next subsection is clearly relevant. A theorem of Park shows that, at least in Scott's models from the next subsection, the same result is obtained from Tarski's ‘least fixed points’ operator in classical lattice theory (see VII-8.12 ending the next subsection), as comes from the famous Y-operator of Curry within λ-calculus.
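As an aside, Curry's Y = λf • (λx • f (xx))(λx • f (xx)) cannot be written directly in a typed language, but the standard newtype trick reproduces it in Haskell, and the resulting y satisfies y f = f (y f). A sketch (Rec and y are my names):

    newtype Rec a = In { out :: Rec a -> a }

    y :: (a -> a) -> a
    y f = (\x -> f (out x x)) (In (\x -> f (out x x)))

This typed y computes the same least fixed points as Haskell's built-in fix, in the spirit of the Park theorem just cited.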
We shall return to denotational semantics after the more purely mathematical subsection to follow on Scott's construction of λ-calculus models for doing this work.
VII-8 Scott's Original Models.
We wish to approach from an oblique angle some of the fundamental ideas of Dana Scott for constructing extensional models of the λ-calculus. There are many other ways to motivate this for mathematics students. In any case, presumably he won't object if this approach doesn't coincide exactly with his original thought processes.
One of several basic ideas here, in a rather vague form, is to employ some mathematical structure on A in order to construct such a model with underlying set A. We wish to define a (relatively small) subset of AA, which we'll call [A −~ A], and a bijection between it and A itself. The corresponding extensional binary operation on A (via the adjointness discussed in the first paragraph of the digression in the last subsection) we hope will be combinatorially complete; that is, all its algebraically definable functions should be representable. And of course, representable implies algebraically definable almost by definition. To pick up on the idea again, can we somehow build a structure so that [A −~ A] turns out to be precisely the ‘structure-preserving’ functions from A to itself? If this were the case, and the structure-preserving functions satisfied a few simple and expected properties (with respect to composition particularly), then the proof of combinatorial completeness would become a fairly simple formality :

representable =⇒ algebraically definable =⇒ structure-preserving =⇒ representable .

In the next several paragraphs, the details of how such an object would give an extensional model for the λ-calculus are given in an axiomatic style, leaving for later ‘merely’ the questions of what “structure” to use, and of self-reflection.
Details of essentially the category-theoretic approach.
Assume the following :
Data — Given a collection of ‘sets with structure’, and, for any two such objects, A and B, a subset [A −~ B] ⊂ BA of ‘structure preserving functions’. Given also a canonical structure on both the Cartesian product A × B and on [A −~ B] .
Axioms
(1) The ith projection is in [A1 × · · · × An −~ Ai] for 1 ≤ i ≤ n ; in particular, taking n = 1, the identity map, idA, is in [A −~ A] .
(2) If f ∈ [A −~ B] and g ∈ [B −~ C] then the composition g ◦ f is always in [A −~ C] .
(3) The diagonal map δ is in [A −~ A × A] , where δ(a) := (a, a) .
(4) Evaluation restricts to a map ev in [[A −~ B] × A −~ B] , where ev(f, a) = f (a).
(5) If f1 ∈ [A1 −~ B1] and f2 ∈ [A2 −~ B2] then f1 × f2 is necessarily in [A1 × A2 −~ B1 × B2] , where (f1 × f2)(a1, a2) := (f1(a1), f2(a2)).
(6) All constant maps are “structure preserving”.
(7) The adjointness bijection, from AB×C to (AC)B, maps [B × C −~ A] into [B −~ [C −~ A]] .
All this will be relatively easy to verify, once we’ve chosen the
appropriate “structure” and “structure preserving maps”, to
make the following work, which is more subtle. That choosing
will also be motivated by the application to denotational
semantics.
Self-reflective Object
Now suppose given one of these objects A, and a mutually
inverse pair of bijections which are structure preserving :
mult : A × φ e
[A−~ A] × A −→ A
A −
×
→
id
1
Now [A −~ A] contains idA and all constant maps, and is closed under pointwise multiplication of functions as follows :

A −δ→ A × A −f×g→ A × A −mult→ A
x ↦ (x, x) ↦ (f (x), g(x)) ↦ f (x) · g(x)
The fact that φ is injective implies that mult is extensional, as
we noted several times earlier. First we shall check the 1-variable
case of combinatorial completeness of mult.
Definitions. (More-or-less repeated from much earlier.)
Say that f ∈ AA is 1-representable when there is a ζ ∈ A such that

f (a) = ζ · a for all a ∈ A .

Define the set of 1-algebraically definable functions in AA to be the smallest subset of AA containing idA and all constant functions, and closed under pointwise multiplication of functions.
1-combinatorial completeness of A is the fact that the two notions just above coincide.
Its proof is now painless in the form

1-representable =⇒ 1-algebraically definable =⇒ structure-preserving =⇒ 1-representable .

The first implication is because a 1-representable f as in the definition is the pointwise multiplication of (the constant function with value ζ) times (the identity function).
The second implication is the fact noted above that the set [A −~ A] is an example of a set containing idA and all constant functions, and closed under pointwise multiplication of functions.
(Of course, the equation in the definition of 1-representability can be re-written as f = φ(ζ) , so the equivalence of 1-representability with structure preserving becomes pretty obvious.)
The third implication goes as follows :
Given f ∈ [A −~ A], define ζ to be ψ(f ) . Then

ζ · a = ψ(f ) · a = φ(ψ(f ))(a) = f (a) ,

as required.
Now we shall check the 2-variable case of combinatorial completeness, and leave the reader to pump this up into a proof for any number of variables.
Definitions.
Say that f ∈ AA×A is 2-representable when there is a ζ ∈ A such that

f (b, c) = (ζ · b) · c for all b and c in A .

Define the set of 2-algebraically definable functions in AA×A to be the smallest subset of AA×A containing both projections and all constant functions, and closed under pointwise multiplication of functions.
2-combinatorial completeness of A is the fact that the two notions just above coincide.
Its proof is much as in the 1-variable case :

2-representable =⇒ 2-algebraically definable =⇒ structure-preserving =⇒ 2-representable .

The first implication is because a 2-representable f as in the definition is a suitably sequenced pointwise multiplication of the constant function with value ζ and the two projections.
The second implication is the fact that the set [A × A −~ A] is an example of a set containing the projections and all constant functions, and closed under pointwise multiplication of functions. The latter is proved by composing :
A × A −δ→ (A × A) × (A × A) −f×g→ A × A −mult→ A
(b, c) ↦ ((b, c), (b, c)) ↦ (f (b, c), g(b, c)) ↦ f (b, c) · g(b, c)

The third implication goes as follows :
If f ∈ [A × A −~ A], the composite ψ ◦ adj(f ) is in [A −~ A], using axiom (7) for the first time. By the part of the 1-variable case saying that structure preserving implies 1-representable, choose ζ so that, for all b ∈ A, we have ψ(adj(f )(b)) = ζ · b . Then

ζ · b · c = ψ(adj(f )(b)) · c = φ(ψ(adj(f )(b)))(c) = adj(f )(b)(c) = f (b, c) ,

as required.
maybe ⊑ should be a partial order with some extra properties (so that, in the application, it intuitively coincides with ‘comparing information content’).
Now I believe (though it's not always emphasized) that an important part of Scott's accomplishment is not just to be first to construct a λ-model (and do so by finding a category and a self-reflective object in it), but also to show how to start with an individual from a rather general species of posets (in the application, from {numbers, truth values, ⊥}, at least), and show how to embed it (as a poset) into an extensional λ-model.
So let's start with any poset (D, ⊑) and see how far we can get before having to impose extra properties. Recall that the definition of poset requires
(i) a ⊑ b and b ⊑ c implies a ⊑ c ; and
(ii) a ⊑ b and b ⊑ a if and only if a = b .
Temporarily define [D −~ D] to consist of all those functions f which preserve order; that is,

d ⊑ e =⇒ f (d) ⊑ f (e) .

More generally this defines [D −~ E] where E might not be the same poset as D.
Now [D −~ D] contains all constant functions, so we can embed D into [D −~ D] by φD : d ↦ (d′ ↦ d) . This φD : D → [D −~ D] will seldom be surjective. It maps each d to the constant function with value d.
Next suppose that D has a minimum element called ⊥ . Then we can map [D −~ D] back into D by ψD : f ↦ f (⊥) . It maps each function to its minimum value.
It is a trivial calculation to show that ψD ◦ φD = idD, the identity map of D. By definition, this shows D to be a retract of [D −~ D] . (Note how this is the reverse of the situation in the digression of the last subsection, on the general definition of λ-model, where [D −~ D] was a retract of D.) In particular, φD is injective (obvious anyway) and ψD is surjective.
The set [D −~ D] itself is partially ordered by

f ⊑ g ⇐⇒ ∀d , f (d) ⊑ g(d) .
This actually works for the set of all functions from any set to any poset.
How do φD and ψD behave with respect to the partial orders on their domains and codomains? Very easy calculations show that they do preserve the orders, and thus we have

φD ∈ [D −~ [D −~ D]] and ψD ∈ [[D −~ D] −~ D] .
So D is a retract of [D −~ D] as a poset, not just as a set. But [D −~ D] is usually too big—the maps above are not inverses of each other, only injective and surjective, and one-sided inverses of each other, as we already said.
Now here's another of Scott's ideas : whenever you have a retraction pair D ‹→ E (with the surjection E → D going back), you can automatically produce another retraction pair [D −~ D] ‹→ [E −~ E] . If (φ, ψ) is the first pair and (φ′, ψ′) the second pair, then the formulae defining the latter are

φ′(f ) := φ ◦ f ◦ ψ and ψ′(g) := ψ ◦ g ◦ φ .

Again a very elementary calculation shows that this is a retraction pair.
[As an aside which is relevant to your further reading on this subject, notice how what we've just done is purely ‘arrow-theoretic’ : the only things used are associativity of composition and behaviour of the identity morphisms. It's part of category theory. In fact Lambek has, in a sense, identified the theory of combinators, and of the typed and untyped λ-calculi, with the theory of cartesian closed categories. The “closed” part is basically the situation earlier with the axioms (1) to (7) where the set of morphisms between two objects in the category is somehow itself made into an object in the category, an object with good properties.]
Now it is again entirely elementary to check that, for the
category of posets and order-preserving maps, the maps φ' and ψ'
do in fact preserve the order.
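The arrow-theoretic character of the lifting is visible in a typed sketch, where it is nothing but composition (Retr, up, down, liftR are my names):

    -- a retraction pair, with the invariant  down . up = id
    data Retr a b = Retr { up :: a -> b, down :: b -> a }

    liftR :: Retr a b -> Retr (a -> a) (b -> b)
    liftR (Retr phi psi) = Retr (\f -> phi . f . psi)    -- φ'(f) := φ ∘ f ∘ ψ
                                (\g -> psi . g . phi)    -- ψ'(g) := ψ ∘ g ∘ φ

That down (up f) = f for the lifted pair follows from down . up = id on the original pair, mirroring the elementary calculation just mentioned.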
Back to the earlier situation in which we have the poset D as a retract of the poset E = [D −~ D], the construction immediately above can be iterated : define

D0 := D , D1 := [D0 −~ D0] , D2 := [D1 −~ D1] , etc. · · · .

Then the initial retraction pair D0 ‹→ D1 gives rise, by the purely category-theoretic construction, to a sequence of retraction pairs Dn ‹→ Dn+1 . We'll denote these maps as (φn, ψn) .
Ignoring the surjections for the moment, we have

D0 ‹→ D1 ‹→ D2 ‹→ · · · , the maps being φ0, φ1, φ2, . . . .
Wouldn't it be nice to be able to ‘pass to the limit’, producing a poset D∞ as essentially a union. And then to be able to just set n = ∞ = n + 1, and say that we'd get canonical bijections back-and-forth between D∞ and [D∞ −~ D∞] ??? (The first is n + 1, and the second one is n, so to speak.) After all, that was the original objective here !! In fact, going back to some of the verbosities in Subsection VII-3, this looks as close as we're likely to get to a mathematical situation in which we have a non-trivial structure D∞ which can be identified in a natural way with its set of (structure-preserving!) self-maps. The binary operation on D∞ would of course come from the adjointness discussed at the beginning of this subsection; that is,

a∞ · b∞ := (φ∞(a∞))(b∞) ,
to go back and change poset to complete lattice and order
preserving map to continuous map. The definitions impose an
extra condition on both the objects and the morphisms.
Definition. A complete lattice is a poset (D, ⊑), where every subset of D has a least upper bound. Specifically, if A ⊂ D, there is an element l ∈ D such that
(i) a ⊑ l for all a ∈ A ; and
(ii) for all d ∈ D, if a ⊑ d for all a ∈ A, then l ⊑ d .
It is immediate that another least upper bound m for A must satisfy both l ⊑ m and m ⊑ l , so the least upper bound is unique. We shall use ⊔A to denote it. In particular, a complete lattice always has a least element ⊔∅, usually denoted ⊥ .
Definition. A function f : D → E between two complete lattices is called continuous if and only if f (⊔A) = ⊔f (A) for all directed subsets A of D, where f (A) is the usual image of the subset under f . Being directed means that, for any two elements a and b of A, there is a c ∈ A with a ⊑ c and b ⊑ c.
It follows for continuous f that f (⊥D) = ⊥E . Also, such an f preserves order, using the fact that, if x ⊑ y, then x ⊔ y := ⊔{x, y} = y . If f is bijective, we call it an isomorphism. Its inverse is automatically continuous.
Definition. The set [D −~ E] is defined to consist of all continuous maps from D to E.
Scott's Theorem. Let D be any complete lattice. Then there is a complete lattice D∞ which is isomorphic to [D∞ −~ D∞], and into which D can be embedded as a sublattice.
For the theorem to be meaningful, the set [D∞ −~ D∞] is made into a complete lattice as indicated in the first exercise below. In view of that exercise, this theorem is exactly what we want, according to our elementary axiomatic development in the last several pages.
For Scott's theorem, we'll proceed to outline two proofs in the form of sequences of exercises. First will be a rather “hi-fallutin' ” proof, not much different than Lawvere's suggestion in [La-ed] p. 179, but less demanding of sophistication about categories on the reader's part (as opposed to categorical sophistication, which must mean sophistication about one and only one thing, up to isomorphism!).
Both proofs use the following exercise.
Ex. VII-8.1. (a) Verify the seven axioms for the category of complete lattices and continuous maps, first verifying (and doing part (b) below simultaneously) that D × E and [D −~ E] are complete lattices whenever D and E are, using their canonical orderings as follows :
with respect to the maps θ∞n : D∞ → Dn . That is, given a complete lattice E and continuous maps ηn : E → Dn such that ηn = ψn ◦ ηn+1 for all n, there is a unique continuous map η∞ : E → D∞ such that ηn = θ∞n ◦ η∞ for all n.
Ex. VII-8.3. (General category theory)
(a) Show quite generally from the arrow-theoretic definition that, given an infinite commutative ladder

A0 ←−α0−− A1 ←−α1−− A2 ←−α2−− · · ·
ζ0 ↓          ζ1 ↓          ζ2 ↓
B0 ←−β0−− B1 ←−β1−− B2 ←−β2−− · · ·
Ex. VII-8.4. Show directly from the arrow-theoretic
definition that [D∞—~ D∞] together with its maps η∞n : [D∞—~
Remark. The last three exercises complete the job, but the first and last may be a good challenge for most readers. They are likely to involve much of the material in the following more pedestrian approach to proving Scott's theorem (which is therefore not really a different proof), and which does exhibit the isomorphisms D∞ ←→ [D∞ −~ D∞] quite explicitly.
Ex. VII-8.5. (a) Show that (φ0 ◦ ψ0)(f ) ⊑ f for all f ∈ D1 .
(b) Deduce that, for all n, we have (φn ◦ ψn)(f ) ⊑ f for all f ∈ Dn+1 .
The next exercise is best remembered as : “up, then down (or, right, then left) always gives the identity map”, whereas : “down, then up (or left, then right) never gives a larger element”. We're thinking of the objects as lined up in the usual way :

D0 D1 D2 D3 D4 · · · · · · D∞

x ∈ Da ,
whereas, for all other a, b and c, the maps θbc ◦ θab and θac are actually equal.
Starting now we shall have many instances of ⊔n dn . In every case, the sequence of elements dn form a chain with respect to ⊑ , and so a directed set, and thus the least upper bound does exist by completeness of D. Checking the ‘chaininess’ will be left to the reader.
Definitions. Define φ∞ : D∞ → [D∞ −~ D∞] by

φ∞(x) := ⊔k (θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k) .

Define ψ∞ : [D∞ −~ D∞] → D∞ by

ψ∞(f ) := ⊔n θn+1,∞(θ∞,n ◦ f ◦ θn,∞) .
Using these last four exercises, we can now complete the more mundane of the two proofs of Scott's theorem, by directly calculating that φ∞ and ψ∞ are mutually inverse :
For all x ∈ D∞ ,

ψ∞(φ∞(x)) = ψ∞(⊔k (θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k))   (definition of φ∞)
= ⊔k ψ∞(θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k)   (since ψ∞ is continuous)
= ⊔k ⊔n θn+1,∞(θ∞,n ◦ θk,∞ ◦ (θ∞,k+1(x)) ◦ θ∞,k ◦ θn,∞)   (definition of ψ∞)
= ⊔k ⊔n θn+1,∞(θk,n ◦ (θ∞,k+1(x)) ◦ θn,k)   (by VII-8.6(b))
= ⊔k θk+1,∞(θ∞,k+1(x)) = x , as required, by VII-8.8 and VII-8.10.
Theorem VII-8.12. Let D be a complete lattice, and consider operators Ω ∈ [D −~ D]. Define

fix : [D −~ D] → D by fix(Ω) := ⊔n Ωn(⊥D) .

Then fix(Ω) is the minimal fixed point of Ω, and fix is itself continuous. In particular, any continuous operator on a complete lattice has at least one fixed point.
as required.
Finally, given a directed set O ⊂ [D −~ D], we have

fix(⊔Ω∈O Ω) = ⊔n (⊔Ω∈O Ω)n(⊥) = ⊔n (⊔Ω∈O Ωn)(⊥) = ⊔n ⊔Ω∈O (Ωn(⊥))
= ⊔Ω∈O ⊔n (Ωn(⊥)) = ⊔Ω∈O fix(Ω) ,

as required, where justifications of the three middle equalities are left to the reader. And so, fix is continuous.
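Haskell carries exactly this kind of order-theoretic semantics on its own types (cpos rather than complete lattices, but with the same formula for least fixed points), and the operator of the theorem is the library function Data.Function.fix; a tiny illustration:

    import Data.Function (fix)   -- fix f = let x = f x in x

    -- semantically, fix f = ⊔n fⁿ(⊥); e.g. a factorial via its functional:
    fac :: Integer -> Integer
    fac = fix (\f n -> if n == 0 then 1 else n * f (n - 1))
    -- fac 5 == 120 ; the approximant fⁿ(⊥) is defined only on 0, ..., n−1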
VII-9 Two (not entirely typical) Examples of Denotational Semantics.
We shall write out in all detail what presumably ought to be the denotational semantics of ATEN (earlier used to define computability), then illustrate it with a few examples. In the second half, we re-write the maps ρ+ in the style of “denotational semantics for the λ-calculus”, finishing with several interesting theorems about ρ+ when the domain is D∞, which also gives practice calculating in that domain.
Denotational semantics of ATEN.
Since machine readability and executability seem to be central concerns here, the formulas will all be given very technically, and we'll even begin with a super-formal statement of the syntax (though not exactly in the standard BNF-style). Here it all is, in one largely human-unreadable page. See the following remarks.

BRA := { ) , ( } , which merely says that we'll have lots of brackets, despite some CSers' abhorrence.
v ∈ IDE := { x | ∗ || x || v∗ } , which says, e.g., that the ‘real’ x3 is x∗∗∗, and x0 is just x.
s, t ∈ EXP := { BRA | IDE | + | × | 0 | 1 || v | 0 | 1 || (s + t) | (s × t) } .
F, G ∈ EXP′ := { BRA | EXP | < | ≈ | ¬ | ∧ || s < t | s ≈ t || ¬F | (F ∧ G) } .
C, D ∈ COM := { BRA | IDE | EXP | EXP′ | ←: | ; | whdo || v ←: t || (C; D) | whdo(F )(C) } .
E′ : EXP′ → [STATE −~ BOOL] defined by saying firstly that E′[[F ]](σ) := ⊥BOOL if either F is an atomic formula involving a term s with E[[s]](σ) = ⊥N , or if F is built using ¬ or ∧ from a formula G for which E′[[G]](σ) = ⊥BOOL ; otherwise
Remarks. The first five displays are the syntax. All but the first give a set of strings in the usual three stages : first, all symbols to be used; next, the atomic strings; and finally, how the ‘production’ of new strings is done (induction on structure). To the far left are ‘typical’ member(s) of the string set to be defined, those being then used in the production and also on lines further down. [Where whole sets of strings are listed on the left of the symbols-to-be-used listing and other times as well, a human interpreter thinks of those substrings as single symbols, when they appear inside the strings being specified on that line. For example, (x ∗ ∗ ∗ + x ∗ ∗) has ten symbols, but a human ignores the brackets and thinks of three, i.e. x3 + x2—and maybe even as a single symbol, since, stretching it a bit, however complicated, a term occurring inside a formula is intuited as a single symbol in a sense, as a component of the formula. And similarly, a term or a formula within a command is psychologically a single symbol. Sometimes the phrase “immediate constituents” is used for this, in explaining below the inductive nature of the semantic function definitions.]
The last of the five lines is the syntax of ATEN itself. The third and fourth lines give the syntax of the assertion language, first terms, then (quantifier-free) formulas, in 1st order number theory. We've used names which should remind CSers of the word “expression”, but they might better be called TRM (= EXP) and FRM (= EXP′) from much earlier stuff here. Often, CSers would lump EXP and EXP′ together into a single syntactic category.
Why they do so is not unmysterious to me; I am probably missing some considerable subtlety. But I am well aware of the confusion that tendency in earlier CS courses causes for average students when I teach them logic. They initially find it strange that I should make a distinction between strings in a formal language which we intuit as standing for objects, and other strings which we intuit as standing for statements! On the other hand, I myself find it strange to think of (3 < 4) + ((5 = 2) + 6) as any kind of expression! (and not because 5 ≠ 2) Another inscrutability for me is that many of the technicalities on the previous page would normally be written down in exactly the opposite order in a text on programming languages! While I'm in the mood to confess a mentality completely out of synch with the CSers, here's another bafflement which some charitable CSer will hopefully straighten me out on. Within programs in, say, PASCAL, one sees the begin—end and use of indentation as a replacement for brackets, which makes things much more readable (if any software could ever be described as (humanly) readable). But there seems to be a great desire to avoid brackets, and so avoid non-ambiguity, in writing out the syntax of languages. Instead, vague reference is made to parsers. I can appreciate a desire sometimes to leave open different possibilities. A good example is the λ-calculus, where using A(B) rather than (AB) in the basic syntactic setup is a possibility, and probably better for psychological purposes, but not for economy. And the semantics is largely independent of the parsing. But when precision is of utmost importance, I cannot understand this tendency to leave things vague. Of course, there still remains a need to specify algorithms for deciding whether a string (employing the basic symbols of the language) is actually in the language. But surely that's a separate issue.
The sixth display gives information about the semantic domains needed. We haven't said what VAL actually “is” (hence the “=”), other than that it contains ⊥ and all the natural numbers as elements. Below we come clean on that. The use of [ −~ ] is some indication that these semantic sets have some structure, actually a partial order.
Finally, the three semantic functions, corresponding to terms, formulae, and commands, are given. We have been super-strict in distinguishing notationally between the symbol “+” and the actual operation “+N ” on natural numbers, and similarly for “·N ” and “<N ”. Some elementary confusions revolve around this point, in the presence of ambiguous notation, which point otherwise might seem overly fussy. We have, in the same vein, used our usual “≈” as the formal equality symbol, to distinguish it from the actual relation of sameness. Each of the three semantic functions is defined by structural induction on the productions defined at the right-hand ends of the corresponding syntactic sets. As mentioned in defining S, the operations +N and
·N produce ⊥N if either or both of their inputs is ⊥N . And from the definition of S′, the relation <N has no need to concern itself with comparing ⊥N with anything.
As for S, we're just saying how, using S's adjoint, a term together with a state will yield a number, exactly as in Tarski's definition of truth. In fact, S[[t]](σ) is just another name for tv, where v is identified with that state σ mapping each xn to vn.
ALGOL-like) languages that this impression of sterility dissipates. The need for anything like a self-reflective domain as constructed in the previous subsection is unclear. But we do at least need that D = [STATE −~ STATE] is a lattice for which Tarski's theorem on minimum fixpoint operators works. That follows as long as STATE is a complete lattice, which itself follows as long as VAL is. Therefore we complete the unfinished business on the previous page by specifying

C[[ite(F )(C)(D)]](σ) := (if S′[[F ]](σ) = Tr, then C[[C]], else C[[D]])(σ) .
currying/uncurrying.) Further down we do the same for the whdo-command, to which similar bafflements on my part apply with respect to the versions in [Al] and [Go]. Re-define (without really changing the definition)
where

ddg : STATE → STATE × STATE × STATE ; σ ↦ (σ, σ, σ)

is the double diagonal map, and

con : BOOL × STATE × STATE → STATE ;
(Tr, σ, τ ) ↦ σ ; (Fs, σ, τ ) ↦ τ ; (⊥BOOL, σ, τ ) ↦ ⊥STATE

is the conditional. (This has turned out a bit simpler than in the above references, with no need to fool around with defining a ∗-operator, partly because we are not insisting on currying everything, with its attendant contortions.)
In any case, this isolates the facts that we definitely need ddg
and con to be continuous, which they are, but otherwise no
further comment is needed.
Now we can re-write the while-do semantic definition also in this style :
in the domain [STATE −~ STATE] which denotes a program that always loops ! A better name for that function is ⊥[STATE −~ STATE] . The previous ⊥ is ⊥STATE = ⊥[IDE −~ VAL] , namely the function which maps all ν ∈ IDE to ⊥VAL , which Dana Scott refers to as “the undefined”.
(2) If C = whdo(x0 < 1)(x0 ←: x0 + 1) , we see that any initial state is unchanged by the command, except when bin 0 contains zero. In the latter case, the zero in bin 0 is changed to 1, and then the process terminates. First here are three preliminary calculations :

S[[x0 + 1]](σ) = S[[x0]](σ) +N S[[1]](σ) = σ(x0) +N 1N .

From the definition, C[[C]] is then given by

fix( f ↦ (σ ↦ { f (σ[x0↦1N ]) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N }) ) .

Now let f1 be any fixed point of the latter operator. So

f1(σ) = { f1(σ[x0↦1N ]) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .

But the lower line then determines the upper line, and we conclude that there is only one fixed point in this case (no minimization needed!), given by

f1(σ) = { σ[x0↦1N ] if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .

Well, this last function is exactly the one the command was supposed to compute, i.e. ‘if necessary, change the 0 in bin zero to a 1, and do nothing else’, so our definition is doing the expected here as well.
(3) If C = whdo(x0 < 1)(x1 ←: x1) , we see that any initial state is
unchanged by the command, except when bin zero contains 0. In the
latter case, the program does an infinite loop.
First here are the three preliminary ‘calculations’:

S[[x1]](σ) = σ(x1) ;

S′[[x0 < 1]](σ) = { Tr if σ(x0) <N 1N ; Fs otherwise } ;

C[[x1 ←: x1]](σ) = σ[x1 ↦ σ(x1)] = σ .
From the definition, C[[C]] is given by

fix( f ↦ ( σ ↦ ( if S′[[x0 < 1]](σ) = Tr, then (f ∘ C[[x1 ←: x1]])(σ), else σ ) ) )

= fix( f ↦ ( σ ↦ { f (σ) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } ) ) .
Suppose that f2 is a fixed point of the latter operator. So

f2(σ) = { f2(σ) if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .

The top line says nothing, and any such function is a fixed point.
Thus, clearly f3 is the minimal fixed point, where

f3(σ) = { ⊥ if σ(x0) = 0N ; σ if σ(x0) ≠ 0N } .
Once again, this last function is exactly the one the command was
supposed to compute, i.e. ‘loop if bin 0 has a 0; otherwise terminate
after doing nothing’. So our definition is doing the expected, giving
at least a little reinforcement to our confidence in the technicalities.
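Both examples can be machine-checked. Here is a minimal Haskell sketch
in which ⊥ is modelled by non-termination, so that Haskell’s own fix
computes exactly the least fixed points used above; whdo, update,
example2 and example3 are names invented for the sketch.

    import Data.Function (fix)

    type State = Int -> Integer          -- bin n of the state s holds s n

    update :: Int -> Integer -> State -> State
    update n d s m = if m == n then d else s m

    -- C[[whdo(F)(C)]] = fix( f ↦ ( σ ↦ if F then f(C σ) else σ ) )
    whdo :: (State -> Bool) -> (State -> State) -> State -> State
    whdo p c = fix (\f s -> if p s then f (c s) else s)

    example2 :: State -> State           -- (2): changes a 0 in bin 0 to 1, then stops
    example2 = whdo (\s -> s 0 < 1) (\s -> update 0 (s 0 + 1) s)

    example3 :: State -> State           -- (3): loops exactly when bin 0 holds 0
    example3 = whdo (\s -> s 0 < 1) (\s -> update 1 (s 1) s)

    -- example2 (const 0) 0 == 1 , while example3 (const 0) diverges, i.e. ‘is ⊥’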
instructive, even in the direction of writing a general algorithm to
translate programs in that language into commands in ATEN.
We would like to use the ‘command’

x0 ←: 1 ; whdo(0 < x1)(x0 ←: x1 × x0 ; “x1 ←: x1 − 1”) .
After approximately “ν1” cycles, this should terminate, leaving the
number “ν1!” in bin zero (where the natural number νi is the initial
content of bin number i, i.e. it is σ(xi) below). And below we
demonstrate that the semantic definitions prove this fact. The
quotation marks are there because x1 − 1 is not a term. We take
“x1 ←: x1 − 1” to be an abbreviation for

x2 ←: 0 ; whdo(¬ x2 + 1 ≈ x1)(x2 ←: x2 + 1) ; x1 ←: x2 ,

omitting associativity brackets for “;”. So this produces that
predecessor function which is undefined on 0.
So we’ve got a fairly lengthy job, much of which will be left
as exercises for the reader.
First show that

C[[whdo(¬ x2 + 1 ≈ x1)(x2 ←: x2 + 1)]]
= fix( f ↦ ( σ ↦ { f (σ[x2 ↦ σ(x2)+1]) if σ(x2) + 1 ≠ σ(x1) ; σ if σ(x2) + 1 = σ(x1) } ) ) .

Then argue (by induction on σ(x1) − σ(x2) in the first case) that the
minimal fixed point here is given by

σ ↦ { σ[x2 ↦ σ(x1)−1] if σ(x2) < σ(x1) ; ⊥ if σ(x2) ≥ σ(x1) } .

Now argue that

C[[“x1 ←: x1 − 1”]] = σ ↦ { σ[x2 ↦ σ(x1)−1][x1 ↦ σ(x1)−1] if σ(x1) ≠ 0 ; ⊥ if σ(x1) = 0 } .
The next step is yet another application of the definition of the
semantic function on whdo-commands to yield

C[[whdo(0 < x1)(E)]]
= fix( f ↦ ( σ ↦ { f (σ[x0 ↦ σ(x1)·σ(x0)][x1 ↦ σ(x1)−1][x2 ↦ σ(x1)−1]) if σ(x1) ≠ 0 ; σ if σ(x1) = 0 } ) ) ,

where E is the command x0 ←: x1 × x0 ; “x1 ←: x1 − 1” .
A more subtle argument than the earlier analogues then yields the
minimal fixed point as

σ ↦ { σ[x0 ↦ σ(x1)!·σ(x0)][x1 ↦ 0][x2 ↦ 0] if σ(x1) ≠ 0 ; σ if σ(x1) = 0 } .
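The whole factorial computation can be animated the same way. Here is
a sketch under the same assumptions as the previous one (states total,
⊥ modelled by non-termination); pred1 plays the quoted command
“x1 ←: x1 − 1”, and all names are again inventions for the sketch.

    import Data.Function (fix)

    type State = Int -> Integer

    update :: Int -> Integer -> State -> State
    update n d s m = if m == n then d else s m

    whdo :: (State -> Bool) -> (State -> State) -> State -> State
    whdo p c = fix (\f s -> if p s then f (c s) else s)

    -- “x1 ←: x1 − 1” as x2 ←: 0 ; whdo(¬ x2+1 ≈ x1)(x2 ←: x2+1) ; x1 ←: x2
    pred1 :: State -> State
    pred1 s0 = let s1 = update 2 0 s0
                   s2 = whdo (\s -> s 2 + 1 /= s 1)
                             (\s -> update 2 (s 2 + 1) s) s1
               in  update 1 (s2 2) s2

    -- x0 ←: 1 ; whdo(0 < x1)(x0 ←: x1 × x0 ; “x1 ←: x1 − 1”)
    factorial :: State -> State
    factorial s0 = whdo (\s -> 0 < s 1)
                        (\s -> pred1 (update 0 (s 1 * s 0) s))
                        (update 0 1 s0)

    -- factorial (const 5) 0 == 120 , and pred1 diverges when bin 1 holds 0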
Denotational semantics of Λ

This consists of super-formal statements of the syntax from the very
beginning, about 100 pages back, and of the specifications in the
definition of λ-model. See the first half of this subsection for an
analogue and an explanation of the syntax specification in the first
three lines below. Here it is, again largely human-unreadable, but
perhaps more mechanically translatable for writing an implementation.
See Ch. 8 of [St] for much more on this.
BRA := { ) , ( } .
ν ∈ IDE := { x | ∗ || x || ν∗ } .
A, B ∈ Λ = EXP := { BRA | IDE | λ | • || ν || (AB) || (λν • A) } .
VAL = any λ-model . ENV = [IDE → VAL] .

For ρ ∈ ENV , ν ∈ IDE, and d ∈ VAL, define ρ[ν ↦ d] ∈ ENV to agree
with ρ except it maps the variable (‘identifier’) ν to d .
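One possible Haskell rendering of these lines, for anyone attempting
the mechanical translation; Exp, Ide, Env and extend are invented
names, and the λ-model VAL is left as an abstract type parameter.

    type Ide = Int                 -- the identifier x followed by n stars is just n

    data Exp = V Ide | App Exp Exp | Lam Ide Exp    -- ν , (AB) , (λν • A)

    type Env val = Ide -> val      -- ENV = [IDE → VAL] for an abstract λ-model val

    extend :: Env val -> Ide -> val -> Env val      -- ρ[ν ↦ d]
    extend rho nu d mu = if mu == nu then d else rho mu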
(∗∗) S[[A[x→B]]](ρ) = S[[A]](ρ[x ↦ S[[B]](ρ)]) ; i.e. ρ+(A[x→B]) = ρ[x ↦ ρ+(B)]+(A) .
None of these proofs requires any great cleverness.
Later we discuss how, after passing to equivalence classes, and when
Scott’s D∞ is VAL, the map on EXP/≈ coming from ρ+ is not injective,
but that it becomes so when restricted to the set of equivalence
classes of closed terms which have a normal form; i.e. closed normal
form terms all map under this denotational semantics to different
objects in the λ-model, this being independent of ρ. We’ll now go back
to the handier ρ+ notation.
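Since we are about to compute with ρ+ , it may help to see the shape of
the semantic function in code. Haskell happily accepts a naive
‘reflexive domain’ D ≅ [D → D], so a sketch of S (that is, of ρ+) can
be written directly; this D is of course only a programming stand-in
for Scott’s D∞, and D, ap, evalS are invented names.

    data Exp = V Int | App Exp Exp | Lam Int Exp    -- the AST from the earlier sketch

    newtype D = D (D -> D)         -- a naive reflexive domain

    ap :: D -> D -> D              -- the application “·” of the λ-model
    ap (D f) = f

    evalS :: Exp -> (Int -> D) -> D                 -- S[[A]](ρ) , i.e. ρ+(A)
    evalS (V n)     rho = rho n
    evalS (App a b) rho = ap (evalS a rho) (evalS b rho)
    evalS (Lam n a) rho = D (\d -> evalS a (\m -> if m == n then d else rho m))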
Proving the theorem of David Park below gives an opportunity for
illustrations of this “denotational semantics of the λ-calculus” (and
especially for calculations to increase one’s familiarity with the
innards of Scott’s D∞).
Recall that Curry’s fixpoint (or “paradoxical”) combinator Y is a
closed term, so it is unambiguous to define

Y := ρ+(Y ) ∈ D∞ ,

i.e. it is independent of the particular ρ used.
here in VII-8. Park sketches how a different choice of (φ0, ψ0) gives
a different answer. So this theorem definitely does not generalize, as
is, to an arbitrary extensional λ-model with compatible complete
lattice structure. Later we use Park’s theorem a couple of times to
give other explicit results about the denotational semantics of Λ,
including another fixpoint operator, though not Turing’s fixpoint
combinator (which the reader might like to think about).
Readers who wish to go further with this and consult the literature
will need to accustom themselves to two ‘identifications’ which we are
avoiding for conceptual clarity (but giving ourselves messier
notation) :
(i) identifying Λ with its image ρ+(Λ) ⊂ D∞, which is ambiguous on
non-closed terms in Λ , and also ambiguous in another sense ‘up to ≈’ ;
(ii) identifying Dn with its image πn(D∞) ⊂ D∞, that is, the image of
the map θn,∞ .
11. Using the definition of φ∞, and a·⊥ = φ∞(a)(⊥) , show that, for
all a ∈ D∞ , we have

π0a ≤ π0(a·⊥) .

(Actually, equality holds here, but is unneeded below.)

19. Using the definitions of “·” and of fix, deduce from 2, 15 and 18
that

φ∞(Y )(x) ≤ fix(φ∞(x)) .

19.
φ∞(Y )(x) = Y ·x = X·X = X·(⊔n πnX) = ⊔n X·(πnX)
≤ ⊔n φ∞(x)ⁿ⁺²(⊥) = ⊔n φ∞(x)ⁿ(⊥) = fix(φ∞(x)) .
20.
φ∞(x)(φ∞(Y )(x)) = x·(Y ·x) = ρ+(x(Y x)) = ρ+(Y x) = Y ·x = φ∞(Y )(x) .
11.
π0(a·⊥) = θ0∞(θ∞0(φ∞(a)(⊥))) = θ0∞(θ∞0((⊔k θk∞ ◦ θ∞,k+1(a) ◦ θ∞k)(⊥∞)))
= ⊔k θ0∞(θk0(θ∞,k+1(a)(⊥k))) ≥ θ0∞(θ00(θ∞,1(a)(⊥0)))
= θ0∞(ψ0(θ∞,1(a))) = θ0∞(θ∞,0(a)) = π0a .
The step after the inequality uses the definition of ψ0 . To
strengthen the inequality to equality (which is needed further down),
one shows that all terms in the lub just before the “≥” agree with
π0a, by a slightly longer argument. For example, with k = 1, we get
down to

θ0∞(θ10(θ∞,2(a)(⊥1))) = θ0∞(ψ0(θ∞,2(a)(φ0(⊥0)))) = θ0∞((ψ0 ◦ θ∞,2(a) ◦ φ0)(⊥0))
= θ0∞(ψ1(θ∞,2(a))(⊥0)) = θ0∞(ψ0(ψ1(θ∞,2(a)))) = θ0∞(θ∞,0(a)) = π0a .
Using the evident fact that I A ≈ A for all A ∈ Λ, it is clear that
I·d = d for all d ∈ D∞ ; that is, every element of D∞ is a fixed point
of φ∞(I), which is the identity map of D∞ . In particular, the minimum
fixed point of φ∞(I) is certainly ⊥ . And so, from Park, we see that

ρ+(Y I) = ⊥ .
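This calculation has a familiar computational shadow: the least fixed
point of the identity map is ⊥, which a one-line Haskell sketch
exhibits as divergence (loop is an invented name).

    import Data.Function (fix)

    loop :: a
    loop = fix id    -- the least fixed point of the identity: this never terminates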
As mentioned earlier, by (∗) from about 7 pages back, we can only
expect ρ+ to be possibly injective after passing to equivalence
classes under ≈ . And it certainly won’t be on non-closed terms in
general unless ρ itself is injective. But for closed terms without
normal forms, it still isn’t injective in general, as we see further
down. However, we do have the following ‘faithfulness of the
semantics’ for terms with normal forms.
Theorem VII-9.2. If E and F are distinct normal form terms in Λ (i.e.
they are not related by a change of bound variables), and continuing
with Scott’s D∞ as our domain, for some ρ, we have ρ+(E) ≠ ρ+(F ) . In
particular, the restriction of ρ+ to closed normal form terms (which
restriction is independent of ρ) is injective.
This will be almost immediate from another quite deep syntactical
result whose proof will be omitted:
Böhm’s Theorem VII-9.3. (See [Cu], p.156.) If E and F are as in the
previous theorem, and y1, · · · , yt are the free variables in EF ,
then there are terms G1, · · · , Gt, and H1, · · · , Hk for some k,
such that, for some distinct variables u and ν,

E[ y⃗ → G⃗ ]H1 · · · Hk uν ≈ u   and   F [ y⃗ → G⃗ ]H1 · · · Hk uν ≈ ν .
ρ+(M ′′) = ρ+(N ) ,
completing this part of the proof. To see this last equality, note
that, in the notation way back in VII-1.1, it is easy to see that

CA := C[x→A] = λy • A(yy) .

The leftmost reduction of J = Y F goes as follows :
A B C ≈ F A B C = (λfxy • x(fy))A B C ≈ B (A C)
Applying ρ+ , we get, for all B and C in D∞ ,

A·B·C = B·(A·C) . (∗∗)

To show that A = I, we’ll now use only (∗∗). It suffices to show that
πnA = πnI for all n, by induction on n, involving the projections πn
introduced back in the discussion of Park’s theorem. Here is a repeat
of three of the exercises from back there, plus a list of four new
general exercises for the reader to work on. But if needed, see after
the end of the present theorem’s proof below for some hints which make
their proofs completely mechanical.
5. (π0a)·b = π0a .
9. (πn+1a)·b = (πn+1a)·(πnb) .
11. π0a = π0(a·⊥) .
23. πn⊥ = ⊥ .
24. ⊥·x = ⊥ .
25. I·C = C .
26. (πn+1a)·b = πn(a·πnb) .
In both initial cases and also in the inductive case, we’ll use the
extensionality of D∞ twice.
The initial case n = 0 :
Using 5, then 5, then 11, for all B, x and y in D∞ ,

π0B·x·y = π0B·y = π0B = π0(B·⊥) .

Thus, with B = I, using the above, then 25,

π0I·x·y = π0(I·⊥) = π0⊥ .

With B = A, using the above, then 11, then (∗∗), then 24,

π0A·x·y = π0(A·⊥) = π0(A·⊥·⊥) = π0(⊥·(A·⊥)) = π0⊥ .

So π0A = π0I, by extensionality.
The initial case n = 1 :
Using 26, then 25, then idempotency of π0, then 5,
Now using 26, then the inductive assumption, then 26, then 25, then
idempotency of πn,
So, using 26, then 26, then (∗∗), then 9, then 26, then idempotency of
πn+1, then the display just above,
As for 23, this just amounts to the fact that
G := λuν • ν(uν) .
Proposition VII-9.6.

∀Z ∈ Λ , [ G Z ≈ Z ⇐⇒ ∀A ∈ Λ , A (Z A) ≈ Z A ] .

That is, Z is a fixed point of G if and only if it is a fixpoint
operator.
Proof. First note that

G Z = (λuν • ν(uν))Z ≈ λν • ν(Zν) .

Proof. Both Y ·A and Y ′·A are fixed by A· . But Park says that Y ·A is
the smallest of all elements in D∞ fixed by A· .
Corollary VII-9.9 (of ⇐= in 9.6). G Y ≈ Y and so G·Y = Y .
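Proposition VII-9.6 has a neat value-level echo in Haskell, where g
mirrors G = λuν • ν(uν) and fixOp is the usual least fixpoint
operator; the Λ-terms are here replaced by Haskell functions, so this
is only an analogy, not the proposition itself, and both names are
inventions.

    g :: ((a -> a) -> a) -> ((a -> a) -> a)   -- G = λuν • ν(uν)
    g u = \v -> v (u v)

    fixOp :: (a -> a) -> a                    -- a fixpoint operator: f (fixOp f) = fixOp f
    fixOp f = f (fixOp f)

    -- extensionally, g fixOp v = v (fixOp v) = fixOp v : a fixpoint operator is
    -- a fixed point of g, matching one direction of VII-9.6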
Corollary VII-9.10
References

[Al] Allison, Lloyd. A Practical Introduction to Denotational
Semantics. Cambridge U. Press, Cambridge, 1989.
[Ba] Barendregt, H. P. The Lambda Calculus : its Syntax and Semantics.
North-Holland, Amsterdam, 1984.
[BKK-ed] Barwise, J., Keisler, H. J. and Kunen, K. The Kleene
Symposium. North-Holland, Amsterdam, 1980.
[Bö-ed] Böhm, C. λ-calculus and Computer Science Theory : Proceedings.
LNCS # 37, Springer-Verlag, Berlin, 1975.
[Br-ed] Braffort, Paul. Computer Programming and Formal Systems.
North-Holland, Amsterdam, 1963.
[Ch] Church, Alonzo. The Calculi of Lambda-Conversion. Princeton U.
Press, Princeton, 1941.
[CM] Hoffman, P. Computability for the Mathematical. This website, 2005.
[Cu] Curry, H. B., Hindley, J. R. and Seldin, J. P. Combinatory Logic,
Vol. II. North-Holland, Amsterdam, 1972.
[En] Engeler, Erwin, et al. The Combinatory Programme. Birkhäuser,
Basel, 1995.
[Fi] Fitch, F. B. Elements of Combinatory Logic. Yale U. Press, New
Haven, 1974.
[Go] Gordon, M. J. C. Denotational Description of Programming
Languages. Springer-Verlag, Berlin, 1979.
[Go1] Gordon, M. J. C. Programming Language Theory and its
Implementation. Prentice Hall, New York, 1988.
[Hi] Hindley, J. R. Standard and Normal Reductions. Trans. AMS, 1978.
[HS] Hindley, J. R. and Seldin, J. P. Introduction to Combinators and
the λ-calculus. London Mathematical Society Student Texts # 1,
Cambridge U. Press, Cambridge, 1986.
[HS-ed] Hindley, J. R. and Seldin, J. P. To H. B. Curry : Essays on
Combinatory Logic, Lambda-calculus, and Formalism. Academic Press,
London, 1980.
[Kl] Klop, J. W. Combinatory Reduction Systems. Mathematisch Centrum,
Amsterdam, 1980.
[Ko] Koymans, C. P. J. Models of the Lambda Calculus. CWI Tract,
Amsterdam, 1984.
[La-ed] Lawvere, F. W. Toposes, Algebraic Geometry and Logic. LNM #
274, Springer-Verlag, Berlin, 1972.
[LS] Lambek, J. and Scott, P. J. Introduction to Higher Order
Categorical Logic. Cambridge U. Press, Cambridge, 1986.
[LM] Hoffman, P. Logic for the Mathematical. This website, 2003.
[Pe] Penrose, R. The Emperor’s New Mind. Oxford U. Press, Oxford, 1989.
[Ru-ed] Rustin, Randall. Formal Semantics of Programming Languages.
Prentice-Hall, N.J., 1972.
[SAJM-ed] Suppes, P., Henkin, L., Joja, A. and Moisil, Gr. C. Logic,
Methodology and Philosophy of Science IV. North-Holland, Amsterdam,
1973.
[St-ed] Steel, T. B. Formal Language Description Languages for
Computer Programming. North-Holland, Amsterdam, 1966.
[Sö] Stenlund, Sören. Combinators, λ-terms, and Proof Theory. Reidel
Pub. Co., Dordrecht, 1972.
[St] Stoy, Joseph. Denotational Semantics : the Scott-Strachey
Approach to Programming Language Theory. MIT Press, Cambridge, Mass.,
1977.
[Wa] Wadsworth, Christopher P. The Relation between Computational and
Denotational Properties for Scott’s D∞-Models of the Lambda-calculus.
SIAM J. Comput. 5(3), 1976, 488-521.