tional Programming, September 8–10, 1994, Ayr, Scotland.
Springer Workshops in Computing, 1995.
A Tutorial on Co-induction and
Functional Programming
Andrew D. Gordon
University of Cambridge Computer Laboratory,
New Museums Site, Cambridge CB2 3QG, United Kingdom.
[email protected]

Abstract

Co-induction is an important tool for reasoning about unbounded structures. This tutorial explains the foundations of co-induction, and shows how it justifies intuitive arguments about lazy streams, of central importance to lazy functional programmers. We explain from first principles a theory based on a new formulation of bisimilarity for functional programs, which coincides exactly with Morris-style contextual equivalence. We show how to prove properties of lazy streams by co-induction and derive Bird and Wadler's Take Lemma, a well-known proof technique for lazy streams.
The aim of this paper is to explain why co-inductive definitions and proofs by co-induction are useful to functional programmers.

Co-induction is dual to induction. To say a set is inductively defined just means it is the least solution of a certain form of inequation. For instance, the set of natural numbers N is the least solution (ordered by set inclusion, ⊆) of the inequation

    {0} ∪ {S(x) | x ∈ X} ⊆ X.    (1)

The corresponding induction principle just says that if some other set satisfies the inequation, then it contains the inductively defined set. To prove a property of all numbers, let X be the set of numbers with that property and show that X satisfies inequation (1). If so then N ⊆ X, since N is the least such set. This is simply mathematical induction.
Dually, a set is co-inductively defined if it is the greatest solution of a certain form of inequation. For instance, suppose that ⇝ is the reduction relation in a functional language. The set of divergent programs, ⇑, is the greatest solution of the inequation

    X ⊆ {a | ∃b(a ⇝ b & b ∈ X)}.    (2)

The corresponding co-induction principle is just that if some other set satisfies the inequation, then the co-inductively defined set contains it. For instance, suppose that program Ω reduces to itself, that is, Ω ⇝ Ω. To see that Ω is contained in ⇑, consider the set X = {Ω}. Since X satisfies inequation (2), X ⊆ ⇑, as ⇑ is the greatest such set. Hence Ω is a member of ⇑.

(Royal Society University Research Fellow.)
Bisimilarity is an equality based on operational behaviour. This paper seeks to explain why bisimilarity is an important co-inductive definition for functional programmers. Bisimilarity was introduced into computer science by Park (1981) and developed by Milner in his theory of CCS (1989). Bisimilarity in CCS is based on labelled transitions. A transition a →α b means that program (process) a can perform the observable action α to become successor program b. Any program gives rise to a (possibly infinite) derivation tree, whose nodes are programs and whose arcs are transitions, labelled by actions. Two programs are bisimilar if they root the same derivation trees, when one ignores the syntactic structure at the nodes. Bisimilarity is a way to compare behaviour, represented by actions, whilst discarding syntactic structure.
Contextual equivalence (Morris 1968) is widely accepted as the natural notion of operational equivalence for PCF-like languages (Milner 1977; Plotkin 1977). Two programs are contextually equivalent if, whenever they are each inserted into a hole in a larger program of integer type, the resulting programs either both converge or both diverge. The main technical novelty of this paper is to show how to define a labelled transition system for PCF-like languages (for instance, Miranda and Haskell) such that bisimilarity, an operationally defined behavioural equivalence, coincides with Morris' contextual equivalence. By virtue of this characterisation of contextual equivalence we can prove properties of functional programs using co-induction. We intend in a series of examples to show how co-induction formally captures and justifies intuitive operational arguments.
We begin in Section 1 by showing how induction and co-induction derive, dually, from the Tarski-Knaster fixpoint theorem. Section 2 introduces the small call-by-name functional language, essentially PCF extended with pairing and streams, that is the vehicle for the paper. We make two conventional definitions of divergence and contextual equivalence. In Section 3 we make a co-inductive definition of divergence, prove it equals the conventional one, and give an example of a co-inductive proof. The heart of the paper is Section 4, in which we introduce bisimilarity and prove it coincides with contextual equivalence. We give examples of co-inductive proofs and state a collection of useful equational properties. We derive the Take Lemma of Bird and Wadler (1988) by co-induction. Section 5 explains why bisimilarity is a precongruence, that is, preserved by arbitrary contexts, using Howe's method (1989). We summarise the paper in Section 6 and discuss related work.

This paper is intended to introduce the basic ideas of bisimilarity and co-induction from first principles. It should be possible to apply the theory developed in Section 4 without working through the details of Section 5, the hardest part of the paper. In a companion paper (Gordon 1994a) we develop further co-inductive tools for functional programs. For more examples of bisimulation proofs see Milner (1989) or Gordon (1994b), for instance.
Here are our mathematical conventions. As usual we regard a relation R on a set X to be a subset of X × X. If R is a relation then we write x R y to mean (x, y) ∈ R. If R and R′ are both relations on X then we write RR′ for their relational composition, that is, the relation such that x RR′ y iff there is z such that x R z and z R′ y. If R is a relation then R^op is its opposite, the relation such that x R^op y iff y R x. If R is a relation, we write R+ for its transitive closure, and R* for its reflexive and transitive closure.
1 A Tutorial on Induction and Co-induction
Let U be some universal set and F : ℘(U) → ℘(U) be a monotone function (that is, F(X) ⊆ F(Y) whenever X ⊆ Y). Induction and co-induction are dual proof principles that derive from the definition of a set to be the least or greatest solution, respectively, of equations of the form X = F(X).

First some definitions. A set X ⊆ U is F-closed iff F(X) ⊆ X. Dually, a set X ⊆ U is F-dense iff X ⊆ F(X). A fixpoint of F is a solution of the equation X = F(X). Let μX. F(X) and νX. F(X) be the following subsets of U.

    μX. F(X) def= ∩ {X | F(X) ⊆ X}
    νX. F(X) def= ∪ {X | X ⊆ F(X)}
Lemma 1
(1) μX. F(X) is the least F-closed set.
(2) νX. F(X) is the greatest F-dense set.

Proof We prove (2); (1) follows by a dual argument. Since νX. F(X) contains every F-dense set by construction, we need only show that it is itself F-dense, for which the following lemma suffices.

    If every Xi is F-dense, so is the union ∪i Xi.

Since Xi ⊆ F(Xi) for every i, ∪i Xi ⊆ ∪i F(Xi). Since F is monotone, F(Xi) ⊆ F(∪i Xi) for each i. Therefore ∪i F(Xi) ⊆ F(∪i Xi), and so we have ∪i Xi ⊆ F(∪i Xi) by transitivity, that is, ∪i Xi is F-dense.
Theorem 1 (Tarski-Knaster)
(1) μX. F(X) is the least fixpoint of F.
(2) νX. F(X) is the greatest fixpoint of F.

Proof Again we prove (2) alone; (1) follows by a dual argument. Let ν = νX. F(X). We have ν ⊆ F(ν) by Lemma 1. So F(ν) ⊆ F(F(ν)) by monotonicity of F. But then F(ν) is F-dense, and therefore F(ν) ⊆ ν. Combining the inequalities we have ν = F(ν); it is the greatest fixpoint because any other is F-dense, and hence contained in ν.
We say that μX. F(X), the least solution of X = F(X), is the set inductively defined by F, and dually, that νX. F(X), the greatest solution of X = F(X), is the set co-inductively defined by F. We obtain two dual proof principles associated with these definitions.

    Induction:    μX. F(X) ⊆ X if X is F-closed.
    Co-induction: X ⊆ νX. F(X) if X is F-dense.
Let us revisit the example of mathematical induction, mentioned in the introduction. Suppose there is an element 0 ∈ U and an injective function S : U → U. If we define a monotone function F : ℘(U) → ℘(U) by

    F(X) def= {0} ∪ {S(x) | x ∈ X}

and set N def= μX. F(X), the associated principle of induction is that N ⊆ X if F(X) ⊆ X, which is to say that

    N ⊆ X if both 0 ∈ X and S(x) ∈ X whenever x ∈ X.

In other words, mathematical induction is a special case of this general framework. Winskel (1993) shows in detail how structural induction and rule induction, proof principles familiar to computer scientists, are induction principles obtained from particular kinds of inductive definition. As for examples of co-induction, Sections 3 and 4 are devoted to co-inductive definitions of program divergence and equivalence respectively. Aczel (1977) is the standard reference on inductive definitions. Davey and Priestley (1990) give a more recent account of fixpoint theory, including the Tarski-Knaster theorem.
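When the universe is finite, the fixpoint constructions above can be run directly: Kleene iteration reaches μX. F(X) from the empty set and νX. F(X) from the whole universe. The following Haskell sketch illustrates this for the F of the natural-number example, cut off at 10 so the universe stays finite; the names lfp, gfp and natF are ours, not the paper's.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Least fixpoint by upward Kleene iteration from the empty set.
lfp :: Ord a => (Set a -> Set a) -> Set a
lfp f = go Set.empty
  where go x = let x' = f x in if x' == x then x else go x'

-- Greatest fixpoint by downward Kleene iteration from the universe.
gfp :: Ord a => Set a -> (Set a -> Set a) -> Set a
gfp univ f = go univ
  where go x = let x' = f x in if x' == x then x else go x'

-- The F of inequation (1), cut off at 10: F(X) = {0} ∪ {S x | x ∈ X}.
natF :: Set Int -> Set Int
natF x = Set.insert 0 (Set.map (+ 1) (Set.filter (< 10) x))
```

Here lfp natF iterates ∅, {0}, {0,1}, … up to {0,…,10}, mirroring the construction of N as a least fixpoint.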
2 A Small Functional Language
In this section we introduce a small call-by-name functional language. It is PCF extended with pairing and streams, a core fragment of a lazy language like Miranda or Haskell. We define its syntax, a type assignment relation, a `one-step' reduction relation, ⇝, and a `big-step' evaluation relation, ⇓.

Let x and y range over a countable set of variables. The types, A, B, and expressions, e, are given by the following grammars.

    A, B ::= Int | Bool | A → A | (A, A) | [A]
    e ::= x | e e | λx:A. e | if e then e else e | k_A | Ω_A | c_A
where k ranges over a finite collection of builtin constants, Ω is the divergent constant and c ranges over a finite collection of user-defined constants. We assume these include map, iterate, take and filter; we give informal definitions below. The builtin constants are listed below. We say that k_A is admissible if k:A is an instance of one of the following schemas.
    tt, ff : Bool                  ℓ : Int    (each integer literal ℓ)
    succ, pred : Int → Int         zero : Int → Bool
    fst : (A, B) → A               snd : (A, B) → B
    Pair : A → B → (A, B)          Nil : [A]
    Cons : A → [A] → [A]
    scase : (A → [A] → B) → B → [A] → B
For each user-defined constant we assume given a definition c:A def= e_c. In effect these are definitions by mutual recursion, as each body e_c can contain occurrences of any constant; hence there is no need for an explicit fix operator. We identify expressions up to alpha-conversion, that is, renaming of bound variables. We write e[e′/x] for the substitution of expression e′ for each variable x free in expression e. A context, C, is an expression with one or more holes. A hole is written as [ ] and we write C[e] for the outcome of filling each hole in C with the expression e.
The type assignment relation

    Γ ⊢ e : A    where Γ is x1:A1, …, xn:An,

is given inductively by the rules of the simply typed λ-calculus plus the following.

    Γ ⊢ k_A : A    if k_A is admissible
    Γ ⊢ c_A : A    if c:A def= e_c
    Γ ⊢ Ω_A : A
    Γ ⊢ if e1 then e2 else e3 : A    if Γ ⊢ e1 : Bool, Γ ⊢ e2 : A and Γ ⊢ e3 : A

We assume that ∅ ⊢ e_c : A is derivable whenever c:A def= e_c is the definition of a user-defined constant. Type assignment is unique in the sense that whenever Γ ⊢ e : A and Γ ⊢ e : B, then A = B.
Given the type assignment relation, we can construct the following universal sets and relations.

    Prog(A) def= {e | ∅ ⊢ e : A}             (programs of type A)
    a, b ∈ Prog def= ∪_A Prog(A)             (programs of any type)
    Rel(A) def= {(a, b) | {a, b} ⊆ Prog(A)}  (total relation on A programs)
    R, S ⊆ Rel def= ∪_A Rel(A)               (total relation on programs)
The operational semantics is a one-step reduction relation, ⇝ ⊆ Rel. It is inductively defined by the axiom schemes

    (λx. e) a ⇝ e[a/x]
    c_A ⇝ e_c                        if c:A def= e_c
    if ℓ then a_tt else a_ff ⇝ a_ℓ   (ℓ ∈ {tt, ff})
    succ i ⇝ i + 1
    pred (i + 1) ⇝ i
    zero 0 ⇝ tt
    zero i ⇝ ff                      if i ≠ 0
    fst (Pair a b) ⇝ a
    snd (Pair a b) ⇝ b
    scase f b Nil ⇝ b
    scase f b (Cons a as) ⇝ f a as

together with the scheme of structural rules

    E[a] ⇝ E[b]    whenever a ⇝ b

where E is an experiment (a kind of atomic evaluation context (Felleisen and Friedman 1986)), a context generated by the grammar

    E ::= [ ] a | succ [ ] | pred [ ] | zero [ ] | if [ ] then a else b
        | fst [ ] | snd [ ] | scase a b [ ].

In other words the single structural rule above abbreviates eight different rules, one for each kind of experiment. Together they specify a deterministic, call-by-name evaluation strategy. Now we can make the usual definitions of evaluation, convergence and divergence.
    a⇝ def= ∃b(a ⇝ b)                   `a reduces'
    a ⇓ b def= a ⇝* b & ¬(b⇝)           `a evaluates to b'
    a⇓ def= ∃b(a ⇓ b)                    `a converges'
    a↑ def= whenever a ⇝* b, then b⇝    `a diverges'

By expanding the definitions we can easily check that ⇓ and ↑ are complementary, that is, a↑ iff ¬(a⇓). We can characterise the answers returned by the evaluation relation, ⇓, as follows. Let a normal program be a program a such that ¬(a⇝). Let a value, u or v, be a program generated by the grammar

    v ::= λx. e | k | k2 a | k2 a b    where k2 ∈ {Pair, Cons, scase}.

Lemma 2 A program is a value iff it is normal.

Proof By inspection, each value is clearly normal. For the other direction, one can easily prove by structural induction on a that a is a value if it is normal.
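The one-step relation ⇝ is directly executable. The following Haskell sketch implements a reducer for a first-order fragment only (numerals, booleans, pairs and Ω; no λ-abstractions, constants or streams), together with a bounded evaluator standing in for ⇓; the datatype and function names are our own illustration, not part of the paper's formal system.

```haskell
-- A fragment of the language as a Haskell datatype (ours, for illustration).
data E = N Integer | BTrue | BFalse
       | Succ E | Pred E | Zero E
       | If E E E | Pair E E | Fst E | Snd E
       | Omega                         -- the divergent constant Ω
  deriving (Eq, Show)

-- step a = Just b when a ⇝ b; Nothing when a is normal.
-- Structural clauses play the role of the experiment rule E[a] ⇝ E[b].
step :: E -> Maybe E
step (Succ (N i))         = Just (N (i + 1))
step (Succ e)             = Succ <$> step e
step (Pred (N i)) | i > 0 = Just (N (i - 1))   -- pred (i+1) ⇝ i
step (Pred e)             = Pred <$> step e
step (Zero (N 0))         = Just BTrue
step (Zero (N _))         = Just BFalse
step (Zero e)             = Zero <$> step e
step (If BTrue a _)       = Just a
step (If BFalse _ b)      = Just b
step (If e a b)           = (\e' -> If e' a b) <$> step e
step (Fst (Pair a _))     = Just a
step (Snd (Pair _ b))     = Just b
step (Fst e)              = Fst <$> step e
step (Snd e)              = Snd <$> step e
step Omega                = Just Omega          -- Ω ⇝ Ω
step _                    = Nothing

-- Bounded evaluation: iterate ⇝ at most n times; Nothing if no normal
-- form is reached within the bound (e.g. for Omega).
eval :: Int -> E -> Maybe E
eval 0 _ = Nothing
eval n a = case step a of
             Nothing -> Just a
             Just b  -> eval (n - 1) b
```

For example, eval 1000 applied to if (zero (pred 1)) then (succ 4) else Ω reaches the numeral 5, while eval 1000 Omega exhausts its bound.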
Two programs are contextually equivalent if they can be freely interchanged for one another in a larger program, without changing its observable behaviour. This is a form of Morris' "extensional equivalence" (Morris 1968). Here is the formal definition of contextual equivalence, ≃ ⊆ Rel. Recall that C stands for contexts.

    a ⊑ b iff whenever (C[a], C[b]) ∈ Rel(Int), C[a]⇓ implies C[b]⇓.
    a ≃ b iff a ⊑ b and b ⊑ a.

We have formalised `observable behaviour' as termination at integer type. The relation is unchanged if we specify that C[a] and C[b] should both evaluate to the same integer. Contextual equivalence does not discriminate on grounds of termination at function or pair type. For instance, we will be able to prove that Ω_{A→B} ≃ λx:A. Ω_B. The two would be distinguished in a call-by-value setting, since one diverges and the other converges, but in our call-by-name setting no context of integer type can tell them apart.
We have introduced the syntax and operational semantics of a small functional language. Our definitions of divergence and contextual equivalence are natural and intuitive, but do not lend themselves to proof. In the next two sections we develop co-inductive characterisations of both divergence and contextual equivalence. Hence we obtain a theory admitting proofs of program properties by co-induction.
3 A Co-inductive Definition of Divergence

We can characterise divergence co-inductively in terms of unbounded reduction. Let D : ℘(Prog) → ℘(Prog) and ⇑ ⊆ Prog be

    D(X) def= {a | ∃b(a ⇝ b & b ∈ X)}
    ⇑ def= νX. D(X)

We can easily see that D is monotone. Hence by its co-inductive definition we have:

    ⇑ is the greatest D-dense set and ⇑ = D(⇑).

Hughes and Moran (1993) give an alternative, `big-step', co-inductive formulation of divergence.
As a simple example we can show that Ω ∈ ⇑. Let X def= {Ω}. X is D-dense, that is, X ⊆ D(X), because Ω ⇝ Ω and Ω ∈ X. So X ⊆ ⇑ by co-induction, and therefore Ω ∈ ⇑.
We have an obligation to show that this co-inductive definition matches the earlier one, that a↑ iff whenever a ⇝* b, then b⇝.

Theorem 2 ↑ = ⇑.

Proof (⇑ ⊆ ↑). Suppose that a ∈ ⇑. We must show whenever a ⇝* b, that b⇝. If a ∈ ⇑, then a ∈ D(⇑) so there is an a′ with a ⇝ a′ and a′ ∈ ⇑. Furthermore, since reduction is deterministic, a′ is unique. Hence, whenever a ∈ ⇑ and a ⇝ b it must be that b ∈ ⇑. Therefore b⇝.

(↑ ⊆ ⇑). By co-induction it suffices to prove that the set ↑ is D-dense. Suppose that a↑. Since a ⇝* a, we have a⇝, that is, a ⇝ b for some b. But whenever b ⇝* b′ it must be that a ⇝* b′ too, and in fact b′⇝ since a↑. Hence b↑ too, so a ∈ D(↑) and ↑ is D-dense.
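The characterisation of ⇑ as a greatest fixpoint is directly computable when the reduction relation is finite: iterate D downwards from the whole universe. A small Haskell sketch, in which the four-state step function is our own invented example (states 0 and 1 loop like Ω, state 2 steps to the stuck state 3):

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- A toy deterministic reduction: 0 ⇝ 1, 1 ⇝ 0, 2 ⇝ 3, and 3 is normal.
stepTo :: Int -> Maybe Int
stepTo 0 = Just 1
stepTo 1 = Just 0
stepTo 2 = Just 3
stepTo _ = Nothing

-- D(X) = {a | ∃b. a ⇝ b ∧ b ∈ X}
d :: Set Int -> Set Int
d x = Set.fromList [ a | a <- [0 .. 3]
                       , Just b <- [stepTo a]
                       , b `Set.member` x ]

-- ⇑ as the greatest fixpoint of D, by downward iteration from the universe.
diverging :: Set Int
diverging = go (Set.fromList [0 .. 3])
  where go x = let x' = d x in if x' == x then x else go x'
```

The iteration shrinks {0,1,2,3} to {0,1,2} (state 3 is normal) and then to the looping states {0,1}, which is exactly the set of divergent states.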
4 A Co-inductive Definition of Equivalence

We begin with a labelled transition system that characterises the immediate observations one can make of a program. It is defined in terms of the one-step operational semantics, and in some sense characterises the interface between the language's interpreter and the outside world. It is a family of relations (→α ⊆ Prog × Prog | α ∈ Act), indexed by the set Act of actions. If we let Lit, the set of literals, ranged over by ℓ, be {tt, ff} ∪ {…, −2, −1, 0, 1, 2, …}, then the actions are given as follows.

    α, β ∈ Act def= Lit ∪ {@a | a ∈ Prog} ∪ {fst, snd, Nil, hd, tl}
We partition the set of types into active and passive types. The intention is that we can directly observe termination of programs of active type, but not of those of passive type. Let a type be active iff it has the form Bool, Int or [A]. Let a type be passive iff it has the form A → B or (A, B). Arbitrarily we define 0 def= Ω_Int. Given these definitions, the labelled transition system may be defined inductively as follows.

    ℓ →ℓ 0
    Nil →Nil 0
    Cons a b →hd a
    Cons a b →tl b
    a →@b a b         if a b ∈ Prog
    a →fst fst a      if a ∈ Prog((A, B))
    a →snd snd a      if a ∈ Prog((A, B))
    a →α a″           if a ⇝ a′, a′ →α a″, a ∈ Prog(A) and A is active
The derivation tree of a program a is the potentially infinite tree whose nodes are programs, whose arcs are labelled transitions, and which is rooted at a. For instance, the trees of the constants Ω_A are empty if A is active. In particular, the tree of 0 is empty. We use 0 in defining the transition system to indicate that after observing the value of a literal there is nothing more to observe. Following Milner (1989), we wish to regard two programs as behaviourally equivalent iff their derivation trees are isomorphic when we ignore the syntactic structure of the programs labelling the nodes. We formalise this idea by requiring our behavioural equivalence to be a relation ∼ ⊆ Rel that satisfies property (∗): whenever (a, b) ∈ Rel, a ∼ b iff

(1) whenever a →α a′ there is b′ with b →α b′ and a′ ∼ b′;
(2) whenever b →α b′ there is a′ with a →α a′ and a′ ∼ b′.
In fact there are many such relations; the empty set is one. We are after the largest or most generous such relation. We can define it co-inductively as follows. First define two functions [·], ⟨·⟩ : ℘(Rel) → ℘(Rel) by

    [S] def= {(a, b) | whenever a →α a′ there is b′ with b →α b′ and a′ S b′}
    ⟨S⟩ def= [S] ∩ [S^op]^op

where S ⊆ Rel. By examining element-wise expansions of these definitions, it is not hard to check that a relation satisfies property (∗) iff it is a fixpoint of the function ⟨·⟩. One can easily check that both functions [·] and ⟨·⟩ are monotone. Hence what we seek, the greatest relation to satisfy (∗), does exist, and equals νS. ⟨S⟩, the greatest fixpoint of ⟨·⟩. We make the following standard definitions (Milner 1989).

    Bisimilarity, ∼ ⊆ Rel, is νS. ⟨S⟩.
    A bisimulation is a ⟨·⟩-dense relation.

Bisimilarity is the greatest bisimulation and ∼ = ⟨∼⟩. Again by expanding the definitions we can see that a relation S ⊆ Rel is a bisimulation iff a S b implies

    whenever a →α a′ there is b′ with b →α b′ and a′ S b′;
    whenever b →α b′ there is a′ with a →α a′ and a′ S b′.
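For a finite labelled transition system this greatest fixpoint can be computed naively: start from the total relation and repeatedly apply the ⟨·⟩ operator, discarding pairs that fail the matching conditions, until nothing changes. A Haskell sketch, in which the Int/Char representation and all names are our own:

```haskell
type State = Int
type Label = Char

-- One direction of the bisimulation condition: every transition of a
-- is matched by some transition of b with successors still related.
half :: [(State, Label, State)] -> [(State, State)] -> (State, State) -> Bool
half trans r (a, b) =
  and [ or [ (a', b') `elem` r | (b0, l', b') <- trans, b0 == b, l' == l ]
      | (a0, l, a') <- trans, a0 == a ]

-- One application of ⟨·⟩: keep (a, b) iff both directions hold.
refine :: [(State, Label, State)] -> [(State, State)] -> [(State, State)]
refine trans r = [ (a, b) | (a, b) <- r, half trans r (a, b), half trans r (b, a) ]

-- Bisimilarity as the greatest fixpoint, reached from the total relation.
bisim :: [State] -> [(State, Label, State)] -> [(State, State)]
bisim states trans = go [ (a, b) | a <- states, b <- states ]
  where go r = let r' = refine trans r in if r' == r then r else go r'
```

On the system with transitions 1 –a→ 1, 2 –a→ 2 and 3 –a→ 4, states 1 and 2 come out bisimilar, while 3 is related to neither, since its a-successor is dead.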
An asymmetric version of bisimilarity is of interest too.

    Similarity, ≲ ⊆ Rel, is νS. [S].
    A simulation is an [·]-dense relation.

We can easily establish the following basic facts.

Lemma 3
(1) ≲ is a preorder and ∼ is an equivalence relation.
(2) ∼ = ≲ ∩ ≲^op.
(3) Both ⇝ ⊆ ∼ and ⇓ ⊆ ∼.

Proof These are easily proved by co-induction. We omit the details. Parts (2) and (3) depend on the determinacy of ⇝. Part (1) corresponds to Proposition 4.2 of Milner (1989).
4.1 A co-inductive proof about lazy streams
To motivate the study of bisimilarity, let us see how straightforward it is to use co-induction to establish that two lazy streams are bisimilar. Suppose map and iterate are a couple of constants specified by the following equations.

    map f Nil = Nil
    map f (Cons x xs) = Cons (f x) (map f xs)
    iterate f x = Cons x (iterate f (f x))
These could easily be turned into formal definitions of two user-defined constants, but we omit the details. Pattern matching on streams would be accomplished using scase. Intuitively the streams

    iterate f (f x)  and  map f (iterate f x)

are equal, because they both consist of the sequence

    f x, f (f x), f (f (f x)), f (f (f (f x))), …

We cannot directly prove this equality by induction, because there is no argument to induct on. Instead we can easily prove it by co-induction, via the following lemma.

Lemma 4 If S ⊆ Rel is

    {(iterate f (f x), map f (iterate f x)) | ∃A(x ∈ Prog(A) & f ∈ Prog(A → A))}

then (S ∪ ∼) ⊆ ⟨S ∪ ∼⟩.
Proof It suffices to show that S ⊆ ⟨S ∪ ∼⟩ and ∼ ⊆ ⟨S ∪ ∼⟩. The latter is obvious, as ∼ = ⟨∼⟩ and ⟨·⟩ is monotone. To show S ⊆ ⟨S ∪ ∼⟩ we must consider arbitrary a and b such that a S b, and establish that each transition a →α a′ is matched by a transition b →α b′, such that either a′ S b′ or a′ ∼ b′, and vice versa. Suppose then that a is iterate f (f x), and b is map f (iterate f x). We can calculate the following reductions.

    a ⇓ Cons (f x) (iterate f (f (f x)))
    b ⇓ Cons (f x) (map f (iterate f (f x)))

Whenever a ⇝ a′ we can check that a →α a″ iff a′ →α a″. Using the reductions above we can enumerate all the transitions of a and b.

    a →hd f x                          (1)
    a →tl iterate f (f (f x))          (2)
    b →hd f x                          (3)
    b →tl map f (iterate f (f x))      (4)

Now it is plain that (a, b) ∈ ⟨S ∪ ∼⟩. Transition (1) is matched by (3), and vice versa, with f x ∼ f x (since ∼ is reflexive). Transition (2) is matched by (4), and vice versa, with iterate f (f (f x)) S map f (iterate f (f x)).

Since S ∪ ∼ is ⟨·⟩-dense, it follows that (S ∪ ∼) ⊆ ∼. A corollary then is that

    iterate f (f x) ∼ map f (iterate f x)

for any suitable f and x, which is what we set out to show.
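The two streams of Lemma 4 can also be inspected experimentally in Haskell itself, whose Prelude supplies map, iterate and take on lazy lists. Comparing finite prefixes is evidence rather than a proof, but it reflects the transitions that the bisimulation matches; the names lhs and rhs are ours.

```haskell
-- The two sides of Lemma 4 as ordinary Haskell streams.
lhs, rhs :: (a -> a) -> a -> [a]
lhs f x = iterate f (f x)          -- iterate f (f x)
rhs f x = map f (iterate f x)      -- map f (iterate f x)
```

For instance, take 8 (lhs (+1) 0) and take 8 (rhs (+1) 0) produce the same prefix of the sequence f x, f (f x), ….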
4.2 Operational Extensionality
We have an obligation to show that bisimilarity, ∼, equals contextual equivalence, ≃. The key fact we need is the following, that bisimilarity is a precongruence.

Theorem 3 (Precongruence) If a ∼ b then C[a] ∼ C[b] for any suitable context C. The same holds for similarity, ≲.

The proof is non-trivial; we shall postpone it till Section 5.
Lemma 5 ⊑ = ≲.

Proof (≲ ⊆ ⊑) Suppose a ≲ b, that (C[a], C[b]) ∈ Rel(Int) and that C[a]⇓. By precongruence, C[a] ≲ C[b], so C[b]⇓ too. Hence a ⊑ b as required.

(⊑ ⊆ ≲) This follows if we can prove that the contextual order ⊑ is a simulation. The details are not hard, and we omit them. For full details of a similar proof see Lemma 4.29 of Gordon (1994b), which was based on Theorem 3 of Howe (1989).
Contextual equivalence and bisimilarity are the symmetrisations of contextual order and similarity, respectively. Hence a corollary, usually known as operational extensionality (Bloom 1988), is that bisimilarity equals contextual equivalence.

Theorem 4 (Operational Extensionality) ≃ = ∼.
4.3 A Theory of Bisimilarity
We have defined bisimilarity as a greatest fixpoint, shown it to be a co-inductive characterisation of contextual equivalence, and illustrated how it admits co-inductive proofs about lazy streams. In this section we shall note without proof various equational properties needed in a theory of functional programming. Proofs of similar properties, but for a different form of bisimilarity, can be found in Gordon (1994b). We noted already that ⇝ ⊆ ∼, which justifies a collection of beta laws. We can easily prove the following unrestricted eta laws by co-induction.

Proposition 1 (Eta) If a ∈ Prog(A → B), a ∼ λx. a x.

Proposition 2 (Surjective Pairing) If a ∈ Prog((A, B)), a ∼ Pair (fst a) (snd a).

Furthermore we have an unrestricted principle of extensionality for functions.

Proposition 3 (Extensionality) Suppose {f, g} ⊆ Prog(A → B). If f a ∼ g a for any a ∈ Prog(A), then f ∼ g.
Here are two properties relating Ω and divergence.

Proposition 4 (Divergence)
(1) E[Ω] ∼ Ω for any experiment E.
(2) If a↑ then a ∼ Ω.
As promised, we can prove that λx:A. Ω_B ≃ Ω_{A→B}, in fact by proving λx:A. Ω_B ∼ Ω_{A→B}. Consider any a ∈ Prog(A). We have (λx:A. Ω_B) a ∼ Ω_B by beta reduction, and Ω_{A→B} a ∼ Ω_B by part (1) of the last proposition. Hence λx:A. Ω_B ∼ Ω_{A→B} by extensionality. In fact, then, the converse of (2) is false, for λx:A. Ω_B ∼ Ω_{A→B} but (λx:A. Ω_B)⇓.

We can easily prove the following adequacy result.

Proposition 5 (Adequacy) If a ∈ Prog(A) and A is active, a↑ iff a ∼ Ω_A.

The condition that A be active is critical, because of our example λx:A. Ω_B ∼ Ω_{A→B}, for instance.
Every convergent program equals a value, but the syntax of values includes partial applications of curried function constants. Instead we can characterise each of the types by the simpler grammar of canonical programs.

    c ::= ℓ | λx. e | Pair a b | Nil | Cons a b.

Proposition 6 (Exhaustion) For any program a ∈ Prog(A) there is a canonical program c with a ∼ c iff either a converges or A is passive.

The λ, Pair and Cons operations are injective in the following sense.

Proposition 7 (Canonical Freeness)
(1) If λx:A. e ∼ λx:A. e′ then e[a/x] ∼ e′[a/x] for any a ∈ Prog(A).
(2) If Pair a1 a2 ∼ Pair b1 b2 then a1 ∼ b1 and a2 ∼ b2.
(3) If Cons a1 a2 ∼ Cons b1 b2 then a1 ∼ b1 and a2 ∼ b2.
4.4 Bird and Wadler's Take Lemma
Our final example in this paper is to derive Bird and Wadler's Take Lemma (1988), to illustrate how a proof principle usually derived by domain-theoretic fixpoint induction follows also from co-induction.

We begin with the take function, which returns a finite approximation to an infinite list.

    take 0 xs = Nil
    take n Nil = Nil
    take (n+1) (Cons x xs) = Cons x (take n xs)
Here is the key lemma.

Lemma 6 Define S ⊆ Rel by a S b iff ∀n ∈ N (take (n+1) a ∼ take (n+1) b).
(1) Whenever a S b and a ⇓ Nil, b ⇓ Nil too.
(2) Whenever a S b and a ⇓ Cons a′ a″ there are b′ and b″ with b ⇓ Cons b′ b″, a′ ∼ b′ and a″ S b″.
(3) (S ∪ ∼) ⊆ ⟨S ∪ ∼⟩.
Proof Recall that values of stream type take the form Nil or Cons a b. For any program a of stream type, either a↑ or there is a value v with a ⇓ v. Hence for any stream a, either a ∼ Ω (from a↑ by adequacy, Proposition 5) or a ⇓ Nil or a ⇓ Cons a′ a″. Note also the following easily proved lemma about transitions of programs of active type, such as streams.

    Whenever a ∈ Prog(A) and A is active, a →α b iff ∃ value v (a ⇓ v →α b).

(1) Using a S b and n = 0 we have take 1 a ∼ take 1 b. Since a ⇓ Nil, we have a ∼ Nil, and in fact Nil ∼ take 1 b by definition of take. We know that either b ∼ Ω, b ⇓ Nil or b ⇓ Cons b′ b″. The first and third possibilities would contradict Nil ∼ take 1 b, so it must be that b ⇓ Nil.

(2) We have

    take (n+1) (Cons a′ a″) ∼ take (n+1) b.

With n = 0 we have

    Cons a′ Nil ∼ take 1 b

which rules out the possibilities that b ∼ Ω or b ⇓ Nil, so it must be that b ⇓ Cons b′ b″. So we have

    Cons a′ (take n a″) ∼ Cons b′ (take n b″)

for any n, and hence a′ ∼ b′ and a″ S b″ by canonical freeness, Proposition 7.

(3) As before it suffices to prove that S ⊆ ⟨S ∪ ∼⟩. Suppose that a S b. For each transition a →α a′ we must exhibit b′ satisfying b →α b′ and either a′ S b′ or a′ ∼ b′. Since a and b are streams, there are three possible actions α to consider.

(1) Action α is Nil. Hence a ⇓ Nil and a′ is 0. By part (1), b ⇓ Nil too. Hence b →Nil 0, and 0 ∼ 0 as required.
(2) Action α is hd. Hence a ⇓ Cons a′ a″. By part (2), there are b′ and b″ with b ⇓ Cons b′ b″, hence b →hd b′, and in fact a′ ∼ b′ by part (2).
(3) Action α is tl. Hence a ⇓ Cons a′ a″. By part (2), there are b′ and b″ with b ⇓ Cons b′ b″, hence b →tl b″, and in fact a″ S b″ by part (2).

This completes the proof of (3).
The Take Lemma is a corollary of (3) by co-induction.

Theorem 5 (Take Lemma) Suppose a, b ∈ Prog([A]). Then a ∼ b iff ∀n ∈ N (take (n+1) a ∼ take (n+1) b).

See Bird and Wadler (1988) and Sander (1992), for instance, for examples of how the Take Lemma reduces a proof of equality of infinite streams to an induction over all their finite approximations.
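In Haskell the Take Lemma also suggests a cheap testing discipline: before attempting a proof of a ∼ b, compare take (n+1) of both sides for a range of n. A sketch using the Prelude's take (a check on sample inputs, of course, not a proof; the helper names are ours):

```haskell
-- Take-Lemma-style finite comparison: do the two streams agree on
-- every prefix take (n+1) for n up to a bound m?
agreeUpTo :: Eq a => Int -> [a] -> [a] -> Bool
agreeUpTo m xs ys = and [ take (n + 1) xs == take (n + 1) ys | n <- [0 .. m] ]

-- The two sides of the map-composition equation from the text.
mapFusionLHS, mapFusionRHS :: (b -> c) -> (a -> b) -> [a] -> [c]
mapFusionLHS f g as = map (f . g) as
mapFusionRHS f g as = map f (map g as)
```

Because map preserves the length of its argument, each prefix of one side is determined by the same-length prefix of the other, which is why the simple case analysis described above succeeds for this equation.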
Example equations such as

    map (f o g) as ∼ map f (map g as)

(where o is function composition) in which the stream-processing function preserves the size of its argument are easily proved using either co-induction or the Take Lemma. In either case we proceed by a simple case analysis of whether as↑, as ⇓ Nil or as ⇓ Cons a as′. Suppose however that filter f is the stream-processing function that returns a stream of all the elements a of its argument such that f a ⇓ tt. Intuitively the following equation should hold

    filter f (map g as) ∼ map g (filter (f o g) as)

but straightforward attacks on this problem using either the Take Lemma or co-induction in the style of Lemma 4 fail. The trouble is that the result stream may not have as many elements as the argument stream.

These proof attempts can be repaired by resorting to a more sophisticated analysis of as than above. Lack of space prevents their inclusion, but in this way we can obtain proofs of the equation using either the Take Lemma or a simple co-induction. Alternatively, by more refined forms of co-induction, developed elsewhere (Gordon 1994a), we can prove such equations using a simple-minded case analysis of the behaviour of as. These proof principles need more effort to justify than the Take Lemma, but in problems like the map/filter equation they are easier to use.
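The filter/map equation can likewise be checked on prefixes in Haskell (with our sample predicate and function). Note that computing a prefix of either side may consume an unboundedly long prefix of as, which is exactly the difficulty for the naive proofs described above.

```haskell
-- The two sides of the filter/map equation from the text.
filterMapLHS, filterMapRHS :: (b -> Bool) -> (a -> b) -> [a] -> [b]
filterMapLHS f g as = filter f (map g as)
filterMapRHS f g as = map g (filter (f . g) as)
```

For example, with f = even and g = (*3) on the infinite stream [1..], both sides begin 6, 12, 18, …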
5 Proof that Bisimilarity is a Precongruence
In this section we make good our promise to show that bisimilarity and similarity are precongruences, Theorem 3. We need to extend relations such as bisimilarity to open expressions rather than simply programs. Let a proved expression be a triple (Γ, e, A) such that Γ ⊢ e : A. If Γ = x1:A1, …, xn:An, a Γ-closure is a substitution [~a/~x] where each ai ∈ Prog(Ai). Now if R ⊆ Rel, let its open extension, R°, be the least relation between proved expressions such that

    (Γ, e, A) R° (Γ, e′, A) iff e[~a/~x] R e′[~a/~x] for any Γ-closure [~a/~x].

For instance, relation Rel° holds between any two proved expressions (Γ, e, A) and (Γ′, e′, A′) provided only that Γ = Γ′ and A = A′. As a matter of notation we shall write Γ ⊢ e R° e′ : A to mean that (Γ, e, A) R° (Γ, e′, A) and, in fact, we shall often omit the type information.
We need the following notion of compatible refinement to characterise what it means for a relation on open expressions to be a precongruence. If R ⊆ Rel°, its compatible refinement, R̂ ⊆ Rel°, is defined inductively by the following rules.

    Γ ⊢ e R̂ e    if e ∈ {x, k_A, Ω_A, c_A}
    Γ ⊢ e1 e2 R̂ e1′ e2′    if Γ ⊢ e1 R e1′ and Γ ⊢ e2 R e2′
    Γ ⊢ λx:A. e R̂ λx:A. e′    if Γ, x:A ⊢ e R e′
    Γ ⊢ if e1 then e2 else e3 R̂ if e1′ then e2′ else e3′    if Γ ⊢ ei R ei′ (i = 1, 2, 3)

Define a relation R ⊆ Rel° to be a precongruence iff it contains its own compatible refinement, that is, R̂ ⊆ R. This definition is equivalent to saying that a relation is preserved by substitution into any context.
Lemma 7 Assume that R ⊆ Rel° is a preorder. R is a precongruence iff Γ ⊢ C[e] R C[e′] whenever Γ ⊢ e R e′ and C is a context.

The proof of the `only if' direction is by induction on the size of the context C; the other direction is straightforward. Note that whenever a and b are programs of type A, a ∼ b iff (∅, a, A) ∼° (∅, b, A), and similarly for similarity, ≲. Hence, given Lemma 7, to prove Theorem 3 it will be enough to show that ∼° and ≲° are precongruences, that is, that each contains its own compatible refinement.
We shall use a general method established by Howe (1989). First we prove that the open extension of similarity is a precongruence. We define a second relation ≲*, which by construction satisfies (≲*)̂ ⊆ ≲* and ≲° ⊆ ≲*. We prove by co-induction that ≲* ⊆ ≲°. Hence ≲* and ≲° are one and the same relation, and ≲° is a precongruence because ≲* is.

Second we prove that the open extension of bisimilarity is a precongruence. Let ≳° = (≲°)^op. Recall Lemma 3(2), that ∼ = ≲ ∩ ≲^op. Furthermore ∼° = ≲° ∩ ≳° follows by definition of open extension. We can easily prove another fact, that (R ∩ S)̂ = R̂ ∩ Ŝ whenever R, S ⊆ Rel°. We have

    (∼°)̂ = (≲° ∩ ≳°)̂ = (≲°)̂ ∩ (≳°)̂ ⊆ ≲° ∩ ≳° = ∼°

which is to say that ∼° is a precongruence. Indeed, being an equivalence relation, it is a congruence.
We have only sketched the first part, that ≲° is a precongruence. We devote the remainder of this section to a more detailed account. Compatible refinement, R̂, permits a concise inductive definition of Howe's relation ≲* ⊆ Rel° as μS. (Ŝ ≲°), which is to say that ≲* is the least relation to satisfy the rule

    from Γ ⊢ e (≲*)̂ e″ and Γ ⊢ e″ ≲° e′ conclude Γ ⊢ e ≲* e′.

Sands (1992) found the following neat presentation of some basic properties of ≲* from Howe's paper.

Lemma 8 (Sands) ≲* is the least relation closed under the rules

    from Γ ⊢ e ≲* e″ and Γ ⊢ e″ ≲° e′ conclude Γ ⊢ e ≲* e′;
    from Γ ⊢ e (≲*)̂ e′ conclude Γ ⊢ e ≲* e′.

We claimed earlier that (≲*)̂ ⊆ ≲* and ≲° ⊆ ≲*; these follow from the lemma.
The proof is routine, as is that of the following substitution lemma.

Lemma 9 If Γ, x:B ⊢ e1 ≲* e2 and Γ ⊢ e1′ ≲* e2′ : B then Γ ⊢ e1[e1′/x] ≲* e2[e2′/x].
What remains of Howe's method is to prove that ≲• ⊆ ≲◦, which we do by
co-induction. First note the following lemma, which is the crux of the proof,
relating ≲• and transition.

Lemma 10 Define S = {(a, b) | ∅ ⊢ a ≲• b}.
(1) Whenever a S b and a ↝ a′ then a′ S b.
(2) Whenever a S b and a →α a′ there is b′ with b →α b′ and a′ S b′.

Proof The proofs are by induction on the depth of inference of reduction a ↝ a′
and transition a →α a′ respectively. Details of similar proofs may be found in
Howe (1989) and Gordon (1994b).
By this lemma, S is a simulation, and hence S ⊆ ≲ by co-induction. Open
extension is monotone, so S◦ ⊆ ≲◦. Now ≲• ⊆ S◦ follows from the substitution
lemma (Lemma 9) and the reflexivity of ≲• (Lemma 8 and reflexivity of ≲◦).
Hence we have ≲• ⊆ ≲◦. But the reverse inclusion follows from Lemma 8, so
in fact ≲• = ≲◦ and hence ≲◦ is a precongruence.
6 Summary and Related Work
We explained the dual foundations of induction and co-induction. We defined
notions of divergence and contextual equivalence for a small functional language, an extension of PCF. We gave co-inductive characterisations of both
divergence and contextual equivalence, and illustrated their utility by a series
of examples and properties. In particular we derived the `Take Lemma' of Bird
and Wadler (1988). We explained Howe's method for proving that bisimilarity, our co-inductive formulation of contextual equivalence, is a precongruence.
We hope to have shown both by general principles and specific examples that
there is an easy path leading from the reduction rules that define a functional
language to a powerful theory of program equivalence based on co-induction.
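As a reminder of the kind of reasoning this theory supports, here is a Haskell rendering of a map/filter equation (a sketch of ours, not a program from the paper): the Take Lemma reduces the equality of two infinite streams to the claim that take n of each agrees for every n, which is provable by ordinary induction on n.

```haskell
-- Two ways to build the same infinite stream.
lhs :: [Int]
lhs = map (+1) (filter odd [1..])    -- add one to each odd number

rhs :: [Int]
rhs = filter even (map (+1) [1..])   -- keep the even results of adding one

-- By the Take Lemma, lhs = rhs because every finite observation agrees:
-- take 4 of each is [2,4,6,8], and similarly for any n.
```

This is an instance of the general law filter p . map f = map f . filter (p . f), here with f = (+1) and p = even.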
Although our particular formulation is new, bisimilarity for functional languages is not. Often it is known as `applicative bisimulation' and is based
on a natural semantics style evaluation relation. The earliest reference I can
find is to Abramsky's unpublished 1984 work on Martin-Löf's type theory,
which eventually led to his study of the lazy lambda-calculus¹ (Abramsky and Ong
1993). Other work includes papers by Howe (1989), Smith (1991), Sands (1992,
1994), Ong (1993), Pitts and Stark (1993), Ritter and Pitts (1994), Crole and
Gordon (1994) and my book (1994b). The present formulation is the first to
coincide with contextual equivalence for PCF-like languages. It amounts to
a co-inductive generalisation of Milner's original term model for PCF (1977).
Since it equals contextual equivalence it answers Turner's (1990, Preface) concern that Abramsky's applicative bisimulation makes more distinctions than
are observable by well-typed program contexts.
Domain theory is one perspective on the foundations of lazy functional programming; this paper offers another. Any subject benefits from multiple perspectives. In this case the two are of about equal expressiveness. Domain
theory is independent of syntax and operational semantics, and provides fixpoint induction for proving program properties. If we take care to distinguish
denotations from texts of programs, the theory of bisimilarity set out in Section 4 can be paralleled by a theory based on a domain-theoretic denotational
semantics. Winskel (1993), for instance, shows how to prove adequacy for a
lazy language with recursive types (albeit one in which functions and pairs
are active types). Pitts (1994) develops a co-induction principle from domain
theory. On the other hand, Smith (1991) shows how operational methods
based on a form of bisimilarity can support fixpoint induction. One advantage
of the operational approach is that bisimilarity coincides exactly with contextual equivalence. The corresponding property of a denotational semantics, full
abstraction, is notoriously hard to achieve (Ong 1994).
¹ The earliest presentation of the lazy lambda-calculus appears to be Abramsky's thesis (1987,
Chapter 6), in which he explains that the "main results of Chapter 6 were obtained in the
setting of Martin-Löf's Domain Interpretation of his Type Theory, during and shortly after
a visit to Chalmers in March 1984."
Acknowledgements
The idea of defining bisimilarity on a deterministic functional language via a
labelled transition system arose in joint work with Roy Crole (1994). Martin
Coen pointed out the map/filter example to me. I hold a Royal Society University Research Fellowship. This work has been partially supported by the
CEC TYPES BRA, but was begun while I was a member of the Programming Methodology Group at Chalmers. I benefited greatly from presenting a
tutorial on this work to the Functional Programming group at Glasgow University. I am grateful to colleagues at the Ayr workshop, and at Chalmers and
Cambridge, for many useful conversations.
References
Abramsky, S. (1987, October 5). Domain Theory and the Logic of Observable Properties. Ph.D. thesis, Queen Mary College, University of London.
Abramsky, S. and L. Ong (1993). Full abstraction in the lazy lambda calculus. Information and Computation 105, 159–267.
Aczel, P. (1977). An introduction to inductive definitions. In J. Barwise (Ed.), Handbook of Mathematical Logic, pp. 739–782. North-Holland.
Bird, R. and P. Wadler (1988). Introduction to Functional Programming. Prentice-Hall.
Bloom, B. (1988). Can LCF be topped? Flat lattice models of typed lambda calculus. In Proceedings 3rd LICS, pp. 282–295.
Crole, R. L. and A. D. Gordon (1994, September). A sound metalogical semantics for input/output effects. In Computer Science Logic '94, Kazimierz, Poland. Proceedings to appear in Springer LNCS.
Davey, B. A. and H. A. Priestley (1990). Introduction to Lattices and Order. Cambridge University Press.
Felleisen, M. and D. Friedman (1986). Control operators, the SECD-machine, and the λ-calculus. In Formal Description of Programming Concepts III, pp. 193–217. North-Holland.
Gordon, A. D. (1994a). Bisimilarity as a theory of functional programming. Submitted for publication.
Gordon, A. D. (1994b). Functional Programming and Input/Output. Cambridge University Press. Revision of 1992 PhD dissertation.
Howe, D. J. (1989). Equality in lazy computation systems. In Proceedings 4th LICS, pp. 198–203.
Hughes, J. and A. Moran (1993, June). Natural semantics for nondeterminism. In Proceedings of El Wintermote, pp. 211–222. Chalmers PMG. Available as Report 73.
Milner, R. (1977). Fully abstract models of typed lambda-calculi. TCS 4, 1–23.
Milner, R. (1989). Communication and Concurrency. Prentice-Hall.
Morris, J. H. (1968, December). Lambda-Calculus Models of Programming Languages. Ph.D. thesis, MIT.
Ong, C.-H. L. (1993, June). Non-determinism in a functional setting (extended abstract). In Proceedings 8th LICS, pp. 275–286.
Ong, C.-H. L. (1994, January). Correspondence between operational and denotational semantics: The full abstraction problem for PCF. Submitted to Handbook of Logic in Computer Science Volume 3, OUP 1994.
Park, D. (1981, March). Concurrency and automata on infinite sequences. In P. Deussen (Ed.), Theoretical Computer Science: 5th GI-Conference, Volume 104 of Lecture Notes in Computer Science, pp. 167–183. Springer-Verlag.
Pitts, A. and I. Stark (1993, June). On the observable properties of higher order functions that dynamically create local names (preliminary report). In SIPL '93, pp. 31–45.
Pitts, A. M. (1994). A co-induction principle for recursively defined domains. TCS 124, 195–219.
Plotkin, G. D. (1977). LCF considered as a programming language. TCS 5, 223–255.
Ritter, E. and A. M. Pitts (1994, September). A fully abstract translation between a λ-calculus with reference types and Standard ML. To appear in TLCA '95.
Sander, H. (1992). A Logic of Functional Programs with an Application to Concurrency. Ph.D. thesis, Chalmers PMG.
Sands, D. (1992). Operational theories of improvement in functional languages (extended abstract). In Functional Programming, Glasgow 1991, Workshops in Computing, pp. 298–311. Springer-Verlag.
Sands, D. (1994, May). Total correctness and improvement in the transformation of functional programs (1st draft). DIKU, University of Copenhagen.
Smith, S. F. (1991). From operational to denotational semantics. In MFPS VII, Pittsburgh, Volume 598 of Lecture Notes in Computer Science, pp. 54–76. Springer-Verlag.
Turner, D. (Ed.) (1990). Research Topics in Functional Programming. Addison-Wesley.
Winskel, G. (1993). The Formal Semantics of Programming Languages. MIT Press, Cambridge, Mass.