Recursive types for free!
Philip Wadler
University of Glasgow
July 1990 [some typos fixed Aug 2008, Oct 2014]
DRAFT DRAFT DRAFT DRAFT DRAFT
Recursive types pervade programming: lists, trees, and streams being
three of the most common examples. Recursive types come in two
principle flavours, least fixpoint or greatest fixpoint. For example,
the least fixpoint
Lfix X. 1 + A*X
yields lists with elements of type A, the least fixpoint
Lfix X. A + X*X
yields binary trees with leaves of type A, and the greatest fixpoint
Gfix X. A*X
yields streams with elements of type A.
Adding recursive types can alter the nature of a type system. The
polymorphic lambda calculus has the pleasant propery of being strongly
normalising: all reduction sequences terminate in a normal form. But
augmenting this calculus with the type
Lfix X. 1 + (X -> X)
has the unpleasant consequence of introducing terms with no normal
form. Fortunately, strong normalisation can be preserved by a mild
restriction: don't allow the recursive type variable to appear in a
negative position. The example violates this constraint, because the
recursive type variable X appears to the left of the function arrow.
Thus, it is safe to extend the polymorphic lambda calculus by adding
least fixpoint types with type variables in positive position. Indeed,
no extension is required: such types already exist in the language! If
F X represents a type containing X in positive position only, then least
fixpoints may be defined in terms of universal quantification:
Lfix X. F X = All X. (F X -> X) -> X.
This introduces a new type, T = Lfix X. F X, satisfying the isomorphism
T ~ F T. Note that it is an isomorphism, not an equality: the type
comes equipped with functions in : T -> F T and out : F T -> T. This
formula can be found, for instance, in [Freyd 89] and [Wraith 89]. It
is not as widely known as it should be -- I know of several computer
scientists who have re-invented this particular wheel. The excellent
textbook by Girard, Lafont, and Taylor [GLT 89] gives several special
cases, but not this general form.
More interestingly, polymorphic lambda calculus also contains greatest
fixpoints. These may be defined in terms of existential quantification
Gfix X. F X = Exists X. (X -> F X) * X,
again subject to the restriction that X appears only positively in F X.
This is a little surprising: greatest fixpoints allow infinite objects,
such as streams, yet the strong normalisation property is preserved.
Surprising, but not new: it was known previously that greatest
fixpoints and strong normalisation could co-exist: both [Hagino 87] and
[Mendler 87] describe type systems that include Lfix and Gfix and have
strong normalisation. But the encoding provides a simple proof of this
fact. Hagino refers to the coding of least fixpoints, but says that he
doesn't know of a coding for greatest fixpoints; Mendler doesn't refer
to either coding, but gives a lengthy proof that adding least and
greatest fixpoints to polymorphic lambda calculus preserves strong
normalisation.
(As it turns out, not new either: I have since discovered that Wraith
also describes this encoding [Wraith 89], although there is a small
technical error: he writes
Gfix X. F X = Exists X. X -> (X -> F X),
which is incorrect.)
LEAST FIXPOINTS AS WEAK INITIAL ALGEBRAS
Let's now look at the fixpoint result in a little more detail.
This will require a mild dose of category theory. Don't panic:
all terms will be explained as we go along.
By a functor F in polymorphic lambda calculus, we will mean an operation
taking types into types, and terms into terms, such that if t : U -> V
then F t : F U -> F V, and preserving identities and composition:
F id = id and F (f . g) = F f . F g. Every type U containing a type
variable X in positive positions only corresponds to a functor F X = U,
which takes the type T into the the type F T = U[T/X].
An object in a category is weakly initial if there is a map from it to
every other object, and initial if this map is unique. In categorical
terms, the least fixpoint of F corresponds to an initial F-algebra. An
F-algebra is a pair (X,k) consisting of an object X and an arrow
k : F X -> X. These form a category, where a morphism between (X,k) and
(X',k') is given by an arrow h : X -> X' such that the diagram
k
F X ----------> X
| |
| |
(1) F h | | h
| |
| |
F X' ---------> X'
k'
commutes.
Assume Lfix is given by the equation:
T = Lfix X. F X = All X. (F X -> X) -> X.
As a convenient abbreviation, we will write T for Lfix X. F X.
Then we can define two functions:
fold : All X. (F X -> X) -> T -> X
fold = \X. \k: F X -> X. \t:T. t X k
in : F T -> T
in = \s: F T. \X. \k: F X -> X. k (F (fold X k) s).
It follows immediately that the diagram
in
F T ----------> T
| |
| |
(2) F (fold X k) | | fold X k
| |
| |
F X ----------> X
k
commutes. In other words, (T,in) is an F-algebra from which
there is a map, called (fold X k), to every other F-algebra;
that is, (T,in) is a weakly initial F-algebra.
Let's consider what this means in a particular case. Take
F X = 1+X.
The values of type 1+X have the forms (inl ()) and (inr x), where x:X.
If f : X -> Y, then F f : 1+X -> 1+Y is given by
F f (inl ()) = inl ()
F f (inr x) = inr (f x).
Now, define Nat to be the least fixpoint of F:
Nat = Lfix X. 1+X.
This corresponds to the natural numbers: (in (inl ())) represents zero,
and (in (inr n)) represents the successor of n. Let k : 1+X -> X.
Then diagram (2) states that
fold X k (inl ()) = k (inl ())
fold X k (inr n) = k (inr (fold X k n)).
If we take z = k (inl ()), and s x = k (inr x)), then we
can rewrite this in the familiar form:
fold X k 0 = z
fold X k (n+1) = s (fold X k n).
That is, the value of (fold X k) is given for zero (namely, z), and is
found for (n+1) by recursively applying (fold X k) to n, and then
applying a given function (namely, s) to this result.
As a second example, take
F X = 1 + A*X.
The values of type 1+A*X have the forms (inl ()) and (inr (a,x)),
where a:A and x:X. If f : X -> Y, then F f : 1+A*X -> 1+A*Y is
given by
F f (inl ()) = inl ()
F f (inr (a,x)) = inr (a, f x).
Now, define (List A) to be the least fixpoint of F:
List A = Lfix X. 1+A*X.
This corresponds to lists with elements of type A: (in (inl ())
represents the empty list, also written nil, and (in (inr (a,as))
represents the list constructed with head a and tail as, also written
(cons a as). Let k : 1+A*X -> X. Then diagram (2) states that
fold X k (inl ()) = k (inl ())
fold X k (inr (a,as)) = k (inr (a, fold X k as)).
If we take n = k (inl ()) and c a x = k (inr (a,x)), then
we can rewrite this in the familiar form
fold X k nil = n
fold X k (cons a as) = c a (fold (X k as)).
That is, the value of (fold X k) is given for nil (namely, n), and is
found for (cons a as) by recursively applying (fold X k) to as, and
then using a given function (namely, c) to combine a with the
result.
LEAST FIXPOINTS AS INITIAL ALGEBRAS
So far, we have considered only weak initial algebras. We now give
necessary and sufficient conditions for this to be a true initial
algebra. We first consider the problem for an arbitrary definitions of
T, fold, in, and then specialise to the particular definitions we have
given.
In order for (T,in) to be initial, the map (fold X k) must be
unique, i.e., the only map that makes diagram (2) commute.
Let h be an arbitrary map from (X,k) to (X',k'); then
combining (1) and (2) we have
in
F T ----------> T
| |
| |
F (fold X k) | | fold X k
| |
| k |
F X ----------> X
| |
| |
F h | | h
| |
| |
F X' ---------> X' ,
k'
yielding a map (h . fold X k) from (T,in) to (X',k'). But (fold X' k')
should be the only such map! Thus, initiality entails that
k fold X k
F X ----------> X T -----------> X
| | | |
| | | |
(3) F h | | h implies id | | h
| | | |
| | | |
F X' ---------> X' T -----------> X' .
k' fold X' k'
Further, since (fold T in) is a map T -> T, and id is also such
a map, initiality also implies that
(4) fold T in = id.
Conversely, (3) and (4) imply than (T,in) is initial.
Choosing an appropriate instance of (3) gives
in fold T in
F T ----------> T T -----------> T
| | | |
| | | |
F h | | h implies id | | h
| | | |
| | | |
F X ----------> X T -----------> X .
k fold X k
The left-hand square states that h is a map from (T,in) into (X,k);
and the right-hand square, combined with (4), states that h must
be (fold X k). That is, (fold X k) is the only map from (T,in)
into (X,k), as required.
If (T,in) is an initial F-algebra, then "in" is an isomorphism.
By functoriality, from in : F T -> T we have (F in) : F (F T) -> F T,
hence (F T, F in) is an F-algebra. The unique map from (F,in) into this
algebra is given by
out : T -> F T
out = fold (F T) (F in).
To see that "in" and "out" are inverses, stare at the following diagram:
in
F T ----------> T
| |
| |
F out | | out
| |
| F in |
F (F T) --------> F T
| |
| |
F in | | in
| |
| |
F T ----------> T .
in
The top square is an instance of (2), and the bottom square commutes
trivially. Hence (in . out) is a map from (T,in) to (T,in); but id
is also such a map, so by uniqueness we have id = in . out.
Now from the upper square we have
out . in = F in . F out = F (in . out) = F id = id,
completing the proof. It is precisely because "in" is an isomorphism
that we are justified in calling T a fixpoint of F, since we have
F T ~ T; it is because T is initial that we are justified in calling
it a least fixpoint.
The argument in the preceding two paragraphs works in any category. In our
given category, polymorphic lambda calculus, with the given definitions
of "fold" and "in", we can go further. Take (2) as the left-hand
square of (3); then the right-hand square becomes,
fold T in
T -----------> T
| |
| |
id | | fold X k
| |
| |
T -----------> X ,
fold X k
or, in symbols,
fold X k (fold T in t) = fold X k t.
Reducing both applications of (fold X k) yields
(fold T in t) X k = t X k
(this is where we use the definition of "fold"). By the eta rules,
it follows that
fold T in t = t
(this is where we use properties of polymorphic lambda calculus).
Thus, for the given definitions, (3) implies (4), and hence (T,in) is
an initial F-algebra exactly when (3) holds.
Law (3) does not follow from the reduction laws of polymorphic lambda
calculus, and indeed there are models that do not satisfy it. But this
law is satisfied in many models, including all those having the
property of PARAMETRICITY (see [Reynolds 83], [Freyd 89], [Wadler 89],
[Freyd 90], [AMSW 90]). In particular, in the jargon of [Wadler 89],
the "Theorem for Free" derived from the type of "fold" is just this
law.
(A technical point: The "Theorems for Free" result really deals with
relations, not functions. In the diagram (3) each arrow denotes a relation
rather than a function, namely, the relation induced by the function.
Except the arrow labeled "id" actually corresponds to the relation
id' defined by the relation
id' = (All r. (F r -> r) -> r).
Here r->s relates f:X->Y to f':X'->Y' if whenever r relates x to x'
then s relates (f x) to (f' x'). And if G is an operation on
relations and types such that if r:X<->X' then (G r):(G X)<->(G X'),
then (All r. G r) relates g : (All X. G X) to g' : (All X'. G X')
(these are both the same type), if whenever r is a relation from X to
X' then (g X) (G r) (g' X). It is not the case in all models that id'
is the identity relation! This is the purpose of the identity lemma
in [Reynolds 83] -- in any model satisfying this lemma, id' will be
the identity. A parametric model is just one that satisfies the
identity lemma. Hence, although the "Theorems for Free" result
applies in any model, it is only in a parametric model that (3)
and (4) must hold.)
ITERATORS AND RECURSORS
The "fold" operation is what has sometimes been called an iterator.
The value of (fold X k x) is computed by applying (fold X k)
recursively to each substructure of x and applying k to the result.
More formally, we can refer to the substructure of x by taking
x = in y, for some y : F T, and we can apply (fold X k) to each
substructure of x by taking (F (fold X k) y). Then the sentence above
can be expressed in symbols:
fold X k (in y) = k (F (fold X k)).
This fundamental property is just equivalent to saying that
"fold" is a map of F-algebras.
The recursor is a slight variant of the iterator. It is defined by
rec : All X. (F (X*T) -> X) -> T -> (X*T)
rec = \X. \g: F (X*T) -> X. fold (X*T) .
The value of (rec X g) at x is computed by applying (rec X g) recursively
to each substructure and applying g to the result, and pairing this with the
value of x. Thus where k expects a value computed on the substructure,
g expects a value computed on the substructure paired with the substructure
itself. This is expressed by the laws:
fst (rec X g (in y)) = g (F (rec X g) y),
snd (rec X g x) = x.
The first is an easy consequence of the definition of "rec". To derive
the second, instantiate (3) to yield:
fold (X*T)
F (X*T) -------> X*T T ----------> X*T
| | | |
| | | |
F snd | | snd implies id | | snd
| | | |
| | | |
F T ----------> T T -----------> T
in fold T in
Then the left square commutes trivially, the top line of the right square
is "rec", and the bottom line is "id" by (4), yielding the desired law.
GREATEST FIXPOINTS
Greatest fixpoints are exactly dual. We will need a little notation
for dealing with existential types: unfortunately, there isn't a
standard one. The following typing and reduction rules should serve to
introduce the notation, which is based on viewing existential types as
generalised products:
Products:
A |- u:U A |- v:V
-----------------------
A |- (u,v) : U*V
A |- t : U*V A, x:U, y:V |- w:W
-------------------------------------
A |- (case t of {(x,y) -> w}) : W
(case (u,v) of {(x,y) -> w}) ---> w[u/x, v/y]
Existentials:
A |- v:V[U/X]
-----------------------
A |- (U,v) : Exists X.V
A |- t : Exists X.V A, y:V |- w:W
------------------------------------- (X not in A, W)
A |- (case t of {(X,y) -> w}) : W
(case (U,v) of {(X,y) -> w}) ---> w[U/X, v/y]
We will combine these in a straightforward way, so that, for instance,
case t of {(X,(k,x)) -> w}
will be used as an abbreviation for
case t of {(X,y) -> case y of {(k,x) -> w}}.
Mitchell and Plotkin [MP 88] write (X,u) as (pack X u) and
(case t of {(X,y) -> w}) as (abstype X y is t in w).
We now take the dual of the previous development. The greatest
fixpoint of F corresponds to a terminal F-coalgebra. An F-coalgebra is
a pair (X,k) consisting of an object X and an arrow k : X -> F X, and a
morphism between (X,k) and (X',k') is given by an arrow h : X -> X'
such that the diagram
k
X ----------> F X
| |
| |
h | | F h
| |
| |
X' ---------> F X'
k'
commutes. Assume Gfix is given by the equation:
T = Gfix X. F X = Exists X. (X -> F X) * X.
Then we can define two functions:
unfold : All X. (X -> F X) -> X -> T
unfold = \X. \k: X -> F X. \x:T. (X,(k,x)),
out : T -> F T
out = \t:T. case t of {(X,(k,x)) -> F (unfold X k) (k x)}.
It follows immediately that the diagram
k
X ----------> F X
| |
| |
unfold X k | | F (unfold X k)
| |
| |
T ----------> F T
out
commutes. In other words, (T,out) is an F-algebra into which
there is a map, called (unfold X k), from every other F-algebra;
that is, (T,out) is a weakly terminal F-coalgebra.
As before, if (T,out) is truly a terminal F-coalgebra, then "out"
is an isomorphism, with the inverse given by
in : F T -> T
in = unfold (F T) (F out).
Furthermore, we will have that (T,out) is terminal iff the condition
k unfold X k
X ----------> F X X -----------> T
| | | |
| | | |
h | | F h implies h | | id
| | | |
| | | |
X' ---------> F X' X' ----------> T
k' unfold X' k'
holds.
(A technical point: this equivalence depends on the equivalence
of the surjective pairing rule and the equivalent rule for pair types:
case t of {(x,y) -> h (x,y)} = h t,
case t of {(X,y) -> h (X,y)} = h t.
If our calculus contains pairs and existentials as primitives, then
it is reasonable to insist on these rules. But if, as is often the
case, we define pairs and existentials in terms of universals, then
these rules will not necessarily hold. But they will necessarily
hold in all models satisfying parametricity! So in parametric
models we are assured the existence of true greatest fixpoints.)
Example: Streams of integers are yielded by the greatest fixpoint of
T(X) = Int * X. The formula above instantiates to
IntStream
= Gfix X. Int * X
= Exists X. (X -> Int * X) * X
~ Exists X. (X -> Int) * (X -> X) * X.
Essentially, what is going on here is that from the abstract type
(X, (h, t, x)) one can only extract terms of the form (h (t^i x)),
which corresponds to the i'th element of the stream.
Hagino suggests that streams be defined by the operations
unfold, head, and tail [Hagino 87]. We can define these by:
unfold : All X. ((X -> Int) * (X -> X) * X) -> IntStream
= \X. \(h, t, x). (X, (h, t, x))
head : IntStream -> Int
= \s. case s of {(X, (h, t, x)) -> h x}
tail : IntStream -> IntStream
= \s. case s of {(X, (h, t, x)) -> (X, (h, t, t x))}
These are just the transposition of "unfold" and "in"
across the isomorphism (X -> Int * X) ~ (X -> Int) * (X -> X).
PRAGMATICS
Regarding pragmatics, it is well known that the embedding of least
fixpoints is less efficient than one would like. For instance, the
operation to find the tail of a list takes time proportional to the
length of the list: one would hope that this takes constant time.
Greatest fixpoints have a dual problem: finding the tail of a stream is
cheap, but consing an element onto the front of a stream is more
expensive than one would like. So one would still be tempted to add
least and greatest fixpoints to a language for pragmatic reasons, though
it is good to know that in doing so one does not change the language in
any fundamental way (in particular, strong normalisation is still
preserved).
Is there a way of coding lists in polymorphic lambda calculus that (a)
uses space proportional to the length of the list, (b) performs cons in
constant time, and (c) performs tail in constant time? Or is there a
proof that this is impossible? So far as I know, this is an open
question.
ACKNOWLEDGEMENT
I'm grateful to John Hughes for comments on an earlier version of this
text. [Update: And to Fruhwirth Clemens and Gabor Greif for spotting
typos.]
BIBLIOGRAPHY
[AMSW 90] S. Abramsky, J. Mitchell, A. Scedrov, and P. Wadler,
Theorems for free, categorically. In preparation.
[Freyd 89] P. Freyd, Structural polymorphism. Series of three e-mail
messages to "types" newsgroup, March 1989.
[Freyd 90] P. Freyd, Recursive types reduced to inductive types. In
J. Mitchell, editor, 5'th Symposium on Logic in Computer Science,
Philadelphia, June 1990. IEEE.
[GLT 89] J.-Y. Girard, Y. Lafont, and P. Taylor, Types and Proofs.
Cambridge University Press, 1989.
[Hagino 87] T. Hagino, A typed lambda calculus with categorical type
constructors. In Pitt, et al., editors, Category Theory in Computer
Science, Edinburgh, September 1987. LNC 283, Springer Verlag.
[Mendler 87] N. P. Mendler, Recursive types and type constraints
in second-order lambda calculus. In 2'nd Symposium on Logic in
Computer Science, Ithaca, New York, June 1987. IEEE.
[MP 88] J. C. Mitchell and G. D. Plotkin, Abstract types have
existential types. ACM Transactions of Programming Languages,
10(3):470--502, 1988. Preliminary version appeared in Proceedings
12'th ACM Symposium on Principles of Programming Languages, 1985.
[Reynolds 83] J. C. Reynolds, Types, abstraction, and parametric
polymorphism. In R. E. A. Mason, editor, Information Processing 83,
pp. 513--523. North-Holland, Amsterdam.
[Wadler 89] P. Wadler, Theorems for free! In 4'th International
Conference on Functional Programming Languages and Computer
Architecture, London, September 1989. ACM.
[Wraith 89] G. C. Wraith, A note on categorical data types. In Pitt,
et al., editors, Category Theory in Computer Science, Manchester,
September 1989. LNCS 389, Springer Verlag.