Return-Path: wadler@nslocum.cs.bell-labs.com Delivery-Date: Thu Nov 12 14:40:33 1998 Received: from mercury.Sun.COM (mercury.Sun.COM [192.9.25.1]) by cis.ohio-state.edu (8.9.1/8.9.1) with SMTP id OAA16374 for ; Thu, 12 Nov 1998 14:40:31 -0500 (EST) Received: from East.Sun.COM ([129.148.1.241]) by mercury.Sun.COM (SMI-8.6/mail.byaddr) with SMTP id LAA24392; Thu, 12 Nov 1998 11:38:18 -0800 Received: from suneast.East.Sun.COM by East.Sun.COM (SMI-8.6/SMI-5.3) id OAA19822; Thu, 12 Nov 1998 14:37:41 -0500 Received: from galileo.East.Sun.COM (galileo [129.148.75.30]) by suneast.East.Sun.COM (8.9.1b+Sun/8.8.8) with SMTP id OAA08962; Thu, 12 Nov 1998 14:30:13 -0500 (EST) Received: from East.Sun.COM by galileo.East.Sun.COM (SMI-8.6/SMI-SVR4) id OAA21524; Thu, 12 Nov 1998 14:30:11 -0500 Received: from earth.sun.com by East.Sun.COM (SMI-8.6/SMI-5.3) id OAA21266; Thu, 12 Nov 1998 14:30:05 -0500 Received: from dirty.research.bell-labs.com (dirty.research.bell-labs.com [204.178.16.6]) by earth.sun.com (8.9.1/8.9.1) with SMTP id LAA16917 for ; Thu, 12 Nov 1998 11:30:02 -0800 (PST) Received: from nslocum.cs.bell-labs.com ([135.104.8.38]) by dirty; Thu Nov 12 14:28:14 EST 1998 Received: from nslocum.cs.bell-labs.com (localhost [127.0.0.1]) by nslocum.cs.bell-labs.com (8.9.1a/8.9.1) with ESMTP id OAA08903; Thu, 12 Nov 1998 14:28:07 -0500 (EST) Message-Id: <199811121928.OAA08903@nslocum.cs.bell-labs.com> To: java-genericity@galileo.East.Sun.COM, kim@bull.cs.williams.edu (Kim Bruce), pierce@cs.indiana.edu, Didier.Remy@inria.fr, luca@luca.demon.co.uk, matthias@rice.edu, shriram@cs.rice.edu, cork@rice.edu, 99jcv@bull.cs.williams.edu, odersky@cis.unisa.edu.au, Yannis Smaragdakis , "Mads Torgersen" , Kresten Krab Thorup , gilad.bracha@eng.Sun.COM, david.stoutamire@eng.Sun.COM, simonpj@microsoft.com Cc: Philip Wadler Subject: The Expression Problem Date: Thu, 12 Nov 1998 14:27:55 -0500 From: Philip Wadler The Expression Problem Philip Wadler, 12 November 1998 The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts). For the concrete example, we take expressions as the data type, begin with one case (constants) and one function (evaluators), then add one more construct (plus) and one more function (conversion to a string). Whether a language can solve the Expression Problem is a salient indicator of its capacity for expression. One can think of cases as rows and functions as columns in a table. In a functional language, the rows are fixed (cases in a datatype declaration) but it is easy to add new columns (functions). In an object-oriented language, the columns are fixed (methods in a class declaration) but it is easy to add new rows (subclasses). We want to make it easy to add either rows or columns. The Expresion Problem delineates a central tension in language design. Accordingly, it has been widely discussed, including Reynolds (1975), Cook (1990), and Krishnamurthi, Felleisen and Friedman (1998); the latter includes a more extensive list of references. It has also been discussed on this mailing list by Corky Cartwright and Kim Bruce. Yet I know of no widely-used language that solves The Expression Problem while satisfying the constraints of independent compilation and static typing. Until now, that is. Here I present a solution to this problem in GJ, as extended by the mechanism I described in my previous note `Do parametric types beat virtual types?'. (However, there is a caveat with regard to inner interfaces, see below.) 1. A solution Figure 1 shows a solution to the Expression Problem in GJ. The two phases of the problem are clumped into two classes, LangF and Lang2F, each of which defines several mutually recursive inner classes and interfaces. In the first phase, the class LangF define an interface Exp with a subclass Num representing constants, and an interface Visitor with a method forNum specifying functions over constants. The Visitor class is parameterized on the result type of the function. Class Eval implements Visitor and specifies evaluation of expressions. In the second phase, the class Lang2F extends Exp with an additional subclass Sum representing the sum of two expressions, and extends Visitor with an additional method forSum specifying how to act on sums. Class Eval is extended appropriately, and class Show implements Visitor and specifies conversion of an expression to a string. Finally, a test class creates, evaluates, and shows expressions from both phases. The class Eval in the second phase extends the class Eval from the first phase and implements the interface Visitor from the second phase. So it is essential that Visitor be an interface, not an abstract class. The LangF class is parameterised on a type parameter This that is itself bounded by LangF, and the Lang2F class is parameterized on a type parameter This that is bounded by Lang2F; further, Lang2F extends LangF. This use of `This' is the standard trick to provide accurate static typing in the prescence of subtypes (sometimes called MyType or ThisType). As usual, we tie the knot with fixpoint classes Lang and Lang2. The key trick here is the use of This.Exp and This.Visitor, via the mechanism described in `Do parametric types beat virtual types?'. Recall that mechanism allows a type variable to be indexed by any inner class defined in the variable's bound; in order for this to be sound, any type which instantiates a type parameter must define inner classes that extend those in the bound. Here we can refer to This.Exp and This.Visitor because This is bound by LangF which defines Exp and Vistor; soundness is satisfied since Lang2F.Exp extends LangF.Exp, and Lang2F.Visitor extends Lang2F.Visitor. This solution is remarkably straightforward, once one is familiar with the techniques for simulating ThisType and virtual types. However, I must admit it took me a while to see the solution, even after I went looking for it. (Some of you will have seen an earlier solution, similar in structure but impossible to implement since it had the type variable This.Exp as a supertype of Num and Sum; the current version has no such problem.) ------------------------------------------------------------------------ Figure 1: A solution to The Expression Problem ------------------------------------------------------------------------ class LangF> { interface Visitor { public R forNum(int n); } interface Exp { public R visit(This.Visitor v); } class Num implements Exp { protected final int n_; public Num(int n) {n_=n;} public R visit(This.Visitor v) { return v.forNum(n_); } } class Eval implements Visitor { public Integer forNum(int n) { return new Integer(n); } } } final class Lang extends LangF {} class Lang2F> extends LangF { interface Visitor extends LangF.Visitor { public R forPlus(This.Exp e1, This.Exp e2); } class Plus implements Exp { protected final This.Exp e1_,e2_; public Plus(This.Exp e1, This.Exp e2) {e1_=e1; e2_=e2;} public R visit(This.Visitor v) { return v.forPlus(e1_,e2_); } } class Eval extends LangF.Eval implements Visitor { public Integer forPlus(This.Exp e1, This.Exp e2) { return new Integer( e1.visit(this).intValue() + e2.visit(this).intValue() ); } } class Show implements Visitor { public String forNum(int n) { return Integer.toString(n); } public String forPlus(This.Exp e1, This.Exp e2) { return "(" + e1.visit(this) + "+" + e2.visit(this) +")"; } } } final class Lang2 extends Lang2F {} final class Main { static public void main(String[] args) { Lang l = new Lang(); Lang.Exp e = l.new Num(42); System.out.println("eval: " + e.visit(l.new Eval())); Lang2 l2 = new Lang2(); Lang2.Exp e2 = l2.new Plus(l2.new Num(5), l2.new Num(37)); System.out.println("eval: "+e2.visit(l2.new Eval())); System.out.println("show: "+e2.visit(l2.new Show())); } } ------------------------------------------------------------------------ 2. A caveat with regard to inner interfaces In GJ as it is currently implemented, type parameters do not scope over static members, and further, a type parameter may be indexed only by non-static classes or interfaces defined in the bound. And in Java as it is currently defined, all inner interfaces are taken as static, whether declared so are not. This makes the mechanism for indexing type variables by inner classes useless for interfaces, greatly reducing its utility. In particular, it invalidates the solution just presented, which depends on Visitor being an interface. Fortunately, it looks possible to relax either the GJ or the Java constraint. So far as I can see, the only difference in making an interface non-static is that it can now include non-static inner classes; so the change would not render invalid any existing Java programs. But this point requires further study. Also, I should note that since the changes have not been implemented yet, I have not actually run the proposed solution. (I did translate from GJ to Java by hand, and run that.) 3. Related work It is instructive to compare this solution with previous solutions circulated by Corky Cartwright and Kim Bruce. Corky's solution requires contravariant extension -- that is, even though Lang2F.Exp extends LangF.Exp, one may use LangF.Exp in place of Lang2F.Exp and not conversely. This partly explains why fixpoints are required here: though LangF is a superclass of Lang2F, the classes Lang and Lang2 are unrelated. Short of complicating the language with contravariance, unrelated classes is the best we could hope for. LangF / \ Lang Lang2F / Lang2 Kim's solution required a type to be parameterized over a type constructor (rather than another type). In terms of our example, it required Exp to be parameterized on Visitor. Here, instead of paramerizing Exp on Visitor, Exp refers to This.Vistor. Although GJ supports parametization over types, it does not support parameterization over higher-order type constructors. However, virtual types (as simulated by GJ) in effect support higher-order type parameters for free. I'm grateful to Mads Torgersen and Kresten Krab Thorup for this insight, which they passed on when we discussed this problem at OOPSLA a few weeks ago. (Ironically, though, it looks like this solution won't work in Beta, which lacks interfaces or any other form of multiple supertyping; there also may be a problem in having a single expression type that allows visitors with different result types, like Integer and String.) The solution presented here is similar to the Extended Visitor pattern described by Krishnamurthi et al. Their solution differs in that it is not statically typed; they cannot distinguish Lang.Exp from Lang2.Exp, and as a result must depend on dynamic casts at some key points. This isn't due to a lack of cleverness on their part, rather it is due to a lack of expressiveness in Pizza. I am aware of two solutions to the expression problem, but both depend on special-purpose language extensions designed specifically for that problem. One appears in the Krishnamurthi et al. paper, the other in a master's thesis by a student of Martin Odersky. In contrast, the solution presented here arises from the general purpose mechanisms of type parameters and virtual types. I'd be grateful for pointers to other solutions to the Expression Problem. How do Beta, Sather, Ocaml, and others fare? Cheers, -- P References W. R. Cook (1990). Object-oriented programming versus abstract data types. REX workshop on Foundations of Object-Oriented Languages, Springer-Verlag LNCS 489, 1990. S. Krishamurthis, M. Felleisen, and D. Friedman (1998). Synthesizing object-oriented and functional design to promote re-use. ECOOP 1998, Springer-Verlag LNCS 1445, July 1998. J. C. Reynolds (1975). User-defined types and procedural data as complementary approaches to data abstraction. In S. A. Schuman, editor, New Directions in Algorithmic Languages, IFIP Working Group 2.1 on Algol, INRIA, 1975. Reprinted in D. Gries, editor, Programming Methodology, Springer-Verlag, 1978, and in C. A. Gunter and J. C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming, MIT Press, 1994.