In earlier lectures, we have discussed the use of signatures and structures to specify and group together related types and functions. In this note we introduce a final component of the ML modules system, functors, which support “top-down” software development, and the implementation of reusable code.
We begin with a brief review of signatures and structures, then introduce functors as a mechanism for code reuse, and finally discuss the use of functors in software development.
As we have seen, an implementation of a data-structure, such as a queue, provides a type, together with a collection of functions. We can group together a collection of values using a record. However, to provide a queue package, we need to group together some values and some types. To do this we need a new kind of structuring mechanism, a structure. Values have types associated with them and, analogously, structures have signatures.
We can access the components of a structure using a qualified name. Thus IntQueue.empty would represent the empty queue, and the function to add an element to a queue could be accessed as IntQueue.enq (e.g. IntQueue.enq(1, IntQueue.empty)). Structures are not values. We cannot pass them to other functions, embed them in other values etc.
Signatures can be used to specify what we require of a structure. A structure is an implementation of, or matches a signature provided it declares the specified types and values (in any order). Signatures are like types: a most general signature is inferred for each structure, just as a most general type is inferred for each value; just as we can constrain the type inferred for a value using a type constraint, we can constrain the signature for a structure using a signature constraint. We can use as signature to hide values and constructors, used to implement the structure, that should not be visible to the outside world.
Functions such as length are polymorphic; we can apply the length function to a list of any type of value. Similarly, we could easily use lists to provide a polymorphic implementation of stacks; to implement the stack operations, we don’t need to know what type of value is being manipulated. Life is not always so easy.
Consider the example of a Priority Queue. Our implementation of a priority queue of integers as an ordered list depends on the type of item in the queue, and, crucially, on the priority ordering we choose. Clearly we could use the “same” idea to implement other priority queues. We would like to parameterise our implementation, on a type of item, and a chosen priority ordering.
Given a type and an ordering on that type:
we can implement a priority queue, using the given predicate (which we assume is a total ordering), as the priority order. The result is a structure, containing a new type representing the queue and a collection of operations that act on this type. From a collection of types an values (a structure) we construct a new structure; we express this construction as an ML functor. Just as a function takes a value and produces a new value, so a functor takes a structure and produces a new structure. We can use the code for IntPQ to implement the priority queue construction:
The syntax for functors uses a new keyword, functor. The functor name is followed by a list of formal parameters, and then a body just like that of a structure declaration. In the declaration of PQUEUE the formal parameters are specified as types and values. Structures, each constrained by a signature, can also be introduced as formal parameters to functors; we will see examples of this later.
How can we use this functor? We provide appropriate types and values to match the formal parameters. Suppose we want to work with queues of integers, greater integers having higher priority, we apply the functor PQUEUE as follows
structure IntPQ = PQUEUE(type Item = int
val op > = op > : int * int -> bool) |
The structure IntPQ behaves just like our earlier implementation. But now the code is reusable; we can create many different argument structures—and so many different priority queues—without rewriting the code implementing the priority queue. Using functors we can write reusable code to implement datastructures that depend on a collection of types and values.
If, for example, we want a queue of pairs of integers, with a particular priority ordering, this could be given by
local
fun higher((x,y), (x’,y’)) = (x:int) > x’ orelse (x = x’ andalso (y:int) > y’) in structure PairPQ = PQUEUE(type Item = int*int val op > = higher) end |
We can also apply the functor directly to a structure that provides the appropriate parameter types and values. We can apply the functor PQUEUE to any structure that matches the signature OrdSig
For example, we could package various operations on integers in a structure IntItem
and then can apply the functor PQUEUE to this structure
structure IntPQ = PQUEUE(IntItem)
|
Instead of providing arguments to functors piecemeal, it is often more convenient to implement structures, such as IntItem, that will match a variety of signatures, and so can be used in a wide variety of contexts.
As well as supporting the development of reusable code, functors provide a mechanism for documenting and controlling the dependencies between software components. To illustrate this we revisit the stack-based evaluator for expressions introduced in Practical 4.
The code given in the appendix to Practical 4 consists of five inter-related structures; the implementation of one structure uses types and values provided by another. In this code, the dependencies are expressed using qualified names. Although this is a simple system, it requires close examination of the code, and a moment’s thought, to determine the right order in which to compile the various structures. It probably took you some time to realise that, for the purposes of Practical 4, you only needed to look at one structure Expn.
Using functors, we can make the dependencies explicit. Here is the code needed to build the structure TopLevel from a collection of functors.
The dependencies are made explicit: Expn and Environment have no dependencies, Machine depends on Environment, Compile depends on Machine and Expn, and TopLevel depends on everything else. Moreover, the code for the various functors can be written and modified independently, provided the interfaces are not changed. The language of functors can also help in specifying a coding task. For example, if in Practical 4 you had been asked to implement a functor with header
functor OPTIMISE(structure Expn:ExpnSig):OptimiseSig
|
you would have known immediately that only the signature ExpnSig was relevant to your task.
We use the rest of this note to look in more detail at the implementation of this example. The example provides code modelling the compilation and execution of code for a simple stack machine. The source language for the compiler consists of algebraic expressions.
We begin with the interfaces, given by signatures that will be used to specify the formal parameters to our functors. Starting at the beginning, the datatype Expn representing expressions is declared in a structure of its own. The signature carries a complete description of the type—this is the type we want; nothing else will do.
The system we are modelling will allow us to make a sequence of declarations, analogous to ML val declarations. A declaration is represented as a string*Expn pair. The TopLevel structure will provide a function that compiles a list of declarations to produce an environment, together with a structure Expn representing expressions, an environment type, and a function to allow the user to lookup the values of identifiers in the environment. All this is specified in the signature TopLevelSig
The Environment will be implemented by a structure Environment. This will be used by other modules that will need access to an empty environment, and a function to add new bindings to the environment, as well as the lookup function already mentioned.
The stack machine is modelled by the structure Machine this provides a type of Action, the primitive machine instructions, and a function that produces the result of executing a sequence of actions in a given environment
The compiler produces machine code for a given expression.
Each signature includes any types (except the built-in types) used in the types of the values it declares. This makes the signatures self-contained.
We don’t want to dwell on the details of the code used to implement the system—it uses tree traversal and lists in a straightforward way, and doesn’t introduce any new ideas—but the functor TOPLEVEL does illustrate a subtlety that often arises when using functors and structures. We begin with the more-straightforward examples. The code is just like that given in the appendix to Practical 4. It is described there.
The parameter specification for the TOPLEVEL functor includes a number of sharing constraints. This is the subtlety we spoke of: consider the following line of code
v = Machine.execute(env)(Compile.code(e))
|
From the signatures MachineSig and CompileSig we see that types of the functions are
Machine.execute : Environment -> Action list -> int
Compile.code : Expn -> Action list |
The signatures MachineSig and CompileSig both include a type called Action, but there is no guarantee that implementations of these signatures will both use the same implementation for this type. However, when we write the line of code given above, we certainly intend that the functor will only be applied to structures, MachineSig and CompileSig, that use the same implementation for the type Action. The sharing constraint tells the compiler that this is what we intend (so it can type check the code), and when we apply the functor the system checks that the constraining is satisfied. Unexpected type errors when developing a new functor may result from a missing sharing constraint.
Functors provide yet another abstraction mechanism: when we write the body of a functor we can only rely on the specification of the formal parameters—there is no implementation. So if we always use functors, and always access our abstract data types via formal parameters, we don’t need to use abstype as a extra abstraction barrier. However, using the abstype mechanism is the only way to erect an impregnable barrier in ML. We will make our implementations of datastructures such as queues abstract using abstype; when using functors to organise the code for an application, we take a more relaxed view and normally rely on the disciplined use of functors to ensure that we are not making unwarranted use of implementation details.
Books that talk about software engineering usually have a section that discusses bottom-up and top-down program development. In the bottom-up approach you start, as the name suggests, from the bottom and work your way towards the goal, i.e. the problem to be solved. Your initial environment might not support sets, for example, and so your first step might be to implement a set package. Using this you can then write a package to support graphs. At each step you are increasing your programming ‘vocabulary’ until you eventually reach the stage where you can express the solution to the original problem in a concise form. In the top-down approach you express the solution to the original goal in terms of the solutions to some subgoals. You then develop algorithms to solve the subgoals, each of these depending on solutions to further subgoals. This process is repeated until you reach a goal that is trivial enough to be solved using the basic building blocks of the language directly.
Both of these approaches have their problems. The bottom-up approach is not goal directed – one might waste time developing some code that may never be used in the final program. The advantage of this strategy is that the code can be run and tested at each step. The top-down approach is clearly driven by the goal to be solved. Unfortunately, at each stage the solution of a goal depends on the code for subgoals that have yet to be solved. It can be difficult to test the program until all of the code has been written. In reality, of course, people will use a mixture of these two approaches. The hope is that they will reap the benefits of both approaches. The danger is that they will suffer the disadvantages of both.
Structures and functors can be used to support both styles of program design. In the bottom-up approach the basic building blocks are structures. We start by building some simple structures, and then apply functors to these to create more complex structures. We continue this process until we eventually produce a structure representing the final solution. At each stage in the development we will have a collection of structures that can be tested. In the top-down approach the basic building blocks are functors. We start by writing a functor to solve the main goal. This will be parameterised on signatures representing solutions to some subgoals. We can type-check this functor, but we cannot evaluate the code until we apply the functor to structures matching the argument signatures. To produce these we must first solve the subgoals using other functors. These will also depend on signatures representing smaller subgoals. You can visualise the process as building a tree, starting from the root. Eventually the leaves will be simple enough that we can construct structures for them directly, rather than using functors. At this point you can apply the functors to the structures, bottom-up, until you produce a structure representing the final program. The functor application phase is similar to program linking in a more conventional language. This analogy can be carried further, as functors can be separately compiled and loaded in some ML systems. There is nothing stopping you mixing these two styles of development. Intermediate goals can be identified and solved in a top-down fashion, and the resulting structures can then be used as the basis of a bottom-up solution of the original goal.
When developing structures, you will have discovered that error messages for a large block of faulty code may be un-informative. A useful strategy is to develop the code interactively at the top level, and then to package it as a structure once it is correct. When developing a functor we cannot use this strategy as our code depends on the formal parameters.
PolyML provides a function PolyML.ifunctor to assist in the interactive development of functors. Here is an example of its use
> PolyML.ifunctor "C" "CompileSig";
val it = () : unit > open C; type Expn type Action val code = <undefined> : Expn -> Action list > |
The function takes two string arguments, the first will be used as the name of a dummy structure matching the signature given by the second. Opening this dummy structure gives a collection of dummy types and values as specified by the signature. Using these, you can develop and type-check the functor code interactively. Since the dummy values haven’t been implemented, you can’t run it.
To develop a functor with parameters including structures and sharing constraints, just declare a signature corresponding to the parameter specification. For example, to develop the functor TOPLEVEL we could have declared a signature ArgumentSig
signature ArgumentSig = sig
structure Machine: MachineSig and Compile: CompileSig and Environment: EnvironmentSig and Expn : ExpnSig sharing type Compile.Action = Machine.Action and type Environment.Environment = Machine.Environment and type Compile.Expn = Expn.Expn end; |
and then used ifunctor as follows
> PolyML.ifunctor "Args" "ArgumentSig";
val it = () : unit > open Args; structure Environment : EnvironmentSig structure Expn : ExpnSig structure Machine : MachineSig structure Compile : CompileSig |
(C) Michael Fourman 1994-2006