Topiary

Michael P. Fourman

October 29, 2006

Aims

In this practical, you will learn to reshape trees.

Assessment

Your work will be assessed on the basis of the correctness of the functions you implement. Your code should have the structure described in the section Practicalities, and be placed in the file CS201/Prac4/Optimise.ML under your home directory. The assignment of marks to the various components of the exercise is given, as comment, in the signature OptimiseSig. This weighting is not intended to reflect the relative difficulty of implementing the various functions; it is designed to ensure that those who do a reasonable job on the heavily weighted functions get a reasonable mark, while those who seek an excellent mark have to work for it.

Deadline

The revised deadline for this practical is 6.00pm, Friday 15th April.

Introduction

Seldom does a programmer have the luxury of starting from a clean slate. In this practical, you will modify, and improve on, an existing system. Code for this system is given, and documented, in an appendix to this document. However, you don’t need to understand the implementation details of the code provided in order to complete the practical.

In this practical we use syntax trees to represent algebraic expressions. The existing system provides an implementation of the abstract syntax of expressions, and functions to compile, and execute, code to evaluate an expression. It uses a stack-based, evaluator; “compilation” is accomplished by post-fix traversal of the syntax tree.

Optimisation

The stack code produced for an expression may be unneccessarily inefficient. Evaluating the stack code for an expression may require a deeper or shallower stack, depending on the way the expression is written. Expressions involving only constants may be evaluated, once and for all, at ‘compile time; our code generator produces code to evaluate them at run time. Algebraic manipulation of the expression, before compilation, could lead to better code.

Your task is to apply simple algebraic transformations to the syntax tree, before passing it to the compiler, in order to optimise the code produced. You should perform four optimisations, in turn: reshaping, constant amalgamation, constant elimination, re-ordering. These are described individually below.

Before you start coding your solutions, you should make sure you understand what is required. To consolidate your understanding, draw diagrams of the trees involved, for some simple examples.

Four optimisations

reshaping
Code for the expression
u + (v + (w +  (x + (y + z))))
produces the RPN code “u v w x y z  +   +   +   +  +”, which stacks up all the arguments before doing any addition. The algebraically equivalent expression
((((u + v) + w) + x) + y) + z
produces the stack code “u v + w + x + y + z +”. This only requires a stack of depth 2.

You should write a function reshape: Expn -> Expn to implement this optimisation. Your function should apply one of the left-rotations1 :

x + (y + z) ⇒    (x + y) + z

x × (y × z) ⇒    (x × y) × z
to reshape the tree, wherever possible. It seems easiest to do this top-down; you should apply the rules repeatedly to a tree, and then apply the same rules to the subtrees of the result. To reshape a tree:
constant amalgamation
The expression, ((x + 3) + y) + 7, produces the stack code, “x 3  +  y  +  7 +”. The algebraically equivalent expression, (x + y) + 10, produces the code “x y  +  10 +”.

Our next two optimisations cooperate to combine and eliminate constants. Since constants may be separated in the tree, as in the example just given, we may have to collect them together before performing any arithmetic. Your second optimisation should be a function that collects together and amalgamates constants.

You should write a function amalgam: Expn -> Expn. Your implementation should assume that sequences of additions or multiplications are associated to the left; later you will use the optimisation of the previous section to reshape the tree before applying this function. The following rules2 , applied top-down, will amalgamate multiple constants, occurring on the right-hand-side of an operator, in a sequence of multiplications, and push the product down the tree:

(x × m ) × n  ⇒   x × m  × n
     --    --         -------
 (x × y) × n- ⇒   (x × n) × y
Your function should include similar rules for addition.
constant elimination
Once constants have been amalgamated by the function amalgam, we can complete the elimination by evaluating constant expressions. In a few special cases—addition of 0, multiplication by 0 or 1—we can eliminate constants altogether. Here are the revelant algebraic transformations:
m- × n- ⇒    m-×-n--
 m + n  ⇒    m + n
 --  --      -------
 0-× x  ⇒    0-
 x × 0- ⇒    0-

 1-× x  ⇒    x
 x × 1- ⇒    x

 0-+ x  ⇒    x
 x + 0- ⇒    x
You should write a function delim : Expn -> Expn that performs the transformation if any of these apply, and otherwise returns the expression unchanged. Then write a function elim : Expn -> Expn that applies delim bottom-up: given a tree,
re-ordering
The depth of stack required to evaluate the RPN code for a given binary tree is one more than the right-height of the tree; the right-height of a leaf is 0, the right-height of a node is the maximum of,

The commutative laws for addition and mutiplication

x × y  ⇒    y × x

x + y  ⇒    y + x
may be applied, judiciously, to reduce the right-height of a syntax tree. For example, the tree representing a + (b× (c + d)) has right-height 3, while the algebraically equivalent expression, ((c + d) × b) + a has a right-height of 1.

You should write a function rightHeight : Expn -> int to compute the right-height of a syntax tree, and another reorder : Expn -> Expn that applies the transformations given above, bottom-up, whenever the right-height of x is less than the right-height of y. This means that you should recursively apply the transformation to the two subtrees before seeing if you need to adjust a node.

Finally, you should combine your optimisations into a single transformation, using the ML infix operator o for function composition:

val optimise = reorder o elim o amalgam o reshape;

Practicalities

If you run ML with the command ml prac4, the identifiers ++ and ** will be set up as infix, with their usual precedences. But, in order to make it easy to exercise your optimisations, they have been made right-associative. All the structures documented in the appendix are predefined in the prac4 database.


signature OptimiseSig =  
sig  
    val reshape     : Expn -> Expn (* 15 marks *)  
    val amalgam     : Expn -> Expn (* 10 marks *)  
    val elim        : Expn -> Expn (* 8  marks *)  
    val rightHeight : Expn -> int  (* 3  marks *)  
    val reorder     : Expn -> Expn (* 4  marks *)  
    val optimise    : Expn -> Expn  
end;

Figure 1: The functions you should implement.

As usual, a signature, OptimiseSig, (see Figure 1) has been given for the code you are asked to write. You should place your code in a structure Optimise:OptimiseSig in a file Prac4/Optimise.ML. Any functions you have been unable to implement should be replaced by dummies of the correct type.

Appendix - stack-based evaluation of expressions

This appendix documents the code provided for the practical. The information provided here goes beyond what you will need to complete the practical, but it should be of general interest.

The code provided has four main components:

These components are used by the structure TopLevel to implement a simple, but fairly powerful, expression evaluator, that will compile a list of declarations.

Abstract syntax of expressions

infixr 6 ** infixr 4 ++  
 
structure Expn = struct  
datatype Expn =  
         Id    of string      (* identifiers    *)  
       | Lit   of int         (* literals       *)  
       | op ++ of Expn * Expn (* addition       *)  
       | op ** of Expn * Expn (* multiplication *)  
end

The abstract syntax provides for algebraic expressions in + and ×, with integer constants, and arbitrary strings as identifiers.

The environment


signature EnvironmentSig =  
sig  
   type Environment  
   val empty  : Environment  
   val lookup : Environment -> string -> int  
   val enter  : (string*int) * Environment -> Environment  
end

Figure 2: A signature for the structure Environment

The values associated with identifiers will be stored in a datatstructure called the environment. The signature EnvironmentSig, see Figure 2, provides an interface to the structure Environment, which uses an association list, a (string*int) list of pairs, each consisting of a string and the associated integer value, as an underlying datastructure. This implementation has, intentionally, been made transparent, in order that you can see the effects of declarations as they are made.

We will use the value empty to represent a new environment, the function enter to add new bindings to an environment, and the function lookup to find the value associated with a given string.

The abstract machine


structure Machine =  
struct  
datatype Action = PushLit of int  
                | PushVal of string  
                | Mul  
                | Add  
 
fun execute env code =  
let exception Eval  
 
    fun v s = Environment.lookup env s  
 
    fun run(args, PushLit n :: ops) = run(  n :: args, ops)  
      | run(args, PushVal e :: ops) = run(v e :: args, ops)  
      | run(a::b::args, Mul :: ops) = run(a*b :: args, ops)  
      | run(a::b::args, Add :: ops) = run(a+b :: args, ops)  
      | run([result],           []) = result  
      | run _ = raise Eval  
in  
    run([], code)  
end  
end

Figure 3: Code for the abstract machine

The structure Machine, given in Figure 3, provides a model for a stack-based evaluator. The machine has four actions: we can push a literal, or the value of an identifier, onto the stack, or apply one of the arithmetic operations, +, and × to the top two elements of the stack.

Code for the evaluator consists of a list of actions. The function run is constructed by specifying the state transition corresponding to the execution of each Action. The components of the state are: args, an argument stack; and code, a list of actions. The environment, needed by the machine to look up the values of identifiers, is passed as a parameter, env.

To execute a given code, we perform each of the actions in turn, starting with an empty stack. Running code compiled from a syntax tree should leave a single value on the stack. This value is returned as the result.

Compilation

infixr 4 ++  
infixr 6 **  
 
structure Compile =  
struct  
local open Expn Machine  
      fun codeacc (Id s,   rest) = PushVal s :: rest  
        | codeacc (Lit n,  rest) = PushLit n :: rest  
        | codeacc (a ++ b, rest) = codeacc(a, codeacc(b, Add :: rest))  
        | codeacc (a ** b, rest) = codeacc(a, codeacc(b, Mul :: rest))  
in  
   fun code expn = codeacc(expn, [])  
end  
end;

The function, code, produces the stack code for a given expression. It is based on the post-order traversal of a binary tree described in the notes.

infixr 4 ++  
infixr 6 **  
 
structure TopLevel =  
struct  
local  
fun adddecs ((s,e) :: decs) env =  
       let val v = Machine.execute env (Compile.code e)  
       in adddecs decs (Environment.enter((s,v), env)) end  
  | adddecs [] env = env  
in  
  fun compile decs = adddecs decs Environment.empty  
end  
end;

The structure TopLevel uses the evaluator to compile and run a sequence of declarations. As an example of its use, consider the following ML code

open Expn TopLevel;  
 
val a = Id "a" and b = Id "b" and c = Id "c"  
 
val mydecs =[ ("a", Lit 4),  
              ("b", a ++ Lit 1),  
              ("c", a ** b),  
              ("a", a ++ c) ];  
 
val finalEnv = compile mydecs;

Running this produces the output

val finalEnv = Env [("a", 24), ("c", 20), ("b", 5)] : Dict

Compare this with the following ML code:

val a = 4;  
val b = a + 1;  
val c = a * b;  
val a = a + c;  
 
(a,b,c);

which produces the final response

 > val it = (24, 5, 20) : int * int * int

Concluding remarks: Functions and Environments

Interaction with the ML system generates an environment in which future declarations are evaluated, just like the compile function provided by the structure TopLevel. The similarity goes much deeper; by choosing the oppropriate environment for evaluating the code for an expression, we can implement let expressions and functions. The body of a let expression is evaluated in an environment including the bindings generated by the local declarations, but these are not added to the top-level environment. When we apply a function, the body is evaluated in an environment which binds the formal parameters to the values of the actual parameters. Taking these ideas a litte further would allow us to implement recursive functions, and curried functions. But we are already well away from the substance of the practical.