The **type** mechanism cannot be used to produce a fresh type: only
to re-name an existing type. A Standard ML programmer can introduce a
new type, distinct from all the others, through the use of datatypes.

datatypecolour = red | blue | green;

This introduces a new type, colour, and three *
constructors* for that type, red,
blue and green. Equality is defined for this type
with the expected behaviour that, for example, red =
red and red <> green. No significance is
attached to the order in which the constructors were listed in the
type definition and no ordering is defined for the type. Constructors
differ from values because constructors may be used to form the
patterns which appear in the definition of a function by pattern
matching, as in the following function:
(**fn** red => 1 | blue => 2 | green => 3). The
pre-defined type bool behaves as if defined thus.

datatypebool = true | false;

In Standard ML it is illegal to rebind the constructors of built-in
datatypes such as bool. The motivation for this is to prevent
confusion about the interaction between the derived-forms translation
and runt datatypes such as this--**data****type** bool = true--intended to replace the built-in booleans.
Thus the constructors of built-in datatypes have an importance which
places them somewhere between the constructors of programmer-defined
datatypes and the reserved words of the language.

In constrast to the reverence accorded to the built-in constructors, programmer-defined constructors can be re-defined and these new definitions hide the ones which can before. So imagine that after elaborating the definition of the colour datatype we elaborate the definition of traffic_light shown below.

datatypetraffic_light = red | green | amber;

Now we have available four constructors of two different types.

amber: | traffic_light |

blue: | colour |

green: | traffic_light |

red: | traffic_light |

Another distinctive difference between datatype definitions and type
abbreviations is that the type abbreviation mechanism cannot be used
to describe recursive data structures; the type name is not in scope
on the right-hand side of the definition. This is the real reason why
we need another keyword, ``**datatype**'', to mark our type
definitions as being [potentially] recursive just as we needed a
new keyword, **rec**, to mark our function definitions as being
recursive. One recursive datatype we might wish to define is the
datatype of binary trees. If we wished to store integers in the tree
we could use the following definition.

datatypeinttree = empty | nodeofint * inttree * inttree;

Note the use of yet another keyword, ``**of**''.
The declaration introduces the empty binary tree and a constructor
function which, when given an integer n and two integer trees,
t1 and
t2, builds a tree with n at the
root and with t1 and
t2 as left and right
sub-trees. This tree is simply node (n,
t1,
t2). The reason for the use of the term
``constructor'' now becomes clearer, larger trees are really being
constructed from smaller ones by the use of these functions. The
constructors of the colour datatype are a degenerate form of
constructors since they are nullary constructors.

The mechanism for destructing a constructed value into its component parts is to match it against a pattern which uses the constructor and, in so doing, bind the value identifiers which occur in the pattern. This convenient facility removes the need to implement `destructors' for every new type and thereby reduces the amount of code which must be produced, enabling more effort to be expended on the more taxing parts of software development.

Since Standard ML type abbreviations may define families of types, it would seem natural that the datatypes of the language should be able to define families of datatypes. A datatype definition with a type parameter may be used to build objects of different types. The following tree definition generalises the integer trees given above.

datatype'a tree = empty | nodeof'a * 'a tree * 'a tree;

We see that the node constructor is of type ('a * ('a tree) * ('a tree)) -> ('a tree). There is a peculiar consequence of allowing datatypes to be defined in this way since we might make the type increase every time it is passed back to the type constructor thus making a so-called ``stuttering'' datatype.

datatype'a ttree = empty | nodeof'a * ('a * 'a) ttree * ('a * 'a) ttree;

Standard ML functions cannot be used within their own definitions on values of different types so there is no way to write a recursive Standard ML function which can process these trees, say to count the number of values stored in the tree or even to calculate its depth. We could comment that allowing the parameter in a datatype definition to be inflated in this way when it is passed back has created a slight imbalance in the language because it is possible to define recursive datatypes which cannot be processed recursively. This is not anything more serious than an imbalance; it is not a serious flaw in the language.

A built-in parameterised datatype of the language is the type of lists. These are ordered collections of elements of the same type. The pre-defined Standard ML type constructor list is a parameterised datatype for representing lists. The parameter is the type of elements which will appear in the list. Thus, int list describes a list of integers, char list describes a list of characters and so on.

The list which contains no elements is called nil and if
h is of type 'a and t is of type 'a
list then h :: t--pronounced ``h cons t''--is also
of type 'a list and represents the list with first
element h and following elements the elements of t in
the order that they appear in t. Thus 1 :: nil is a
one-element integer list; 2 :: 1 :: nil is a two-element
integer list and so on. Evidently to be correctly typed an expression
with multiple uses of cons associates to the right. Thus the datatype
definition for 'a list is as shown below. It declares
the cons symbol to be used infix with right associativity and priority
five. The keywords **infix** and **infixr** specify left and
right associativity respectively.

5 ::infixrdatatype'a list = nil | ::of'a * 'a list;

Lists come with derived forms. The notation [2, 1] is the derived form for 2 :: 1 :: nil; and similarly. For consistency, [] is the derived form for nil. As with the bool datatype we cannot re-define :: or nil although, rather curiously, we can tinker with their fixity status and associativity--perhaps a small oversight by the language designers.

All of the parameterised datatypes which we have declared so far have been parameterised by a single type variable but they can be parameterised by a tuple of type variables. We can define lookup tables to be lists of pairs as shown below.

type('a, 'b) lookup = ('a * 'b) list;

**Exercise**

This amusing puzzle is due to Bruce Duba of Rice University. At first it does not seem that it is possible at all. [Hint: you will need a tuple of type variables.]Define the constructors, Nil and Cons, such that the following code type checks.

funlength (Nil) = 0 | length (Cons (_, x)) = 1 + length (x);valheterogeneous = Cons (1, Cons (true, Cons (fnx => x, Nil)))

**Exercise**

It is possible to introduce two values at once by introducing a pair with the values as the elements, e.g. val (x, y) = (6, 7) defines x to be six and y to be seven. Why is it not possible to get around the need to use the keywordandby defining functions in pairs as shown below?val(odd, even) = (fn0 => false | n => even (n - 1),fn0 => true | n => odd (n - 1) );

Datatype definitions can also be mutually recursive. An example of an application where this arises is in defining a programming language with integer expressions where operations such as addition, subtraction, multiplication and division can be used together with parentheses. A Standard ML datatype for integer expressions is shown here.

datatypeint_exp = plusofint_term * int_term | minusofint_term * int_termandint_term = timesofint_factor * int_factor | divideofint_factor * int_factor | moduloofint_factor * int_factorandint_factor = int_constofint | parenofint_exp;

**Exercise**

Define the following functions.

eval_int_exp: int_exp -> inteval_int_term: int_term -> inteval_int_factor: int_factor -> int