The ML type system controls access to data, ensuring that functions are only applied to arguments of the appropriate type. The basic types are unit, bool, int, real, and string. Product types, whose values are tuples or records, allow us to group several data values as a single object. For example, we can make a type abbreviation for complex numbers represented as pairs of reals.
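Here is a minimal sketch of the problem. It assumes two hypothetical abbreviations, Polar and Cartesian, and a hypothetical function mulPolar; none of these names come from the original notes. Both abbreviations stand for real * real, so the type checker accepts a Cartesian value wherever a Polar one is expected.

```sml
(* Hypothetical abbreviations: both are just names for real * real. *)
type Polar     = real * real   (* modulus, argument *)
type Cartesian = real * real   (* x, y *)

(* Multiply two complex numbers in polar form:
   multiply the moduli and add the arguments. *)
fun mulPolar ((r1, a1) : Polar, (r2, a2) : Polar) : Polar =
  (r1 * r2, a1 + a2)

(* The complex number 3 + 4i, in Cartesian form. *)
val z : Cartesian = (3.0, 4.0)

(* Type-checks, although the result is meaningless:
   Cartesian and Polar are the same type. *)
val nonsense = mulPolar (z, z)   (* (9.0, 8.0) *)
```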
However, the type declaration only introduces an abbreviation. It does not prevent us from applying a function such as Argand.++, which is intended to add two complex numbers represented in polar coordinates, to values using the Cartesian representation. Nor does it stop us applying it to a pair of reals representing the weight and volume of some solid object! Type abbreviations therefore allow us to confuse values that should be kept apart.
Sometimes, we want to apply the same function to objects with different representations. For example, in a drawing package, we might want to write a single function to move a variety of geometric shapes, with a variety of representations. The ML type system doesn’t allow us to apply the same function to different types of object. So we seem to have the worst of both worlds: the type system allows us to make mistakes, but doesn’t allow us to do what we want.
In this note, we introduce the ML datatype declaration, which solves both of these difficulties. Consider the problem of implementing a type of geometric objects: circles, squares, lines, and rectangles. We might begin by modelling each kind of object individually with its own type abbreviation.
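Here is a sketch of such abbreviations. The field layouts are hypothetical: a centre and radius for a circle, a corner and side length for a square, and two points for a line or a rectangle. With these choices Circle and Square are the same type, so a function written for circles accepts a square without complaint.

```sml
(* Hypothetical representations; the field layouts are assumptions. *)
type Circle    = (real * real) * real           (* centre, radius *)
type Square    = (real * real) * real           (* lower-left corner, side *)
type Line      = (real * real) * (real * real)  (* two end points *)
type Rectangle = (real * real) * (real * real)  (* opposite corners *)

(* Intended only for circles. *)
fun circleArea ((_, r) : Circle) : real = Math.pi * r * r

(* A 2 x 2 square at the origin. *)
val sq : Square = ((0.0, 0.0), 2.0)

(* Accepted: Square and Circle are both (real * real) * real,
   so the square is silently treated as a circle of radius 2. *)
val wrong = circleArea sq
```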
This approach exhibits both problems. First, we can all too easily pass lines off as rectangles, and squares as circles: a function for drawing circles could, for example, be applied to a data value representing a square. Second, because squares and rectangles, among others, are represented by different types, we cannot define a single function that takes any geometric object and draws it or moves it.
The ML datatype declaration allows us to define a single type of geometric objects instead.
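Here is one possible declaration, as a sketch. The constructor names Circle, Square, Line and Rectangle are assumptions, and so are their argument layouts. They mirror the abbreviations a programmer might write first.

```sml
(* A sketch: constructor names and argument layouts are assumptions. *)
datatype Object =
    Circle    of (real * real) * real            (* centre, radius *)
  | Square    of (real * real) * real            (* lower-left corner, side *)
  | Line      of (real * real) * (real * real)   (* two end points *)
  | Rectangle of (real * real) * (real * real)   (* opposite corners *)
```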
A declaration of this kind introduces a new type, here called Object, that can be used in the same ways as any existing type. It also introduces four constructor functions, one for each kind of geometric object.
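The sketch below shows that the constructors are ordinary functions. It repeats a hypothetical Object declaration so that it stands alone, then binds each constructor to a name with an explicit function type. The compiler checks every annotation. The names mkCircle and so on are illustrative only.

```sml
datatype Object =
    Circle    of (real * real) * real
  | Square    of (real * real) * real
  | Line      of (real * real) * (real * real)
  | Rectangle of (real * real) * (real * real)

(* Each constructor is a function into Object; the compiler
   checks that every annotation below is correct. *)
val mkCircle    : (real * real) * real -> Object          = Circle
val mkSquare    : (real * real) * real -> Object          = Square
val mkLine      : (real * real) * (real * real) -> Object = Line
val mkRectangle : (real * real) * (real * real) -> Object = Rectangle
```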
These constructors can be used to build values of the new type.
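Here is a short sketch of constructing values, again assuming a hypothetical Object declaration. Every shape now has the same type, so shapes of different kinds can be collected in a single list. A constructor can also be passed to a higher-order function such as map.

```sml
datatype Object =
    Circle    of (real * real) * real
  | Square    of (real * real) * real
  | Line      of (real * real) * (real * real)
  | Rectangle of (real * real) * (real * real)

val unitCircle = Circle ((0.0, 0.0), 1.0)
val diagonal   = Line ((0.0, 0.0), (1.0, 1.0))

(* Different kinds of object share one type, so one list can hold them all. *)
val scene : Object list =
  [ unitCircle
  , Square ((2.0, 2.0), 1.0)
  , diagonal
  , Rectangle ((0.0, 0.0), (4.0, 3.0)) ]

(* Being a function, a constructor can be passed to map. *)
val rings = map Circle [((0.0, 0.0), 1.0), ((0.0, 0.0), 2.0)]
```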
We can use constructor functions in expressions, just like any other functions. Unlike other functions, however, constructors can also be used to form patterns, which allow us to define functions on the new type.
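Here is a minimal sketch of functions defined by pattern matching. It assumes a hypothetical Object declaration and a hypothetical helper, shift. The single function move has one clause per constructor, so it can move any kind of object, which the separate abbreviations did not allow.

```sml
datatype Object =
    Circle    of (real * real) * real            (* centre, radius *)
  | Square    of (real * real) * real            (* corner, side *)
  | Line      of (real * real) * (real * real)   (* end points *)
  | Rectangle of (real * real) * (real * real)   (* opposite corners *)

(* Hypothetical helper: translate a point by (dx, dy). *)
fun shift (dx : real, dy : real) (x, y) = (x + dx, y + dy)

(* One function moves every kind of object: one clause per constructor. *)
fun move d (Circle (c, r))    = Circle (shift d c, r)
  | move d (Square (p, s))    = Square (shift d p, s)
  | move d (Line (p, q))      = Line (shift d p, shift d q)
  | move d (Rectangle (p, q)) = Rectangle (shift d p, shift d q)

(* Patterns also take a value apart; a line has no area. *)
fun area (Circle (_, r))                  = Math.pi * r * r
  | area (Square (_, s))                  = s * s
  | area (Line _)                         = 0.0
  | area (Rectangle ((x1, y1), (x2, y2))) = Real.abs ((x2 - x1) * (y2 - y1))
```

For instance, move (1.0, 2.0) (Square ((0.0, 0.0), 3.0)) evaluates to Square ((1.0, 2.0), 3.0).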
ML datatype declarations allow us to introduce a new type by saying how to construct values of that type. Any value of the new type must have been constructed by applying one of the constructor functions, so we can use pattern-matching to decide which constructor function was used, and to decompose the value into its components.
We can picture the new type introduced by a datatype declaration like the one above as the disjoint union of the old types. This is called a sum, partly because the number of elements in the new type is the sum of the numbers of elements in the constituent types.
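Here is a small sketch of why the union is called disjoint, using a hypothetical type TwoBools. Both summands are bool, which has two values. Yet the datatype has 2 + 2 = 4 values, because each value records the constructor that built it.

```sml
(* Hypothetical sum: both summands are bool. *)
datatype TwoBools = Left of bool | Right of bool

(* Every value, listed by hand: 2 + 2 = 4 of them. *)
val allValues = [Left true, Left false, Right true, Right false]

(* Left true and Right true differ: the constructor tag keeps the parts apart. *)
val tagsDiffer = Left true <> Right true   (* true *)
```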
A special case of the datatype declaration, in which no constructor takes an argument, allows us to introduce constants of the new type.
For instance, we can introduce a new type, Colour, whose values are exactly the constants named in its declaration.
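Here is a sketch of such a declaration. The original list of colours is not reproduced here, so the four constants below are an assumption.

```sml
(* Hypothetical set of colours; the original list is not reproduced here. *)
datatype Colour = Red | Yellow | Green | Blue

(* Each constructor is a constant of type Colour. *)
val palette : Colour list = [Red, Yellow, Green, Blue]

(* Constant-only datatypes admit equality, so = can compare colours. *)
val same = (Yellow = Yellow)   (* true *)
```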
Since these constants are constructors, we can use them in patterns.
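As an illustration, here is a hypothetical, much simplified paint-mixing function on an assumed Colour type. Blue and yellow make green; otherwise the first colour is kept. The constants Blue and Yellow appear as patterns.

```sml
datatype Colour = Red | Yellow | Green | Blue

(* Blue and yellow, in either order, make green;
   any other pair keeps the first colour. *)
fun mix (Blue, Yellow) = Green
  | mix (Yellow, Blue) = Green
  | mix (c, _)         = c
```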
This highlights the difference between constructors and other identifiers: they are treated quite differently in patterns. Suppose we had made a slight mistake when typing such a function, writing yellow (with a lower-case y) instead of Yellow. The effect of the function would then be quite different. Since yellow has not been declared as a constructor, it is treated in the pattern as a variable, and can match any value of type Colour.
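Here is a sketch of the mistake. It uses an assumed Colour type and a hypothetical function mix', meant to turn blue and yellow into green and otherwise keep the first colour. Because yellow is written in lower case, the first clause binds it as a variable, so the clause matches Blue paired with any colour.

```sml
datatype Colour = Red | Yellow | Green | Blue

(* One slip: "yellow" is not a constructor, so in the first clause
   it is a fresh variable that matches any Colour. *)
fun mix' (Blue, yellow) = Green
  | mix' (Yellow, Blue) = Green
  | mix' (c, _)         = c

(* Intended result: Blue. Actual result: Green. *)
val surprise = mix' (Blue, Red)
```

No clause becomes redundant here, so there is nothing for the compiler to flag and the slip can easily go unnoticed.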
To minimise the risk of this kind of mistake, we normally give constructors an initial capital, and begin other identifiers in lower-case. This convention also helps the human reader to understand the programmer’s intentions.
We can also use the datatype construction to introduce a new type even when we don't want to form a sum or an enumeration type. For example, the declaration

datatype Point = Point of real * real;
creates a new type, Point, that cannot be confused with real * real. Formally, the constructor Point is a function that converts pairs of reals to values of type Point. In practice, an implementation will normally use the same internal representation for both types and optimise this function away. To define a function on such a type, however, we must use the constructor in patterns:
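(* square is not in the Basis Library, so we define it here *)
fun square (x : real) = x * x;
fun distance (Point (x1, y1)) (Point (x2, y2)) =
    Math.sqrt (square (x2 - x1) + square (y2 - y1));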
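The same identifier, Point, names both the type and its constructor, which can be confusing. The sketch below repeats the Point declaration so that it stands alone and uses a hypothetical function, xCoord, to show the two uses side by side. In a pattern, the Point before the parenthesised pair is the constructor, while the Point after a colon is a type constraint.

```sml
datatype Point = Point of real * real

(* Hypothetical accessor. The first Point is the constructor taking the
   value apart; the Point after the colon is a type constraint. *)
fun xCoord (Point (x, _) : Point) : real = x

(* A bare pair is not a Point: xCoord (2.0, 7.0) would be rejected by
   the type checker, so the pair must be wrapped with the constructor. *)
val x = xCoord (Point (2.0, 7.0))   (* 2.0 *)

(* Being a function, the constructor can convert a list of pairs. *)
val pts : Point list = map Point [(0.0, 0.0), (3.0, 4.0)]
```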
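We have introduced three ideas: sums (disjoint unions) whose values are built by constructor functions, enumeration types whose constructors are constants, and new types that, unlike abbreviations, cannot be confused with their representations. Each datatype declaration creates a genuinely new type, even when it repeats an earlier declaration word for word: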
> datatype Example = Value ;
datatype Example = Value
> val a = Value ;
val a = Value : Example
> datatype Example = Value ;
datatype Example = Value
> val b = Value ;
val b = Value : Example
> a = b;
Error Can’t unify Example with Example
Exception static_errors raised
Here, the values bound to a and b have different types (which, confusingly, have the same name).
Later we will see that the datatype declaration can also be used recursively.

(C) Michael Fourman 1994-2006