NEXT ·  UP ·  PREVIOUS ·  CONTENTS ·  INDEX

The String structure

The String structure provides functions to extract substrings and to process strings at a character-by-character level. These include String.substring which takes a triple of a string and two integers to denote the start of the substring and its length. Indexing is from zero, a convention which is used throughout the library. The String.extract function enables the programmer to leave the end point of the substring unspecified using the option NONE. The meaning of this is that there is to be no limit on the amount of the string which is taken, save that imposed by the string itself. Thus for example, String.extract (s, 0, NONE) is always the same as s itself, even if s is empty. The String.sub function may be used to subscript a string to obtain the character at a given position. Lists of strings may be concatenated using String.concat.

Several high-level functions are provided for working with strings. Two of the most useful are a tokeniser, String.tokens, and a separator, String.fields. These are parameterised by a function which can be used to specify the delimiter which comes between tokens or between fields. The principal distinction between a token and a field is that a token may not be empty, whereas a field may. Used together with the character classification functions from the Char structure these functions provide a simple method to perform an initial processing step on a string, such as dividing it into separate words. Another useful function is String.translate which maps individual characters of a string to strings which are then joined. Used together with String.tokens, this function provides a very simple method to write an accurate and efficient lexical analyser for a formal language. The method is simply to pad out special characters with spaces and then to tokenize the result naïvely.

NEXT ·  UP ·  PREVIOUS ·  CONTENTS ·  INDEX