\chapter{The Symbol Table} @i pool.w \section{The Symbol Table} The same name may appear many times in a program. We use a hash table so that only one copy of the name needs to be saved. These strings are not accessed directly, but via structures that collect further information about the names. The \verb|name_info| structures are linked together into (hopefully) short lists sharing the same hash key by means of the \verb|hash_link| field. \label{td_name_info} @d Symbol table typedef declarations @{ typedef struct name_info { char *name; @< Additional information about the name @> struct name_info *hash_link; } name_info, *name, *names; @| name names name_info hash_link @} The pointers to the heads of the hash lists are stored in the array~\verb|hash|. Identifiers can be found by computing a hash code~\verb|h| and then looking at the identifiers represented by the names \verb|hash[h]|, \verb|hash[h]->hash_link|, \verb|hash[h]->hash_link->hash_link|,~\dots, until either finding the desired name or encountering the null pointer. The table is maintained by the function \verb|find_name|, which finds a given identifier and returns the appropriate \verb|name|. If there is no match for the identifier, it is inserted into the table. The \verb|hash| table has size \verb|hash_size|, which should be a prime number. Our hashing scheme is represented by Figure~\ref{fig:hashing}. \begin{figure}[hbt] \centering \includegraphics{hash.EPSF} \label{fig:hashing} \caption{A hashing scheme} \end{figure} @d Symbol table global variables @{ #define hash_size 353 /* should be prime */ names hash[hash_size]; /* heads of hash lists */ @| hash hash_size @} Initially all the hash lists are empty. Although strictly speaking the array \verb|hash| should be initialised to null pointers by the compiler, we don't count on this. @d Initialise symbol table globals @{ { int i=hash_size; do hash[--i]=NULL; while(i>0); } @} Here is the main function for finding names in the hash table. The parameter \verb|ident| points to a null-terminated string. @d Symbol table prototypes @{ extern name find_name (char *ident); @} @d Symbol table functions @{ name find_name (char *ident) { int h; @< Compute the hash code \verb|h| of the string \verb|ident| @> @< Find and \verb|return| the associated name, % possibly by entering a new identifier into the table @> } @| find_name @} \label{fn_find_name} A simple hash code is used: If the sequence of character codes is $c_1c_2\ldots c_n$, its hash value will be $$ (2^{n-1}c_1+2^{n-2}c_2+\cdots+c_n)\bmod\hbox{\verb|hash_size|}. $$ This isn't a particularly inspired choice of hash function, but it is adequate for the practical. @d Compute the hash code \verb|h| of the string \verb|ident| @{ { char *p = ident; h=*p; while (*(++p) != 0) h=((h<<1)+*p)%hash_size; } @} Here we either find an existing node for the identifier, or create one, inserting it into the name table and the hash list. The maximal average length of hash lists should be sufficiently small that linearly searching them is quite fast. @d Find and \verb|return|... @{ { name p=hash[h]; /* the head of the hash list */ while (p!=NULL && strcmp(p->name,ident)) p=p->hash_link; if (p==NULL) /* we haven't seen this identifier before */ @< Make \verb|p| point to ... @> return p; } @} Here is the code to create an entry for a new identifier. We just use \verb|malloc| to create the storage for the name. If the allocation fails we stop the program. @d Make \verb|p| point to a fresh \verb|name_node| % and link the node to hash list \verb|h| @{ { p = new(name_info); if ((p->name = malloc(strlen(ident)+1)) == 0) @< Report memory exhausted and exit @> p->hash_link = hash[h]; hash[h] = p; /* insert p at beginning of hash list */ strcpy(p->name, ident); p->table = NULL; } @} \section{Handling Scopes} The purpose of a symbol table is not just to share the storage for identical names. It is also used to store information associated with each name, such as its size, type and location in the current scope. Attached to each name in the hash tabel is a symbol table entry. The intention is that the lexer converts a string of characters into a name and passes this to the parser. The parser then hangs the semantic information off of this name. The information we need to store for procedures is slightly different from other names, and so we use a union for these fields. \label{td_symbol_table_info} @d Symbol table typedef declarations @{ typedef union symbol_table_info { struct { @< Symbol table shared fields @> @< Symbol table name fields @> } name; struct { @< Symbol table shared fields @> @< Symbol table proc fields @> } proc; } symbol_table_info, *symbol_table_entry; @| symbol_table_info symbol_table_entry name proc @} @d Additional information about the name @{ union symbol_table_info *table; @| table @} The simplest thing we might record about a name is its size. @d Symbol table name fields @{int size; @| size @} Names in \OCCAM\ can denote integers and channels, and arrays of these. They can also be used as procedure names, and formal parameters to procedures. \verb|VAL| declarations introduce constant integers. A channel will be represented by a pointer to a block of storage holding information about the channel, such as whether a process is waiting on the channel. These blocks will also be held on the stack, and hence in the symbol table. Procedures will use displays (see Section~\ref{proc_calls}) to access non-local names and these will also be held on the stack. This discussion motivates the following definition. \label{td_name_type} @d Symbol table type typedef @{ typedef enum name_type { INT_T, CHAN_T , INT_ARRAY_T, CHAN_ARRAY_T, PROC_T, VAL_T, CHAN_BLOCK_T, DISPLAY_T } type; @| name_type type INT_T CHAN_T CHAN_BLOCK_T INT_ARRAY_T CHAN_ARRAY_T PROC_T VAL_T DISPLAY_T @} @d Symbol table shared fields @{type type; @| type @} The symbol table also keeps track of the location of a name. We associate a {\em nesting level} with each part of a program. The nesting level is initally one, and is incremented whenever we enter the scope of a procedure body and also when we enter the scope of a \verb|PAR|. @d Symbol table prototypes @{extern int nesting_level; @} @d Symbol table shared fields @{int nesting_level; @| nesting_level @} The current nesting level is held in a global variable of that name. @d Symbol table global variables @{int nesting_level; @| nesting_level @} @d Initialise symbol table globals @{nesting_level = 1;@} To access a name we need to know its offset in the stack (once we have used the nesting level to find the appropriate stack frame). The \verb|offset| field holds this value. Offsets are calculated from the base of the stack frame, {\em not} relative to the current stack pointer. @d Symbol table name fields @{int offset; @| offset @} We keep track of the current stack offset in the global variable \verb|stack_offset|. @d Symbol table prototypes @{extern int stack_offset; @} @d Symbol table global variables @{int stack_offset; @| stack_offset @} @d Initialise symbol table globals @{stack_offset = 1;@} What happens when we declare a name in a scope where the name already has a declaration? We add a new symbol table entry, but keep hold of the old one using the \verb|prev_defn| link. @d Symbol table shared fields @{union symbol_table_info *prev_defn; @} To add an entry to the symbol table for a name we must supply its size, type and stack offset. @d Symbol table prototypes @{ extern void insert_name_with_offset(name name, int offset, int size, type type); @} @d Symbol table functions @{ void insert_name_with_offset(name name, int offset, int size, type type) { symbol_table_entry s; s = new(symbol_table_info); s->name.prev_defn = name->table; name->table = s; s->name.nesting_level = nesting_level; s->name.size = size; s->name.type = type; @< Add \verb|name| to current scope @> s->name.offset = offset; } @| insert_name_with_offset @} \label{fn_insert_name_with_offset} It becomes tedious to have to explicitly manipulate the \verb|stack_offset| field and so we provide a convenient abbreviation. Stacks grow downwards in the interpreter, and so stack offsets are usually negative (remember the offset is relative to the base of the stack frame). If the \verb|type| is \verb|VAL_T| we treat the \verb|size| argument as the value of the name. The stack offset allocated for this name is returned as result. Note that if the object represented by the name occupies more than one stack slot, e.g. it is an array, then the offset returned will be to the beginning of the object (see Figure~\ref{fig:stack_offsets}). \begin{figure}[hbt] \centering %\BoxedEPSF{stackoffset.EPSF scaled 1000} \includegraphics{stackoffset.EPSF} \label{fig:stack_offsets} \caption{Effect of the call {\tt insert\_name($\ldots$, 2, $\ldots$)}} \end{figure} @d Symbol table prototypes @{ extern int insert_name(name name, int size, type type); @} @d Symbol table functions @{ int insert_name(name name, int size, type type) { symbol_table_entry s; if (stack_offset > 1) stack_offset = 1; /* In case we have inserted args */ if (type != VAL_T) stack_offset -= size; insert_name_with_offset(name, stack_offset, size, type); return stack_offset; } @| insert_name @} \label{fn_insert_name} Arguments to procedures will be accessed with positive offsets from the base of the stack frame (see Section~\ref{proc_calls}) and so we provide a variant of the \verb|insert_name| function that adds \verb|size| to \verb|stack_offset| at each call. In this case we know that \verb|type| is restricted by the language to \verb|INT_T| or \verb|CHAN_T|, and hence \verb|size| is always 1. @d Symbol table prototypes @{ extern int insert_arg(name name, type type); @} \label{fn_insert_arg} @d Symbol table functions @{ int insert_arg(name name, type type) { symbol_table_entry s; insert_name_with_offset(name, stack_offset, 1, type); return(stack_offset++); } @| insert_name @} We treat inserting a procedure specially. The important things we need to know about a procedure are its starting location in memory (a label -- see Section~\ref{codemanip}), the maximum amount of stack space it requires at runtime, and the types of its arguments. \label{td_arg} @d Symbol table typedef declarations @{ typedef struct arg {type type; struct arg *next; } *arg, *args; @| arg args @} @d Symbol table proc fields @{ int start_loc; int stack_required; struct arg *args; @| start_loc stack_required args @} @d Symbol table prototypes @{ extern void insert_proc(name name, int label, args a, int stack_required); @} @d Symbol table functions @{ void insert_proc(name name, int label, args a, int stack_required) { symbol_table_entry s; s = new(symbol_table_info); s->proc.nesting_level = nesting_level+1; s->proc.stack_required = stack_required; s->proc.type = PROC_T; s->proc.start_loc = label; s->proc.args = a; s->proc.prev_defn = name->table; name->table = s; @< Add \verb|name| to current scope @> } @| insert_proc @} \label{fn_insert_proc} Because we store different information for procedures and other names, we also need different lookup functions. We use the function \verb|lookup_name| to retrieve the information currently associated with a non-procedure name. If no entry exists for the name we just give an error and stop. If the name represents a procedure at this point in the program we also complain and stop. The only non-obvious feature is that we return 0 for the nesting depth if the nesting depth of the name is the same as the current nesting depth (i.e. the name represents a local variable). @d Symbol table prototypes @{ extern void lookup_name(name name, int *nesting_level, int *stack_offset, int *size, type *type); @} \label{fn_lookup_name} @d Symbol table functions @{ void lookup_name(name name, int *nld, int *so, int *sz, type *type) { extern int line_number; if (name->table == NULL) error("Syntax error", "Identifier %s has been used before it was declared.\n", name->name); else if (name->table->name.type == PROC_T) error("Syntax error", "You cannot use procedure name %s in this context.\n", name->name); else { if (nesting_level == name->table->name.nesting_level) *nld = 0; else *nld = name->table->name.nesting_level; *type = name->table->name.type; *sz = name->table->name.size; *so = name->table->name.offset; } } @| lookup_name @} Looking up a procedure name is similar; we just return different information. @d Symbol table prototypes @{ extern void lookup_proc(name name, int *nesting_level, int *start_label, args *a, int *stack_required); @} @d Symbol table functions @{ void lookup_proc(name name, int *nld, int *sl, args *a, int *sr) { extern int line_number; if (name->table == NULL) error("Syntax error", "Identifier %s has been used before it was declared.\n", name->name); else if (name->table->proc.type != PROC_T) error("Syntax error", "The name %s was encountered where a procedure was expected.\n", name->name); else { *nld = name->table->proc.nesting_level; *sl = name->table->proc.start_loc; *a = name->table->proc.args; *sr = name->table->proc.stack_required; } } @| lookup_proc @} \label{fn_lookup_proc} What happens when we leave the scope of a declaration? We need to remove the declaration entry from the symbol table. But how do we find those entries that have been added during the current scope? A simple solution is to link together all names that have been given declarations during the current scope. We can't just link the table entries themselves as otherwise when we remove an entry we won't have access to the the pointer in the name structure and so cannot update it to point to the old table entry. This is a pity, as otherwise we could just add another link to the symbol table structure which would avoid having to define the following structure. \label{td_scope_table_entry} @d Symbol table typedef declarations @{ typedef struct scope_table_entry { struct name_info *name; struct scope_table_entry *next; } scope_table_entry, *scope_list; @| scope_table_entry scope_list @} Scopes nest, and so we can use a stack to hold these lists. We keep track of the current scope level in the global variable \verb|scope_level|. @d Symbol table global variables @{int scope_level;@| scope_level @} @d Initialise symbol table globals @{scope_level = 1;@} For each scope level we keep track of the associated nesting level, the stack offset at the start of the scope. and the list of names declared during this scope. We have to decide how big to make this stack. We choose fifty, which should be more than adequate. @d Symbol table global variables @{ #define MAX_SCOPE_STACK 50 struct { scope_list scope_entries; int nesting_level; int stack_offset; } scope_stack[MAX_SCOPE_STACK+1]; @| scope_entries scope_stack @} @d Initialise symbol table globals @{ scope_stack[1].nesting_level = 1; scope_stack[1].stack_offset = 1; scope_stack[1].scope_entries = NULL; @} I'd like to give a picture here illustrating the scope stacks. However, I fear that it will just look like a plate of spaghetti and so I'll just rely on the textual description. There may be some confusion due to the use of scope levels and nesting levels. The nesting level changes whenever we change stack frame, either by spawing a new process via \verb|PAR| or by entering a procedure body. In the latter case the stack can change because the stack in use at the time the procedure is called may not be the same as the stack in effect when the procedure was declared. The scope level changes whenever we enter a new scope. Thus the nesting level will never exceed the scope level. An example may be useful here. Each variable declaration in the following program is indexed by three integers; the scope level, the nesting level and the offset. Some of the offsets might seem surprising. However, as explained in Section~\ref{proc_calls}, the interpreter stores some information, the display, on the stack when a procedure is called, or a process created. The size of this information depends on the nesting depth of the procedure. A process at nesting depth $n$ will have $n+1$ such entries. A procedure at this depth will have $n+2$ entries because the return \verb|PC| also needs to be saved. The offsets also assume that a channel block occupies just one stack slot (true in the interpreter, but possibly not in a compiler for this language) and that a channel is represented by a pointer to this block. Thus each channel created will take up two stack slots. \begin{quote}\small\tt CHAN C$_{1,1,-3}$:\\ VAL a$_{1,1,\_}$ IS 1:\\ PROC P(INT b$_{2,2,1}$)\\ \hbox{}~~PROC Q(INT c$_{4,3,1}$)\\ \hbox{}~~~~SEQ\\ \hbox{}~~~~~~INT d$_{6,3,-5}$ = a+b:\\ \hbox{}~~~~~~C ! (d+c)\\ \hbox{}~~~~~~INT e$_{6,3,-5}$ = a:\\ \hbox{}~~~~~~C ! (e+c)\\ \hbox{}~~:\\ \hbox{}~~Q(a)\\ :\\ INT a$_{1,1,-4}$ = 3:\\ PAR\\ \hbox{}~~P(a)\\ \hbox{}~~INT a$_{2,2,-3}$:\\ \hbox{}~~SEQ\\ \hbox{}~~~~C ? a\\ \hbox{}~~~~INT a$_{3,2,-4}$:\\ \hbox{}~~~~C ? a \end{quote} Although there is no choice in the allocation of nesting level, there is some choice in the allocation of scope level. In the above example we enter a new scope (and nesting level) when we start to process the arguments to a procedure. For convenience, we also enter a new scope when we process the procedure body. However, we could process the body in the same scope as the arguments if we wanted to. The choice is often influenced by the code generation strategy. When we add a new entry to the symbol table we must add the entry to the appropriate scope list. @d Add \verb|name| to current scope @{ { scope_list n; name->table->name.scope_level = scope_level; n = new(scope_table_entry); n->name = name; n->next = scope_stack[scope_level].scope_entries; scope_stack[scope_level].scope_entries = n; } @} Note that we store the scope level with the name. This is to help us print out the symbol table in an informative fashion when debugging, but it is not essential. @d Symbol table shared fields @{int scope_level; @| scope_level @} When we enter a new scope we must call the \verb|enter_scope| function. This takes as argument a boolean indicating whether a change in nesting level is associated with the change of scope. @d Symbol table prototypes @{extern void enter_scope(bool new_nesting_level); @} \label{fn_enter_scope} @d Symbol table functions @{ void enter_scope(bool new_nesting_level) { scope_stack[scope_level++].stack_offset = stack_offset; scope_stack[scope_level].scope_entries = NULL; if (new_nesting_level) { scope_stack[scope_level].nesting_level = ++nesting_level; stack_offset = 1; } else scope_stack[scope_level].nesting_level = nesting_level; } @| enter_scope @} Note that the stack offset is set to 1 initially. We assume that the parser will set the stack offset to take into account the display at the current nesting level. We could do this in the \verb|enter_scope| function, but if we are entering the scope of a procedure with arguments, then we want to start with a stack offset of 1. The arguments are then loaded into the symbol table with positive offsets. Once all the arguments are loaded we should then set the stack offset to \verb|-nesting_level| (assuming we are using displays), by inserting a dummy name of size \verb|nesting_level+1| into the table (of type \verb|DISPLAY_T|). When we exit a scope we call the \verb|exit_scope| function. This removes the current symbol table entries for names introduced during this scope. If a name had a definition before this scope then that definition will be reinstated when we leave the scope (remember we saved the old entries using the \verb|prev_defn| field). We return the amount of stack space occupied by the freed names as result. Channels complicate the situation. If it wasn't for these we could just generate code to do one big pop of all the stack space occupied by names in the current scope. However, because we have to pop channel blocks explicitly, we must process the popped names one at a time. When \verb|exit_scope| is called it pops one name. If this is a value or a procedure then \verb|exit_scope| is called recursively. If a channel block is encountered then the size is returned as a negative value. Otherwise the size of the name is returned. Thus the caller has a simple mechanism for distinguishing between popped channel blocks and other values. After repeated calls to \verb|exit_scope| we will eventually reach the stage where there are no names left in the current scope. At this point the previous scope is reinstated and 0 returned to indicate that we have exited from the scope. @d Symbol table prototypes @{extern int exit_scope(); @} \label{fn_exit_scope} @D Symbol table functions @{ int exit_scope() { int size; scope_list entry = scope_stack[scope_level].scope_entries; if (entry == NULL) { --scope_level; nesting_level = scope_stack[scope_level].nesting_level; stack_offset = scope_stack[scope_level].stack_offset; return 0; } else { name name = entry->name; union symbol_table_info *table = name->table; scope_stack[scope_level].scope_entries = entry->next; name->table = name->table->name.prev_defn; switch (table->name.type) { case VAL_T: size = exit_scope(); break; case PROC_T: { args o, args = table->proc.args; while (args != NULL) { o = args; args = args->next; dispose(o); } size = exit_scope(); break; } case CHAN_BLOCK_T: size = -table->name.size; break; default: size = table->name.size; if (size == 0) size = exit_scope(); break; } dispose(table); dispose(entry); return size; } } @| exit_scope @} The stack is used to hold values without an associated name. Examples include the display, channel blocks, saved program counters and counters for replicated constructs. We could just manipulate the stack offset directly when we need to. However, it seems neater to introduce dummy names for these objects so that a dump of the symbol table (see function \verb|print_symbol_table| below) produces more informative output when debugging. We prefix these names with a space to avoid possible conflicts with `real' names. @d Symbol table prototypes @{extern name display, pc, chanblock, repl; @} @d Symbol table global variables @{name display, pc, chanblock, repl; @| display PC chanblock, repl@} @d Initialise symbol table globals @{display=find_name(" display"); pc=find_name(" PC"); chanblock=find_name(" chanblock"); repl=find_name(" repl");@} It's sometimes useful, for debugging purposes, to print out the current state of the symbol table. Here is a function that does just that. If there are two declarations with the same name in the same scope then only the details of the later one will be reported, but twice. It would be messy to detect this case, and it's not too important, so we don't bother. @d Symbol table prototypes @{extern void print_symbol_table(); @} \label{fn_print_symbol_table} @D Symbol table functions @{ void print_symbol_table() { int sl = scope_level; scope_list entry; symbol_table_entry tabent; printf("Symbol table entries...\n"); for (sl = scope_level; sl >= 1; sl--) { entry = scope_stack[sl].scope_entries; if (entry == NULL) continue; printf("%2d: nesting level = %d.\n", sl, scope_stack[sl].nesting_level); while (entry != NULL) { printf(" "); tabent = entry->name->table; while (tabent->name.scope_level != sl) tabent = tabent->name.prev_defn; switch (tabent->name.type) { case INT_T : printf("INT "); break; case CHAN_T : printf("CHAN "); break; case CHAN_BLOCK_T: printf("CHAN BLOCK"); break; case PROC_T : printf("PROC "); break; case DISPLAY_T : printf("DISPLAY "); break; case INT_ARRAY_T : printf("[%d]INT ", entry->name->table->name.size); break; case CHAN_ARRAY_T: printf("[%d]CHAN ", entry->name->table->name.size); break; } printf("%s: nesting level = %d, ", entry->name->name, tabent->name.nesting_level); if (tabent->proc.type == PROC_T) printf("start label = %d\n", tabent->proc.start_loc); else printf("offset = %d\n", tabent->name.offset); entry = entry->next; } } } @| print_symbol_table @} Names are never discarded from the hash table. This makes it difficult to detect storage leaks in other parts of the program. The following function can be called at the end of compilation to return this storage. It's not essential that it is called as the storage will be returned anyway when the program terminates. However, it might be useful for debugging purposes. @d Symbol table prototypes @{extern void clear_symbol_table(); @} \label{fn_clear_symbol_table} @D Symbol table functions @{ void clear_symbol_table() { int i; for (i=0; i < hash_size; i++) { name p = hash[i]; name o; while (p != NULL) { if (p->table != NULL) error("Internal error", "You should leave the outermost scope before calling " "clear_symbol_table.\n" ); o = p; p=p->hash_link; free(o->name); dispose(o); } hash[i] = NULL; } } @| clear_symbol_table @} Finally, here are the files for the symbol table. @O code/symbol_table.h @{@< Copyright message @> #ifndef _SYMBOL_TABLE_H #define _SYMBOL_TABLE_H @< Symbol table type typedef @> @< Symbol table typedef declarations @> @< Symbol table prototypes @> extern void initialise_symbol_table(); #endif @} @d Occam object files @{ symbol_table.o @} @d Makefile dependencies @{ symbol_table.c: occam.h pool.h symbol_table.h @} @O code/symbol_table.c @{@< Standard \OCCAM\ includes @> #include #include "occam.h" #include "pool.h" #include "symbol_table.h" @< Symbol table global variables @> @< Symbol table functions @> void initialise_symbol_table() { @< Initialise symbol table globals @> } @| initialise_symbol_table @} % Following code defines table of functions for preceeding chapter. \newpage \begin{table}[tp] \begin{center} \begin{tabular}{||l|l|l|r||} \hline Function name&Return value type&Argument types&Page No.\\ \hline extend\_pool&void *&int&\pageref{fn_extend_pool}\\ pool\_statistics&void&void&\pageref{fn_pool_statistics}\\ find\_name&name&char *&\pageref{fn_find_name}\\ insert\_name\_with\_offset&void&name, int, int, type&\pageref{fn_insert_name_with_offset}\\ insert\_name&int&name, int, type&\pageref{fn_insert_name}\\ insert\_arg&int&name, type&\pageref{fn_insert_arg}\\ insert\_proc&void&name, int, args, int&\pageref{fn_insert_proc}\\ lookup\_name&void&name, int *, int *, int *, type *&\pageref{fn_lookup_name}\\ lookup\_proc&void&name, int *, int *, args *, int *&\pageref{fn_lookup_proc}\\ enter\_scope&void&bool&\pageref{fn_enter_scope}\\ exit\_scope&int&void&\pageref{fn_exit_scope}\\ print\_symbol\_table&void&void&\pageref{fn_print_symbol_table}\\ clear\_symbol\_table&void&void&\pageref{fn_clear_symbol_table}\\ \hline \end{tabular} \bigskip This table shows all the functions defined in the preceeding chapter, their argument and return parameter types and the page on which the function is defined. \end{center} \end{table} %\vspace{50mm}