% © K.Mitchell 1995 \chapter{The \OCCAM\ Abstract Machine} \newcommand{\inst}[1]{{\large\bf\tt #1}} The \OCCAM\ abstract machine is basically a simple stack machine extended with instructions to support processes and communication. We start by describing the sequential aspects of the machine and then deal with the complications introduced by concurrency. The machine implementation errs on the side of clarity rather than efficiency. {\bf Note:} The machine doesn't make many safety checks. It is up to the code generator to ensure that stacks don't overflow, jumps are in range etc. \section{The Sequential Machine} The machine uses a stack to hold the program variables and intermediate values. The code for the program is stored in a separate array. Two machine registers, {\tt SP} and {\tt PC} are used to index into these components, as illustrated in Figure~\ref{fig:seqmachine}. \begin{figure}[hbt] \centering \includegraphics{stack.EPSF} \label{fig:seqmachine} \caption{The sequential machine state} \end{figure} An instruction consists of two components. The {\tt op} field determines the instruction type. The {\tt arg} field contains an argument for the operation. Only some of the operations take arguments. The value in the {\tt arg} field is ignored in the other cases. We assume that integers occupy thirty-two bits. It turns out to be convenient if instructions can occupy the same amount of space as ints, and so we restrict the size of the argument field to twenty-four bits. This should be more than adequate for our needs. Table~\ref{tab:instructions} on page~\pageref{tab:instructions} lists the complete instruction set of the abstract machine. \label{td_opcode} @d Machine opcodes {\tt typedef} @{ typedef enum opcode { @< Machine opcodes @> } opcode; @| opcode @} \label{td_instruction} @d Machine {\tt typedef}s @{ typedef struct instruction { opcode op : 8; int arg : 24; } instruction; @| instruction op arg @} \begin{table} \begin{center} \begin{tabular}{||l|l|c||} \hline Mnemonic & Operation Performed & Page \\ \hline {\tt Ldc}&Load constant value onto top of stack&\pageref{op_Ldc}\\ {\tt Ldo}&Load existing stack element onto top of stack&\pageref{op_Ldo}\\ {\tt Lda}&Load address of element onto top of stack&\pageref{op_Lda}\\ {\tt Ind}&Perform indirections to retrieve data from stack&\pageref{op_Ind}\\ {\tt Sto}&Write top value to stack location with constant address&\pageref{op_Sto}\\ {\tt Sta}&Write top value to stack location with variable address&\pageref{op_Sta}\\ {\tt Ccb}&Create new channel&\pageref{op_Ccb}\\ {\tt Dcb}&Dispose of channel control block&\pageref{op_Dcb}\\ {\tt Swap}&Swap topmost two elements on stack&\pageref{op_Swap}\\ {\tt Pop}&Discard element from top of stack&\pageref{op_Pop}\\ {\tt Ldn}&Reserve stack space for array&\pageref{op_Ldn}\\ {\tt Idx}&Access element of an array&\pageref{op_Idx}\\ {\tt Ida}&Calculate address of array element&\pageref{op_Ida}\\ \hline {\tt Add}&Add two numbers&\pageref{op_Add}\\ {\tt Sub}&Subtract one number from another&\pageref{op_Sub}\\ {\tt Mul}&Multiply two numbers&\pageref{op_Mul}\\ {\tt Div}&Divide one number by another&\pageref{op_Div}\\ {\tt Mod}&Modulus operator - find remainder of division&\pageref{op_Mod}\\ {\tt Lt}&Less than relational comparison&\pageref{op_Lt}\\ {\tt Leq}&Less than or equal to relational comparison&\pageref{op_Leq}\\ {\tt Gt}&Greater than relational comparison&\pageref{op_Gt}\\ {\tt Geq}&Greater than or equal to relational comparison&\pageref{op_Geq}\\ {\tt Eq}&Equal to relational comparison&\pageref{op_Eq}\\ {\tt Neq}&Not equal relational comparison&\pageref{op_Neq}\\ \hline {\tt Ujp}&Unconditional jump instruction&\pageref{op_Ujp}\\ {\tt Fjp}&Conditional jump instruction&\pageref{op_Fjp}\\ \hline {\tt Enter}&Setup display for procedure call&\pageref{op_Enter}\\ {\tt Leave}&Restore previous activation record&\pageref{op_Leave}\\ {\tt Call}&Transfer control to procedure&\pageref{op_Call}\\ {\tt Ret}&Return from procedure&\pageref{op_Ret}\\ \hline {\tt Spawn}&Spawn new process&\pageref{op_Spawn}\\ {\tt Spawnv}&Spawn new process with specified initial value&\pageref{op_Spawnv}\\ {\tt Stop}&Terminate current process&\pageref{op_Stop}\\ {\tt Wait}&Wait until all child processes have terminated before continuing&\pageref{op_Wait}\\ \hline {\tt Out}&Output a value onto channel&\pageref{op_Output}\\ {\tt In}&Take input value from channel&\pageref{op_Input}\\ {\tt Alt}&Choose value from set of alternatives&\pageref{op_Alt}\\ \hline {\tt Input}&Read integer from stdin and push onto stack&\pageref{op_Input}\\ {\tt Output}&Pop integer from stack and write to stdout&\pageref{op_Output}\\ \hline {\tt Trace}&Switch on/off machine state tracing in the debugger&\pageref{op_Trace}\\ \hline {\tt Lab}&Label pseudo-instruction&\pageref{op_Lab}\\ \hline \end{tabular} \label{tab:instructions} \caption{Instruction set for the {\OCCAM} abstract machine.} \end{center} \end{table} The instructions to be executed by the machine are held in an array of instructions called \verb|Code|. The program counter, \verb|PC|, indexes into this array. @d Machine state @{register int PC; /* Indexes into Code array */ @| PC @} Each stack element is thirty-two bits long. Most of the time the stack is used to hold integers. However, occasionally we have to store other kinds of value on the stack. These will be introduced as the need arises. We therefore use a {\tt union} to avoid lots of arbitrary casts. The variable \verb|BP| points into this stack; initially it points to the end of the stack, but it is changed whenever a procedure call is made (see Section~\ref{proc_calls}). The stack pointer, \verb|SP|, points to the top of the stack. Stacks grow downwards, and so usually \verb|SP| $\leq$ \verb|BP| except when the stack is empty. \label{td_stack_slot} @d Machine {\tt typedef}s @{ typedef union stack_slot { int as_int; @< Other \verb|stack_slot| options @> } stack_slot; @| stack_slot as_int @} @d Machine state @{stack_slot *BP = NULL, *SP = NULL; @| SP BP @} \subsection{Load and Store Instructions} A variety of instructions are provided to load values onto the stack, and to store values. \label{op_Ldc} The \inst{Ldc}\ instruction takes a constant as argument and pushes it onto the top of the stack. @d Machine opcodes @{ Ldc = 1, @| Ldc @} @d Execute instruction \verb|curr_inst| @{ case Ldc : { (--SP)->as_int = curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ldc : return(1); @} \label{op_Ldo} To access existing stack elements we use the \inst{Ldo}\ instruction. Its argument is the stack offset (relative to \verb|BP|). @d Machine opcodes @{ Ldo, @| Ldo @} @d Execute instruction \verb|curr_inst| @{ case Ldo : { (--SP)->as_int = BP[curr_inst.arg].as_int; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ldo : return(1); @} To access the {\em address} of a stack element we use the \label{op_Lda}\inst{Lda}\ instruction. @d Machine opcodes @{ Lda, @| Lda @} We extend the kinds of values that can be stored on the stack to include addresses. @d Other \verb|stack_slot| options @{ union stack_slot *as_addr; @| as_addr @} @d Execute instruction \verb|curr_inst| @{ case Lda : { (--SP)->as_addr = &BP[curr_inst.arg]; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Lda : return(1); @} We can perform repeated indirections via an address on the top of the stack using the \label{op_Ind}\inst{Ind}\ instruction. The number of indirections to be performed is passed as the argument to the instruction. @d Machine opcodes @{ Ind, @| Ind @} @d Execute instruction \verb|curr_inst| @{ case Ind : { int n = curr_inst.arg; while (n--) *SP = *(SP->as_addr); break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ind : return(0); @} We can overwrite an arbitrary stack location with the contents of the top of stack using the \label{op_Sto}\inst{Sto}\ instruction. The stack top is popped as a by-product of this operation. @d Machine opcodes @{ Sto, @| Sto @} @d Execute instruction \verb|curr_inst| @{ case Sto : { BP[curr_inst.arg] = *(SP++); break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Sto : return(-1); @} Sometimes we don't know the location to write to at compile time (e.g. an array access) and so we provide a variant, called \label{op_Sta}\inst{Sta}, that takes the destination address off the stack. The value is on the top of the stack, with the destination below it. @d Machine opcodes @{ Sta, @| Sta @} @d Execute instruction \verb|curr_inst| @{ case Sta : { *((SP+1)->as_addr) = *SP; SP += 2; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Sta : return(-2); @} To create a new channel we use the \label{op_Ccb}\inst{Ccb}\ instruction. Channels in the interpreter occupy two stack slots. The first one, the channel control block, contains a pointer to a waiting process, or \verb|NULL| if no process is blocked. The second slot contains the address of the channel control block. The argument to the instruction contains the number of channels to create. If the argument is $> 1$ then all the channel control blocks are allocated first, followed by the pointers to these blocks. This allows channel arrays to be indexed like integer arrays. @d Machine opcodes @{ Ccb, @| Ccb @} @d Execute instruction \verb|curr_inst| @{ case Ccb : { int i = curr_inst.arg; stack_slot *start; while (i-- > 0) (--SP)->as_addr = NULL; i = curr_inst.arg; start = SP; while (i-- > 0) (--SP)->as_addr = start++; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ccb : return(2*curr_inst.arg); @} We also provide an instruction to dispose of the channel control blocks. This is not really necessary in the interpreter as we could just pop the stack. However, in the runtime system for a compiler we may have to tidy up system data associated with a channel control block (e.g. mutexes) and so we make channel block destruction explicit. The number of channel control blocks to dispose of is passed as argument to the instruction. Note: the \verb|Ccb| instruction creates the channel control blocks {\em and} the pointers to them. The \label{op_Dcb}\inst{Dcb}\ instruction just disposes of the channel control blocks. @d Machine opcodes @{ Dcb, @| Dcb @} @d Execute instruction \verb|curr_inst| @{ case Dcb: { SP += curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Dcb : return(-curr_inst.arg); @} It is sometimes difficult to produce the arguments to some of these instructions in the correct order. The \label{op_Swap}\inst{Swap}\ instruction allows us to exchange the top two elements on the stack. @d Machine opcodes @{ Swap, @| Swap @} @d Execute instruction \verb|curr_inst| @{ case Swap : { stack_slot s = *SP; *SP = *(SP+1); *(SP+1) = s; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Swap : return(0); @} We can also discard elements off the top of the stack using the \label{op_Pop}\inst{Pop}\ instruction. @d Machine opcodes @{ Pop, @| Pop @} @d Execute instruction \verb|curr_inst| @{ case Pop : { SP += curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Pop : return(-curr_inst.arg); @} Arrays are stored on the stack just like any other values. To reserve $n$ slots on the stack, each initialised to 0, the \label{op_Ldn}\inst{Ldn}\ instruction is used. @d Machine opcodes @{ Ldn, @| Ldn @} @d Execute instruction \verb|curr_inst| @{ case Ldn : { int i = curr_inst.arg; while (i-- > 0) (--SP)->as_int = 0; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ldn : return(curr_inst.arg); @} To allow array elements to be accessed the machine provides an index instruction, \label{op_Idx}\inst{Idx}. Both the array base and index may be unknown at compile time and so these arguments are passed on the stack rather than one or other of them being arguments of the instruction itself. The index is on the top of the stack, and the base one down. The base is represented by an address rather than as a stack offset. The \verb|Lda| instruction is useful for setting up an appropriate base. @d Machine opcodes @{ Idx, @| Idx @} @d Execute instruction \verb|curr_inst| @{ case Idx : { *(SP+1) = ((SP+1)->as_addr)[SP->as_int]; SP++; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Idx : return(-1); @} To get the address of an indexed expression, rather than its value, use the \label{op_Ida}\inst{Ida}\ instruction. @d Machine opcodes @{ Ida, @| Ida @} @d Execute instruction \verb|curr_inst| @{ case Ida : { (SP+1)->as_addr += SP->as_int; SP++; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ida : return(-1); @} \subsection{Arithmetic and Relational Instructions} A variety of arithmetic and relational operations are provided. These act on the top element(s) of the stack. In the case of binary operators be careful of the argument order; the left argument of the operator is one down in the stack and the right argument is at the stack top. The argument component of each of these instructions is ignored. \label{op_Add}\label{op_Sub}\label{op_Mul}\label{op_Div}\label{op_Mod} @d Machine opcodes @{ Add, Sub, Mul, Div, Mod, @| Add Sub Mul Div Mod @} @d Execute instruction \verb|curr_inst| @{ case Add : { (SP+1)->as_int += SP->as_int; SP++; break; } case Sub : { (SP+1)->as_int -= SP->as_int; SP++; break; } case Mul : { (SP+1)->as_int *= SP->as_int; SP++; break; } case Div : { (SP+1)->as_int /= SP->as_int; SP++; break; } case Mod : { (SP+1)->as_int %= SP->as_int; SP++; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Add : case Sub: case Mul: case Div: case Mod: return(-1); @} The relational operators treat zero as representing false, and anything else as representing true. \label{op_Lt}\label{op_Leq}\label{op_Gt}\label{op_Geq} \label{op_Eq}\label{op_Neq} @d Machine opcodes @{ Lt, Leq, Gt, Geq, Eq, Neq, @| Lt Leq Gt Geq Eq Neq @} @d Execute instruction \verb|curr_inst| @{ case Lt : { (SP+1)->as_int = (SP+1)->as_int < SP->as_int; SP++; break; } case Leq : { (SP+1)->as_int = (SP+1)->as_int <= SP->as_int; SP++; break; } case Gt : { (SP+1)->as_int = (SP+1)->as_int > SP->as_int; SP++; break; } case Geq : { (SP+1)->as_int = (SP+1)->as_int >= SP->as_int; SP++; break; } case Eq : { (SP+1)->as_int = (SP+1)->as_int == SP->as_int; SP++; break; } case Neq : { (SP+1)->as_int = (SP+1)->as_int != SP->as_int; SP++; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Lt : case Leq: case Gt: case Geq: case Eq: case Neq: return(-1); @} \subsection{Control Flow Instructions} In the sequential fragment of the language there are two instructions for altering the control flow. The \label{op_Ujp}\inst{Ujp}\ instruction is used to perform unconditional jumps. The argument indicates the new value of the program counter. @d Machine opcodes @{ Ujp, @| Ujp @} @d Execute instruction \verb|curr_inst| @{ case Ujp : { PC = curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Ujp : return(0); @} We can also perform a conditional jump using the \label{op_Fjp}\inst{Fjp}\ instruction. The jump is only performed if the top of stack contains the value {\tt false}, i.e. is zero. The top of stack is popped by this instruction. @d Machine opcodes @{ Fjp, @| Fjp @} @d Execute instruction \verb|curr_inst| @{ case Fjp : { if (!(SP++)->as_int) PC = curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Fjp : return(-1); @} \section{Procedure Calls} \label{proc_calls} The space occupied on the stack by each procedure invocation is called its {\em activation record}. This includes information such as the procedure arguments, the return address and the space required for declarations introduced in the procedure body. If we maintain a pointer, \verb|BP|, to a fixed point in this activation record then the procedure can access the fields of the record via offsets from this pointer. Whenever we make a new procedure call, and hence a new activation record, we must adjust \verb|BP| and \verb|SP| accordingly. Procedures in \OCCAM\ are statically scoped, i.e. if the procedure body refers to a name not declared in the procedure then this {\em free} name is looked up in the environment that existed at the time the procedure was declared, {\em not} the environment that existed at the time the procedure was called. Thus \OCCAM\ is similar to languages such as Pascal and, to some extent, C. Procedures can be nested, but they cannot be passed as arguments to other procedures. To allow access to free names we could either use {\em access links}, or {\em displays} (see the ``Dragon'' book for more details). We use displays in this interpreter. The general idea is quite simple. If we are currently executing the body of a procedure at nesting depth $n$, then then code can access the activation record at nesting depth $i < n$, and hence the local variables at this depth, by following the pointer \verb|BP[|$-i$\verb|]|. Remember that stacks grow downwards, hence the negative offset. Before calling a procedure we must set up a display appropriate for the nesting depth of the new procedure. The \label{op_Enter}\inst{Enter}\ instruction does this. It takes as argument the new nesting depth. @d Machine opcodes @{ Enter, @| Enter @} @d Execute instruction \verb|curr_inst| @{ case Enter : initialise_display(curr_inst.arg, &SP, &BP); break; @} \label{fn_initialise_display} @d Machine utility functions @{ void initialise_display(int nesting_level, stack_slot **the_SP, stack_slot **the_BP) { stack_slot *SP = *the_SP, *BP = *the_BP; (--SP)->as_addr = BP; *the_BP = SP; while (--nesting_level > 0) *(--SP) = *(--BP); (--SP)->as_addr = *the_BP; *the_SP = SP; } @| initialise_display @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Enter : return(curr_inst.arg+1); @} The \verb|Enter| instruction updates the \verb|BP| and \verb|SP| registers to point to the new display, but the old values are saved on the stack. The previous activation record can be restored using the \label{op_Leave}\inst{Leave}\ instruction. @d Machine opcodes @{ Leave, @| Leave @} @d Execute instruction \verb|curr_inst| @{ case Leave : { SP = BP; BP = (SP++)->as_addr; break; } @} The argument to \verb|Leave| is the nesting level of the called procedure. It is ignored when evaluating the instruction, but allows the stack effect of the instruction to be easily computed. @d Stack effect of executing instruction \verb|curr_inst| @{ case Leave : return(-(curr_inst.arg+2)); @} To transfer control to a procedure, after the display has been set up, we use the \label{op_Call}\inst{Call}\ instruction which saves the current PC on the stack and then jumps to the start of the procedure. @d Machine opcodes @{ Call, @| Call @} @d Execute instruction \verb|curr_inst| @{ case Call : { (--SP)->as_int = PC; PC = curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Call : return(1); @} To return we need to pop the stack, if necessary, until the return address is on top, and then execute the \label{op_Ret}\inst{Ret}\ instruction. @d Machine opcodes @{ Ret, @| Ret @} @d Execute instruction \verb|curr_inst| @{ case Ret : { PC = (SP++)->as_int; break; } @} The stack effect might seem a bit odd. However, the effect is from the view of the procedure currently executing, not the procedure being returned to. @d Stack effect of executing instruction \verb|curr_inst| @{ case Ret : return(0); @} Arguments to a procedure should be placed on the stack {\em before} the \verb|Enter| instruction is evaluated. They can then be accessed by the called procedure using positive offsets from \verb|BP|. Thus a typical calling sequence looks like \begin{quote} Evaluate $m$ arguments Enter $n$ Call $L$ /* Call procedure at nesting depth $n$ */ Leave $n$ Pop $m$ \end{quote} The stack during the execution of the procedure is illustrated in Figure~\ref{fig:display}. \begin{figure}[hbt] \centering %\BoxedEPSF{display.EPSF scaled 1000} \includegraphics{display.EPSF} \label{fig:display} \caption{A typical activation record at nesting level $n$} \end{figure} \section{The Parallel Machine} A process may spawn many subprocesses. Each of these must be allocated its own stack, otherwise chaos reigns. However, each subprocess must be able to access (but not update) variables in its parent's stack if they are in scope. We must therefore provide a mechanism for accessing the stacks of our ancestors. We can use the display mechanism for this purpose. We treat the spawning of a process as a degenerate kind of procedure call. The display at the beginning of the new stack contains a pointer to the activation record of the spawner. However, unlike a procedure call, we don't need to save the PC as processes don't return to the caller. The state of the abstract machine, from the viewpoint of a single process, now looks like Figure~\ref{fig:parmachine}. \begin{figure}[hbt] \centering %\BoxedEPSF{child.EPSF scaled 1000} \includegraphics{child.EPSF} \caption{The parallel machine state} \label{fig:parmachine} \end{figure} In order to create a process we must specify the stack size to be used for the new process, and the initial value of its program counter. Note that all processes share the same code array. The compiler needs to calculate the maximum stack size required by the process (including the display at the base of the stack). The language is constrained enough that the maximum can be calculated at compile time. The \label{op_Spawn}\inst{Spawn}\ instruction can be used to spawn a new process. This creates the data structures required by the new process and schedules it for execution. The process counter is supplied as an argument to the instruction and the stack size is passed on the top of the stack. The nesting level of the new process is supplied as the next element on the stack. @d Machine opcodes @{ Spawn, @| Spawn @} @d Execute instruction \verb|curr_inst| @{ case Spawn : { process p = create_process(curr_inst.arg, (SP++)->as_int); p->BP = BP; initialise_display((SP++)->as_int, &p->SP, &p->BP); @< Schedule process \verb|p| for execution @> break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Spawn : return(-2); @} Sometimes it is useful to be able to pass an initial value to the spawned process (e.g. in the case of a replicated \verb|PAR|). The \label{op_Spawnv}\inst{Spawnv}\ instruction allows you to do this. @d Machine opcodes @{ Spawnv, @| Spawnv @} @d Execute instruction \verb|curr_inst| @{ case Spawnv: { process p = create_process(curr_inst.arg, (SP++)->as_int); p->BP = BP; initialise_display((SP++)->as_int, &p->SP, &p->BP); *(--(p->SP)) = *(SP++); @< Schedule process \verb|p| for execution @> break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Spawnv : return(-3); @} To halt a process we use the \label{op_Stop}\inst{Stop}\ instruction. This terminates the current process and schedules the next process currently awaiting execution. If all other processes have terminated, or are blocked, then the interpreter Stops. @d Machine opcodes @{ Stop, @| Stop @} @d Execute instruction \verb|curr_inst| @{ case Stop: { @< Terminate current process @> @< Schedule next process or \verb|return| if none @> } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Stop : return(0); @} When we spawn processes in a \verb|PAR| construct we must wait until the spawned processes have terminated before we can proceed further. The \label{op_Wait}\inst{Wait}\ instruction performs this task. @d Machine opcodes @{ Wait, @| Wait @} @d Execute instruction \verb|curr_inst| @{ case Wait: { @< Suspend if children and schedule next process @> } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Wait : return(0); @} \subsection{Interprocess Communication} \OCCAM\ treats input and output asymmetrically. Inputs can appear in alternatives, but outputs cannot. Although this is often a pain from the programmer's viewpoint, it simplifies the implementation somewhat. The \label{op_Out}\inst{Out}\ instruction tries to output a value on a channel, and suspends the current process if no partner is waiting for input on this channel. The address of the channel to use for output is assumed to be on the top of the stack with the value to be transmitted just below it. Both of these are popped when the instruction completes. @d Machine opcodes @{ Out, @| Out @} @d Execute instruction \verb|curr_inst| @{ case Out : { @< Communicate with waiting input; suspend process if none @>; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Out : return(-2); @} For simplicity, we treat a single input on a channel the same as an \verb|ALT| with one alternative. To process an \verb|ALT| we first load all the alternatives on the stack using the \label{op_In}\inst{In}\ instruction. This takes as argument the instruction to jump to if the input is successful. The top of the stack is assumed to contain the address of the channel to use for the input. The instruction just places itself on the stack for use by the \verb|Alt| instruction. @d Other \verb|stack_slot| options @{ instruction as_inst; @| as_inst @} @d Machine opcodes @{ In, @| In @} @d Execute instruction \verb|curr_inst| @{ case In : { (--SP)->as_inst = curr_inst; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case In : return(1); @} The argument of the \label{op_Alt}\inst{Alt}\ instruction indicates how many alternatives there are. It assumes that the stack has been set up appropriately using \verb|curr_inst.arg| input instructions. It looks for a matching partner for one of the alteratives. If it finds one then all the alternatives are popped from the stack and the index of the chosen alternative and the received value then pushed in their place. Alternatives are numbered from 0. The program counter is set to the location indicated by the successful alternative. The partner's stack is tidied up and then scheduled for execution. @d Machine opcodes @{ Alt, @| Alt @} @d Execute instruction \verb|curr_inst| @{ case Alt : { @< Execute \verb|Alt| instruction @>; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Alt : return(1); @} \section{Input/Output} The interpreter provides two instructions for the input and output of integers. The user program doesn't have direct access to these, and accesses them via the channels \verb|stdin| and \verb|stdout|. The \label{op_Input}\inst{Input}\ instruction reads an integer from stdin and pushes the result on the stack, or -1 if EOF. We use a non-blocking read if the input is connected to the terminal so the system will not hang while waiting for input. If there is no input currently available we reset the PC to point to this instruction again and reschedule the process. @d Machine opcodes @{ Input, @| Input @} @d Machine globals @{ char input_buffer[80]; /* Input characters are buffered here. */ int bufptr = 0; /* A pointer into the above buffer. */ int eof_flag = FALSE; /* Have we exhausted the input yet? */ @| input_buffer bufptr eof_flag @} @d Execute instruction \verb|curr_inst| @{ case Input : { int i; char c; if (active_process->next == active_process && BP[-1].as_addr[-2].as_addr == NULL) return; /* Only process running and no-one waiting for input */ if (eof_flag) { (--SP)->as_int = -1; break; } i = read(input_fd, &c, 1); if (i == 0 && input_buffered) { i++; c = '\04'; } if (i == 1) { if (c == '\04') eof_flag = TRUE; if (isdigit(c) || (c == '-' && bufptr == 0)) input_buffer[bufptr++] = c; else if (bufptr > 0) { input_buffer[bufptr] = 0; (--SP)->as_int = atoi(input_buffer); bufptr = 0; break; } } PC--; goto save_state; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Input : return(1); @} The \label{op_Output}\inst{Output}\ instruction pops an integer from the stack and outputs it on stdout followed be a newline. @d Machine opcodes @{ Output, @| Output @} @d Execute instruction \verb|curr_inst| @{ case Output : { printf("==> %d\n", (SP++)->as_int); break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Output : return(-1); @} Before the interpreter starts to run a user's program, it creates the initial display and two channels for \verb|stdin| and \verb|stdout|. A user program should therefore be compiled as if it were prefixed with the declaration \begin{quotation}\tt\noindent CHAN stdin:\\ CHAN stdout:\\ $\ldots$ \end{quotation} The following sequence of statements can perform this task: \begin{quote}\begin{verbatim} insert_name (display, 2, DISPLAY_T); insert_name (chanblock, CHANNEL_BLOCK_SIZE, CHAN_BLOCK_T); insert_name (find_name("stdin" ), 1, CHAN_T); insert_name (chanblock, CHANNEL_BLOCK_SIZE, CHAN_BLOCK_T); insert_name (find_name("stdout"), 1, CHAN_T); \end{verbatim}\end{quote} The interpreter also automatically starts up two processes to read and write the standard input and output to these channels. \section{Printing and Debugging} The function \verb|print_instruction| can be used to print individual instructions on \verb|std_out|. @d Machine prototypes @{ extern void print_instruction(instruction i); @| print_instruction @} Tracing of the machine state can be switched on and off by executing the \label{op_Trace}\inst{Trace}\ instruction. The intention is that such instructions can be added into the program code when debugging the compiler. The {\tt -t} command line option can be used to enable tracing for the whole program, not just for the current process. @d Other PCB fields @{ bool trace_process; @| trace_process @} @d Machine opcodes @{ Trace, @| Trace @} @d Execute instruction \verb|curr_inst| @{ case Trace:{ active_process->trace_process = curr_inst.arg; break; } @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Trace : return(0); @} \section{Manipulating Code} \label{codemanip} It can be tedious building instruction sequences. If they are stored in arrays then we have to do lots of shuffling when building them. Some instructions contain code offsets as arguments and these have to be updated whenever we move code around. It would be simpler if we could store instructions in a linked list and refer to jump destinations via labels. This section describes such an interface. The interpreter uses the function \verb|load_code| (see Section~\ref{loader}) to build an array version of the instruction sequence, replacing labels by their true offsets in the process. Labels will just be integers. Any PC offsets refered to by instructions in \verb|code_entry| instructions will be assumed to refer to a label rather than an actual offset. We provide two functions to assist in the construction of code sequences. The functions \verb|build_icode| and \verb|build_ccode| take a variable number of arguments terminating with the distinguished argument \verb|END|. Each argument can either be an instruction or a (pointer to a) code sequence; for \verb|build_icode| the first argument must be an instruction, for \verb|build_ccode| a code pointer. These functions have no way of knowing how many arguments were passed: it is therefore most important that the \verb|END| is not omitted from the end of the argument list, or all sorts of nasty things are likely to occur. Similarly, if an instruction takes an argument, but this is omitted, then the function will get very confused. In other words, this function allows complex code sequences to be constructed simply, but has to be used carefully. There are two variants of the function, depending on whether the first argument is an instruction or a code sequence. See Section~\ref{io_proc} for an example of their use. @d Machine prototypes @{ extern code build_icode(int op, ...); extern code build_ccode(code c, ...); @} We use a global variable called \verb|label_counter| to keep track of the labels already allocated. This allows us to create fresh labels when required. We add a label to a code sequence by introducing a new pseudo-instruction, \label{op_Lab}\inst{Lab}, that is ignored by the interpreter. In fact the interpreter will never see such instructions as they are removed during the code loading phase. @d Machine globals @{int label_counter = 1; @| label_counter @} @d Machine opcodes @{Lab, @| Lab @} @d Stack effect of executing instruction \verb|curr_inst| @{ case Lab : return(0); @} To obtain a fresh label we use the \verb|new_label| macro. @d Machine prototypes @{ extern int label_counter; #define new_label() (label_counter++) @} The following function generates the correct code sequence to remove stack items when leaving a level of scope. @d Machine prototypes @{code exit_scope_code();@} @d Machine utility functions @{ code exit_scope_code() { int i = exit_scope(); code c = NULL; while (i != 0) { if (i > 0) c = build_ccode( c, Pop, i, END); else c = build_ccode( c, Dcb, -i / CHANNEL_BLOCK_SIZE, END); i = exit_scope(); } return(c); } @| exit_scope_code @} \label{fn_exit_scope_code} We can print out a code sequence using the \verb|print_code| function. Remember that jump destinations in such a printout refer to symbolic labels rather than actual offsets. @d Machine prototypes @{ extern void print_code(code c); @} Finally, to execute a code sequence we use the \verb|interpret| function. @d Machine prototypes @{ extern void interpret(code c); @} \section{Implementation Details} This section describes the remaining implementation details of the interpreter. Such details are probably unimportant if you just want to use the machine. We need to switch between processes to give the impression of parallel execution. This implies that we must be able to save the state of a process in a {\em process control block} and restore it later. We frequently need to build queues of processes, for example the queue of processes waiting to execute, or the queue of processes waiting for input on a particular channel. Each process can only be in one queue at a time, and so we add the queue links to the PCB. The PCB must allow us to determine the stack associated with this process. We must also be able to access the current activation record of the parent process (if any) so that we can look up non-local variables. When a PCB is allocated (as part of creating a new process) we allocate sufficient memory to hold the process block plus the stack. In other words, the stack for this process is assumed to start immediately after the block finishes. The last field of the PCB, the \verb|stack_overflow_check|, is used to detect stack overflows (see later). \label{td_process_block} @d Process block {\tt typedef} @{ typedef struct process_block { int PC; union stack_slot *BP, *SP; int stack_size; @< Other PCB fields @> struct process_block *next, *parent; int stack_overflow_check; } process_block, *process_queue, *process, *processes; @| PC SP stack_size process_block process_queue process processes BP @} A process can be in one of a number of states. When \verb|ACTIVE| the process is either currently running or waiting to run. If it is blocked waiting for a communication from another process then it is in either the \verb|SUSPENDED_IN| or \verb|SUSPENDED_OUT| states. A process that is waiting for its children to terminate before continuing execution is in the \verb|WAITING| state. When a process terminates we cannot immediately recycle its storage as it may have spawned some children who have access to its stack. In such a case we must keep the process until its children have terminated. Processes in this position are in the \verb|RETIRED| state. Finally, retired processes with no living children are put into the \verb|REINCARNATED| state where they wait until the system needs to create another process requiring the same size of stack. \label{td_process_status} @d Process status {\tt typedef} @{ typedef enum process_status { ACTIVE, SUSPENDED_IN, SUSPENDED_OUT, WAITING, RETIRED, REINCARNATED } process_status; @| SUSPENDED_IN SUSPENDED_OUT ACTIVE WAITING RETIRED REINCARNATED @} The process state is stored in the PCB. @d Other PCB fields @{ process_status status; @| process_status @} The processes currently awaiting execution are stored in a circularly-linked list. This allows the next process to be scheduled in a round-robin fashion by just following the \verb|next| pointers. We use the interpreter variable \verb|active_process| to point to this queue, where the process pointed at is the currently running process. We need to be able to remove the active process from this queue when it terminates or is suspended. To enable this to be done efficiently we also keep track of the process that points to the active process so that the queue can be respliced when the process is removed. @d Machine globals @{process_queue active_process = NULL; process_queue prev_process = NULL; @| active_process prev_process @} The \verb|prev_process| pointer does not necessarily point to the process that last ran. Processes that become scheduled for execution get added between the \verb|prev_process| and \verb|active_process| pointers, and the \verb|prev_process| pointer updated, as illustrated in Figure~\ref{fig:activeprocessqueue}. \begin{figure}[hbt] \centering %\BoxedEPSF{Pic3.EPSF scaled 1000} \includegraphics{Pic3.EPSF} \caption{The active process queue} \label{fig:activeprocessqueue} \end{figure} To create a process we need to know the stack size and the starting value of the PC. We allocate the appropriate storage, initialise the fields and return the result. The \verb|stack_overflow_check| field is initialised to a distinctive bit pattern. We check that the field still has this bit pattern whenever an instruction is about to be executed. @d Machine prototypes @{ extern process create_process(int PC, int stacksize); @} \label{fn_create_process} @D Machine utility functions @{ #define OVERFLOW_CHECKSUM 0x55555555 process create_process(int PC, int stacksize) { process p; @< Allocate storage for process \verb|p| with a stack of size \verb|stacksize| @> p->PC = PC; p->next = NULL; p->children = 0; p->parent = active_process; if (active_process) active_process->children++; p->stack_size = stacksize; p->status = ACTIVE; p->SP = (stack_slot *)((char *)p + sizeof(process_block)) + stacksize; p->BP = NULL; p->stack_overflow_check = OVERFLOW_CHECKSUM; p->step_process = p->trace_process = FALSE; return p; } @| create_process @} When a process dies by executing a \verb|Stop| instruction we must decide whether to retire it or reincarnate it straight away. This requires knowing how many spawned children are still alive. We therefore add a \verb|children| field to the PCB to keep track of this. @d Other PCB fields @{ int children; @| children @} @d Terminate current process @{ { process p = active_process; @< Remove \verb|p| from active process queue @> if (p->children) { /* Retire as there are still children alive */ p->status = RETIRED; } else @< Reincarnate process \verb|p| @> } @} When we remove a process from the active process queue we must be careful to deal with the cases where there is only one process. @d Remove \verb|p| from active process queue @{ if (p == prev_process) active_process = prev_process = NULL; else prev_process->next = active_process = p->next; @} Reincarnated processes are chained together in a queue accessed via the \verb|reincarnated_processes| variable. @d Machine globals @{ process_queue reincarnated_processes = NULL; @| reincarnated_processes @} To reincarnate a process we must add it to this queue. However, we must also decrement the \verb|children| count of the parent if there is one. If this count goes to zero, and the parent is retired, then it should be reincarnated as well. Thus reincarnating one process may trigger a whole bunch of ancestors being reincarnated as well. If the parent is in the \verb|WAITING| state when its count reaches 0 then we schedule it for execution again. @d Reincarnate process \verb|p| @{ do { @< Add process \verb|p| to reincarnated list @> p->status = REINCARNATED; if (!p->parent) break; p = p->parent; } while ((!--p->children) && (p->status == RETIRED)); if ((p->status == WAITING) && (!p->children)) { @< Schedule process \verb|p| for execution @> } @} We add the reincarnated process to the head of the reincarnated list. It would be more efficient to keep this list sorted by process size (i.e. stack size), but it is not worth worrying about for an interpreter. @d Add process \verb|p| to reincarnated list @{ p->next = reincarnated_processes; reincarnated_processes = p; @} When we allocate storage for a process we first check to see if there is a reincarnated process of the appropriate size. If not, or the reincarnated list is empty, then we must use \verb|malloc| to allocate some storage. For diagnostic purposes we keep track of the total amount of storage allocated using the \verb|memory_used| variable. @d Machine globals @{ int memory_used = 0; @| memory_used @} @D Allocate storage for process \verb|p| with a stack of size \verb|stacksize| @{ { process q = NULL; p = reincarnated_processes; while ((p != NULL) && (p->stack_size != stacksize)) { q = p; p = p->next; } if (p) /* Unlink it from the reincarnated_processes queue */ if (p == reincarnated_processes) reincarnated_processes = p->next; else q->next = p->next; else { /* No suitable reincarnated process */ int size = sizeof(struct process_block) + stacksize*sizeof(stack_slot); p = malloc(size); memory_used += size; } } @} Here is a simple function that gives us some statistics on process memory usage. @d Machine prototypes @{extern void process_statistics(); @} @d Machine utility functions @{ void process_statistics() { int free_space = 0; process p = reincarnated_processes; while (p != NULL) { free_space += sizeof(struct process_block) + p->stack_size*sizeof(stack_slot); p = p->next; } printf("Total process space allocated = %d, of which %d still in use at exit.\n", memory_used, memory_used - free_space); } @| process_statistics @} \label{fn_process_statistics} \subsection{ Establishing a Communication } Each channel control block will either contain \verb|NULL| if there is no process waiting, or will contain the pointer to a waiting process. We talk about channel queues, but the language is defined so that there can be at most one process waiting on each channel. Thus it is a bounded queue, with the bound equal to one. To output on a channel we must first check whether there is a waiting process. If there is then we must remove the partner from any other channels it might be waiting on. We then transmit the output value by pushing it onto the partner's stack. Finally we resume both processes. If there is no matching partner at this point in time we add the process to the channel queue and suspend. @d Other \verb|stack_slot| options @{ process as_process; @| as_process @} @d Communicate with waiting input; suspend process if none @{ { stack_slot *the_chan = SP->as_addr; if (the_chan->as_process != NULL) { /* We have a match */ @< Establish output communication on channel \verb|the_chan| @> goto save_state; /* Save state and relinquish control (for fairness ) */ } else @< Suspend ouput process until communication partner @> }@} @d Suspend ouput process until communication partner @{ { process p = active_process; the_chan->as_process = p; /* Add the process to the channel queue */ p->status = SUSPENDED_OUT; @< Suspend process \verb|p| @> } @} Consider the state of the stack when an \verb|Alt| $i$ instruction is encountered. Assuming the compiler hasn't goofed, there should be $i$ groups of input channel requests stacked up for us, each one consisting of a channel address followed by the corresponding \verb|In| instruction. The first step is to traverse these entries looking for a matching partner. If we find a non-empty channel then we establish a communication. Otherwise we must suspend until another process wishes to communicate with us. @D Execute \verb|Alt| instruction @{ { stack_slot *the_chan = NULL; int index; if (curr_inst.arg == 0) { @< Terminate current process @> @< Schedule next process or \verb|return| if none @> } @< Look for partner; if found place matching channel in \verb|the_chan| and set PC \& \verb|index| @> if (the_chan == NULL) /* We must wait */ @< Suspend input process until communication partner @> else { SP += 2*curr_inst.arg; /* Pop alternatives off the stack */ @< Establish input communication on channel \verb|the_chan| @> goto save_state; /* Save state and relinquish control (for fairness ) */ } } @} To find a match we just look through the stack entries that have been set up for the \verb|Alt|. If we find one whose channel has a waiting process we save the channel in the variable \verb|the_chan|, set up the new PC and break. We don't check whether any partner we find is of the opposite polarity. It should be in a valid \OCCAM program, but we check later anyway. We check the alternatives in a semi-random order to make things interesting. @D Look for partner; if found place matching channel in \verb|the_chan| ... @{ { int i = curr_inst.arg-1; stack_slot *sp, *a; if (counter%2) for (sp = SP; i>=0; i--, sp += 2) { if ((a = (sp+1)->as_addr) && a->as_process) { /* We have a match */ PC = sp->as_inst.arg; the_chan = a; index = i; break; } } else for (sp = SP + 2*i; i>=0; i--, sp -= 2) { if ((a = (sp+1)->as_addr) && a->as_process) { /* We have a match */ PC = sp->as_inst.arg; the_chan = a; index = curr_inst.arg-i-1; break; } } } @} When we suspend an input process we must put the address of the process in all the channels mentioned in the alternative. We save the number of alternatives on the top of the stack so our eventual partner will know how many stack frames to pop. If the same channel is mentioned in more than one branch of the alternative only one of them will be queued (the choice being made randomly). @D Suspend input process until communication partner @{ { process p = active_process; stack_slot *a, *sp; int i = curr_inst.arg-1; int alts = 0; if (counter%2) for (sp = SP; i>=0; i--, sp += 2) { if (a = (sp+1)->as_addr) { a->as_process = p; alts++; } } else for (sp = SP+2*i; i>=0; i--, sp -= 2) { if (a = (sp+1)->as_addr) { a->as_process = p; alts++; } } if (alts) { (--SP)->as_int = curr_inst.arg; /* Remember the alt count */ p->status = SUSPENDED_IN; @< Suspend process \verb|p| @> } else { @< Terminate current process @> @< Schedule next process or \verb|return| if none @> } }@} To establish a communication we must either transfer a value from the current process to the partner, or vice versa, put the partner into the active state, and fix up the stacks. The \OCCAM\ semantics states that the name of a channel may only be used in one component of a parallel for input, and in one other component of the parallel for output. This should ensure that there can be at most one process waiting on a channel at any one time. Furthermore, if another process wishes to communicate on the channel then they must be of opposite polarity, i.e. they can communicate. However, it is difficult for the compiler to always check this statically and so we check at run-time as well. After all, interpreters aren't supposed to be efficient anyway $\ldots$ @D Establish input communication on channel \verb|the_chan| @{ { process p = the_chan->as_process; /* The partner */ stack_slot *PSP = p->SP; /* The partner's stack pointer */ @< Remove output process \verb|p| from the queue for \verb|the_chan|@> *(--SP) = *(++PSP); PSP++; /* Copy across value */ (--SP)->as_int = index; p->SP = PSP; @< Schedule process \verb|p| for execution @> } @} @D Establish output communication on channel \verb|the_chan| @{ { process p = the_chan->as_process; /* The partner */ stack_slot *PSP = p->SP; /* The partner's stack pointer */ int index; @< Remove input process \verb|p| from all its channel queues and pop PSP; set \verb|index| @> *(--PSP) = *(++SP); SP++; /* Copy across value */ (--PSP)->as_int = index; p->SP = PSP; @< Schedule process \verb|p| for execution @> } @} When we remove a process from any queues we check its polarity to detect cases of two inputs on a channel, or two outputs. @D Remove input process \verb|p| from all its channel queues ... @{ { int i; if (p->status != SUSPENDED_IN) error("Interpreter error", "Two processes trying to output on the same channel.\n"); for (i = (PSP++)->as_int; i>0; i--) { stack_slot *pchan = (PSP+1)->as_addr; if (pchan == the_chan) { p->PC = PSP->as_inst.arg; index = i-1; } if (pchan) pchan->as_process = NULL; PSP += 2; } }@} @d Remove output process \verb|p| from the queue for \verb|the_chan| @{ if (p->status != SUSPENDED_OUT) error("Interpreter error", "Two processes trying to input on the same channel.\n"); the_chan->as_process = NULL; @} \section{Code Manipulation Functions} \subsection{Code construction} \label{sec:codeconstruction} We start by defining the support code. First, the type declarations. \label{td_code_entry} @d Machine {\tt typedef}s @{ typedef struct code_entry { instruction inst; struct code_entry *next; } code_entry; @| code_entry inst @} \label{td_instruction_sequence} @d Machine {\tt typedef}s @{ typedef struct instruction_sequence { code_entry *first, *last; } *code; @| code first last @} Now the functions themselves. We factor out the common code into the function \verb|code_args|. Note that the function needs to know which instructions take arguments. This is an obvious source of error -- it might be better if this information was provided as each instruction was introduced. @d Machine prototypes @{ extern code code_args(int op, va_list args); #define END -1 @| END @} \label{fn_code_args} @D Machine utility functions @{ code code_args(int op, va_list args) { code c, r; code_entry *n; int arg = 0; int next_op; if (op == END) return NULL; if ((void *)op == NULL) { /* Assume we have found null code pointer */ next_op = va_arg(args,int); c = code_args(next_op, args); } else if (op > 255) { /* Assume we have found some code */ c = (code) op; next_op = va_arg(args,int); if (r = code_args(next_op, args)) { c->last->next = r->first; c->last = r->last; dispose(r); } } else { switch (op) { case Stop: case Wait: case Add : case Sub : case Mul : case Div : case Mod : case Lt : case Leq : case Gt : case Geq : case Eq : case Neq : case Idx : case Ida : case Sta : case Ret : case Out : case Swap: case Input:case Output: { next_op = va_arg(args,int); break; } default: { arg = va_arg(args,int); next_op = va_arg(args,int); break; } } n = new(struct code_entry); n->inst.op = op; n->inst.arg = arg; if (c = code_args(next_op, args)) { n->next = c->first; c->first = n; } else { c = new(struct instruction_sequence); c->first = c->last = n; n->next = NULL; } } return c; } @| code_args @} \label{fn_build_icode}\label{fn_build_ccode} @d Machine utility functions @{ code build_icode(int op, ...) { code c; va_list args; va_start(args,op); c = code_args(op, args); va_end(args); return c; } code build_ccode(code c, ...) { code c1; va_list args; va_start(args,c); c1 = code_args((int) c, args); va_end(args); return c1; } @| build_icode build_ccode @} Next, a utility function for disposing of code sequences. @d Machine prototypes @{extern void dispose_code(code c); @} \label{fn_dispose_code} @d Machine utility functions @{ void dispose_code(code c) { if (c) { code_entry *e = c->first; while (e) { code_entry *n = e->next; dispose(e); e = n; } dispose(c); } } @| dispose_code @} \subsection{Constant expressions} \label{sec:constants} There are various places in the \OCCAM\ language where a constant expression is required. To simplify the task of checking for such an expression, and to calculate the value returned by such an expression, the following function can be used. It takes a code sequence corresponding to an expression and the address of an integer as arguments. If the effect of evaluating the code sequence can be computed at compile time then the function returns \verb|TRUE| and the value of the expression is passed back in the integer argument. Otherwise \verb|FALSE| is returned. The function may return \verb|FALSE| even when an expression does denote a constant. An obvious example would be an expression involving array accesses. However, no function is going to be perfect in this regard, and the function provided is adequate for our purposes. @d Machine prototypes @{extern int constant_expression(code c, int *v); @} \label{fn_constant_expression} @D Machine utility functions @{ int constant_expression(code c, int *v) { code_entry *e = c ? c->first : NULL; int S[20]; int sp = -1; instruction i; while(e) { i = e->inst; e = e->next; switch (i.op) { case Ldc: S[++sp] = i.arg; continue; case Add: if (sp > 0) S[sp-1] += S[sp]; break; case Sub: if (sp > 0) S[sp-1] -= S[sp]; break; case Mul: if (sp > 0) S[sp-1] *= S[sp]; break; case Div: if (sp > 0) S[sp-1] /= S[sp]; break; case Mod: if (sp > 0) S[sp-1] %= S[sp]; break; case Lt : if (sp > 0) S[sp-1] = S[sp-1] < S[sp]; break; case Leq: if (sp > 0) S[sp-1] = S[sp-1] <= S[sp]; break; case Gt : if (sp > 0) S[sp-1] = S[sp-1] > S[sp]; break; case Geq: if (sp > 0) S[sp-1] = S[sp-1] >= S[sp]; break; case Eq : if (sp > 0) S[sp-1] = S[sp-1] == S[sp]; break; case Neq: if (sp > 0) S[sp-1] = S[sp-1] != S[sp]; break; default: return(FALSE); } sp--; } if (sp == 0) { *v = S[0]; return(TRUE); } return(FALSE); } @} \subsection{Printing code} Printing a code sequence is easy. We just traverse the list calling \verb|print_instruction| at each node. @d Machine utility functions @{ void print_code(code c) { code_entry *e = c ? c->first : NULL; while (e != NULL) { if (e->inst.op == Lab) { printf("%3d : ", e->inst.arg); e = e->next; if (e->inst.op == Lab) { printf("\n"); continue; } } else printf(" "); print_instruction(e->inst); printf("\n"); e = e->next; } } @| print_code @} \label{fn_print_code} Printing an instruction is even more tedious; it's just a big switch. \label{fn_print_instruction} @D Machine utility functions @{ void print_instruction(instruction i) { switch (i.op) { case Stop: { printf("Stop ");break; } case Wait: { printf("Wait ");break; } case Add : { printf("Add "); break; } case Sub : { printf("Sub "); break; } case Mul : { printf("Mul "); break; } case Div : { printf("Div "); break; } case Mod : { printf("Mod "); break; } case Lt : { printf("Lt "); break; } case Leq : { printf("Leq "); break; } case Gt : { printf("Gt "); break; } case Geq : { printf("Geq "); break; } case Eq : { printf("Eq "); break; } case Neq : { printf("Neq "); break; } case Idx : { printf("Idx "); break; } case Ida : { printf("Ida "); break; } case Sta : { printf("Sta "); break; } case Ret : { printf("Ret "); break; } case Out : { printf("Out "); break; } case Swap: { printf("Swap "); break; } case Input:{ printf("Input "); break; } case Output:{printf("Output "); break; } case In : { printf("In %6d", i.arg); break; } case Ind : { printf("Ind %5d", i.arg); break; } case Ldc : { printf("Ldc %5d", i.arg); break; } case Ldo : { printf("Ldo %5d", i.arg); break; } case Lda : { printf("Lda %5d", i.arg); break; } case Ldn : { printf("Ldn %5d", i.arg); break; } case Sto : { printf("Sto %5d", i.arg); break; } case Pop : { printf("Pop %5d", i.arg); break; } case Ujp : { printf("Ujp %5d", i.arg); break; } case Fjp : { printf("Fjp %5d", i.arg); break; } case Alt : { printf("Alt %5d", i.arg); break; } case Ccb : { printf("Ccb %5d", i.arg); break; } case Dcb : { printf("Dcb %5d", i.arg); break; } case Pstk: { printf("Pstk %4d",i.arg); break; } case Call: { printf("Call %4d",i.arg); break; } case Spawn:{ printf("Spawn %3d",i.arg);break; } case Trace:{ printf("Trace %3d",i.arg);break; } case Enter:{ printf("Enter %3d",i.arg);break; } case Leave:{ printf("Leave %3d",i.arg);break; } case Spawnv:{printf("Spawnv %2d",i.arg); break; } default: { printf("?%d",i.op); break; } } } @| print_instruction @} \subsection{Stack Utilisation} For maximum efficiency and flexibility, we should really calculate the stack movements as we construct the program code. However, this can be error prone, and so we provide a simpler alternative. The function \verb|compute_max_stack| takes a code sequence as argument and computes the maximum amount of stack space it requires. To do this, without knowing anything about the code sequence, is obviously tricky. We therefore make a simplifying assumption. If the code sequence has a loop in it then the size of the stack at the end of the loop must be the same as at the start. This rules out code that loops $n$ times pushing a value onto the stack each time around the loop. This is an inconvenience, but not a major problem. For example, we can allocate $n$ elements on the stack before starting the loop , using the \verb|Ldn| instruction. We can then update each element of this block in turn in a loop. The stack no longer grows each time around the loop and the \verb|compute_max_stack| function is then happy. Probably the only place in the compiler where this is an issue is in the replicated \verb|ALT| construct. Before presenting the code for the \verb|compute_max_stack| we need a bit of infrastructure. We obviously need to take into account the stack requirements of any procedure calls in the code sequence. However, the code for the procedure body may not be contained within the code sequence. We therefore assume that a \inst{Pstk}\ instruction is placed immediately after the \verb|Call| instruction. The argument to this instruction is the amount of stack space required by the procedure body. The instruction is ignored by the interpreter (in fact it is removed when converting the sequence to an array of instructions). @d Machine opcodes @{ Pstk @| Pstk @} The following function computes the effect on the size of the stack of executing a single instruction. Most of the effects were defined at the point each instruction was introduced. @d Machine prototypes @{ extern int stack_effect(instruction curr_inst); @} \label{fn_stack_effect} @d Machine utility functions @{ int stack_effect(instruction curr_inst) { switch (curr_inst.op) { @< Stack effect of executing instruction \verb|curr_inst| @> case Pstk: return(0); /* Will be handled specially in compute_max_stack */ default: print_instruction(curr_inst); printf("\n"); error("Fatal error", "Missing case in stack_effect\n"); } } @| stack_effect @} As mentioned above, we need to check that the stack pointer at the beginning and end of a loop is the same. While traversing the code we keep track of any labels encountered along the path from the start of the code sequence to the current position. With each label we store the associated value of the stack pointer. Because we are essentially performing a depth-first traversal a stack is suitable for this task. We should really check for stack overflow here, but it's not worth it on such a simple practical. @d Machine prototypes @{ struct { int label; int SP; } path_stack[50]; @| path_stack @} We make two traversals of the code. The first builds a simple table associating labels with their associated code pointers. This makes it easier to follow jumps in the second pass. For simplicity we represent the table as a linked list; there is obvious scope for improvement here... \label{td_jump_entry} @d Machine prototypes @{ typedef struct jump_entry { int label; code_entry *dest; struct jump_entry *next; } jump_entry, *jump_table; @| jump_table jump_entry @} We build a jump table by making a linear traversal of the code. @d Machine prototypes @{extern jump_table build_jump_table(code c);@} \label{fn_build_jump_table} @d Machine utility functions @{ jump_table build_jump_table(code c) { code_entry *p = c ? c->first : NULL; jump_table jt = NULL; while (p != NULL) { if (p->inst.op == Lab) { /* Add the entry (p->inst.arg, p) to jump table */ jump_table l; l = new(jump_entry); l->next = jt; l->label = p->inst.arg; l->dest = p; jt = l; } p = p->next; } return(jt); } @| build_jump_table @} To be neat and tidy, we also provide a function to dispose of these tables. @d Machine prototypes @{extern void dispose_jump_table(jump_table jt);@} @d Machine utility functions @{ void dispose_jump_table(jump_table jt) { jump_table n; while (jt != NULL) { n = jt->next; dispose(jt); jt = n; } } @| dispose_jump_table @} \label{fn_dispose_jump_table} Looking up a label is simple giving our dubious choice of data structure. @d Machine prototypes @{extern code_entry *lookup_jt(int label, jump_table jt);@} \label{fn_lookup_jt} @d Machine utility functions @{ code_entry *lookup_jt(int label, jump_table jt) { while (jt != NULL && jt->label != label) jt = jt->next; if (jt == NULL) error("Fatal error", "Jumping to a non-existent label\n"); return(jt->dest); } @| lookup_jt @} Now for the second phase of the function. We use an auxiliary function, \verb|max_stack|, to perform this task. It takes as argument the code sequence to analyse, a jump table, the path to the starting point and the initial value of the stack pointer. These last two arguments are used when the function calls itself recursively. We also pass the starting point to the whole program fragment to aid debugging, but this is not strictly necessary. \label{fn_max_stack} @D Machine utility functions @{ int max_stack(code c, code_entry *e, int initial_SP, int path_SP, jump_table jt) { instruction curr; int SP = initial_SP, MAX_SP = initial_SP, i; while (e != NULL) { curr = e->inst; e = e->next; i = stack_effect(curr); if (i < 0 && SP < i) { print_code(c); error("Fatal error", "Stack underflow in function max_stack\n"); } SP += i; MAX_SP = (SP > MAX_SP) ? SP : MAX_SP; switch (curr.op) { case Lab: @< Compute stack effect of label @> case Ujp : e = lookup_jt(curr.arg, jt); break; case Pstk: { /* Make sure there is enough space for procedure call */ int newSP = SP + curr.arg; MAX_SP = (newSP > MAX_SP) ? newSP : MAX_SP; break; } case Stop : case Ret : return(MAX_SP); case Fjp : @< Compute stack effect of Fjp @> case In : @< Compute stack effect of In @> case Alt : @< Compute stack effect of Alt @> } } return(MAX_SP); } @| max_stack @} Processing a label is simple, given the restrictions we have placed on the input code to the function. We just have to make sure we don't keep looping for ever... @d Compute stack effect of label @{ { /* First check to see if we have already seen this label on current path */ int i = 0; while (path_stack[i].label != curr.arg && i < path_SP) i++; if (i == path_SP) { /* i.e. label not seen so update path stack */ path_stack[path_SP].label = curr.arg; path_stack[path_SP++].SP = SP; break; } else if (path_stack[i].SP != SP) { /* Loop encountered with changing SP */ print_code(c); error("Fatal error", "Loop encountered with stack %s in " "function max_stack.\n", (SP > path_stack[i].SP) ? "increasing" : "decreasing"); } else /* We have found a well behaved loop. Stop this particular branch of the traversal and return the maximum value of SP encountered. */ return(MAX_SP); } @} Now comes the tricky bit. The complication arises due to the \verb|In| and \verb|Alt| instructions. These act like conditional jumps. When processing an occurrence of an \verb|In| instruction we should record, in a list, the destination of the instruction. When the corresponding \verb|Alt| instruction is encountered we should adjust the stack pointer to the value it would have immediately after the \verb|Alt| instruction has finished. We must then recursively call the function on all the code sequences queued up as a result of processing the \verb|In| instructions, and find out which one of these has the biggest stack requirements. We use a global variable, \verb|in_jt|, to store the jump table constructed by the \verb|In| instructions. The structure of the language insures that such constructs cannot be nested, and so this is safe. @d Machine globals @{jump_table in_jt = NULL; @| in_jt @} @d Compute stack effect of In @{ { jump_table l; l = new(jump_entry); l->next = in_jt; l->dest = lookup_jt(curr.arg, jt); in_jt = l; break; } @} @d Compute stack effect of Alt @{ { jump_table bodies = in_jt; if ((SP -= (2*curr.arg - 1)) < 1) { print_code(c); error("Fatal error", "Stack underflow in function max_stack\n"); } if (in_fjp) return(MAX_SP); /* see following paragraphs for explanation */ in_jt = NULL; while (bodies != NULL) { int fj_max; jump_table n; fj_max = max_stack(c,bodies->dest, SP, path_SP, jt); MAX_SP = (fj_max > MAX_SP) ? fj_max : MAX_SP; n = bodies->next; dispose(bodies); bodies = n; } return(MAX_SP); } @} The only instruction we haven't considered yet is \verb|Fjp|. We treat this like the alternatives above, i.e. we try each alternative, and return the maximum stack required. However, in this case, the order in which we try the alternatives is important. Consider how we might compile a replicated \verb|ALT|. The code will look something like \begin{quote}\tt\small \hbox{}L:~$\ldots$ \hbox{}~~~Fjp M \hbox{}~~~$\ldots$ \hbox{}~~~In N \hbox{}~~~$\ldots$ \hbox{}~~~Ujp L \hbox{}M:~$\ldots$ \hbox{}~~~Alt n \end{quote} If we process the jump to {\tt M} {\em before} processing the \verb|In| instruction then \verb|in_jt| will still be \verb|NULL| when we reach the \verb|Alt| instruction. In this case it seems better to process the jump {\em after} we have processed the code containing the \verb|In| instruction. However, the problem is more subtle than this. Suppose one of the alternatives is guarded, and the guard expression generates code involving \verb|Fjp|. A conjunction of expressions would be an obvious example. This would generate two traversals leading to the \verb|Alt| instruction. The first would result in \verb|in_jt| being cleared, and the second traversal would therefore produce wrong results. If we make the assumption that the stack size will be identical when the \verb|Alt| instruction is reached, no matter which path is taken, then we can fix the problem by making sure that the first traversal just returns when the \verb|Alt| instruction is reached, leaving \verb|in_jt| unchanged. We introduce a counter, \verb|in_fjp|, initially 0. @d Machine globals @{int in_fjp = 0; @| in_fjp @} Suppose we are in the middle of processing some inputs, i.e. \verb|in_jt| is not \verb|NULL|, and we encounter a conditional jump. We then have two paths leading to the \verb|Alt|. We increment \verb|in_jt| while we process one of them, and decrement it before processing the other. The \verb|Alt| instruction then only processes the contents of \verb|in_jt| when the count is equal to zero. This ensures that we only process the contents of \verb|in_jt| once. This is a bit of a hack, but it's probably the best I can do given the restrictions a function like \verb|compute_max_stack| is working under. @d Compute stack effect of Fjp @{ { int fj_max; jump_table old_in_jt = in_jt; if (old_in_jt) in_fjp++; fj_max = max_stack(c, e, SP, path_SP, jt); /* Process code after jump */ if (old_in_jt) in_fjp--; MAX_SP = (fj_max > MAX_SP) ? fj_max : MAX_SP; e = lookup_jt(curr.arg, jt); /* Process the jump */ break; } @} This whole argument sounds plausible, but I'd like a slightly more systematic argument about its correctness, and a more rigorous description of the assumptions being made about the input code for the correctness to hold -- any suggestions? The main procedure to compute stack sizes is now simple to state. @d Machine prototypes @{extern int compute_max_stack(code c);@} \label{fn_compute_max_stack} @d Machine utility functions @{ int compute_max_stack(code c) { int m; jump_table jt; jt = build_jump_table(c); m = max_stack(c, c ? c->first : NULL, 0, 0, jt); dispose_jump_table(jt); return(m); } @| compute_max_stack @} \subsection{The Loader} \label{loader} The compiler builds the abstract machine code program in the form of a sequence with symbolic labels. However, the interpreter requires the instructions to be in an array with all labels resolved to their associated locations. The \verb|load_code| function performs this transformation. @d Machine prototypes @{extern instruction *load_code(code c); @} The function needs to keep track of the association between labels and their locations in the code array. We use the \verb|label_entry| structure to build a simple list of such associations. We don't attempt to sort it as there shouldn't be a vast number of labels in the programs we are likely to encounter. We traverse the code list twice. The first time we build the table and count how many instructions there are. We then allocate storage for the instruction array and then traverse the code again stuffing the instructions into the array and resolving jump destinations. We dispose the storage for the code sequence along the way, so the argument \verb|c| shouldn't be used after the call without assigning something to it first. \label{fn_load_code} @d Machine utility functions @{ instruction *load_code(code c) { typedef struct label_entry { int label, code_location; struct label_entry *next; } label_entry, *label_table; int n; label_table labels = NULL; instruction *instruction_array; code_entry *s = c ? c->first : NULL; @< Compute \verb|n|, the length of code sequence \verb|s|, % building label table \verb|labels| in the process @> instruction_array = malloc(n * sizeof(instruction)); @< Pack code sequence \verb|s| into \verb|instruction_array| using \verb|labels| table @> @< Dispose of \verb|labels| table @> dispose(c); @< Create and initialise breakpoint array of size \verb|n| @> code_size = n; return(instruction_array); } @| load_code @} @d Machine globals @{int code_size = 0; @| code_size@} When we encounter a label we don't bother checking whether an instruction with this label has already been encountered. This shouldn't happen with properly constructed code anyway. @d Compute \verb|n|, the length of code sequence \verb|s|, % building label table \verb|labels| in the process @{ { code_entry *p = s; n = 0; while (p != NULL) { if (p->inst.op == Lab) { /* Add the entry (p->inst.arg, n) to labels table */ label_table l; l = new(label_entry); l->next = labels; l->label = p->inst.arg; l->code_location = n; labels = l; n--; } if (p->inst.op == Pstk) n--; n++; p = p->next; } } @} When packing instructions into the array we must look out for instructions whose arguments are jump destinations. When we encounter one of these we use the \verb|labels| table to replace the label by its actual destination in the array. If we can't find the label we report a fatal error -- someone has messed up the code generation... A typical cause of this is to forget to concatenate all the bits of code together before calling the interpreter. @d Pack code sequence \verb|s| ... @{ { int i = 0; while (iinst.op == Lab || s->inst.op == Pstk) { s = s->next; dispose(p); p = s; continue; } instruction_array[i++] = s->inst; if ((s->inst.op == Ujp) /* Check to see if the argument */ || (s->inst.op == Fjp) /* is a reference to a label */ || (s->inst.op == Spawn) /* and replace it if it is. */ || (s->inst.op == Spawnv) || (s->inst.op == Call) || (s->inst.op == In)) { int label = s->inst.arg; label_table p = labels; while ((p != NULL) && (p->label != label)) p = p -> next; if (p == NULL) error("Interpreter error", "Missing label (%d) in code.\n", label); instruction_array[i-1].arg = p->code_location; } s = s->next; dispose(p); } } @} @d Dispose of \verb|labels| table @{ { label_table p = labels; label_table o = p; while (p != NULL) { o = p; p = p -> next; dispose(o); } } @} \section{The Interpreter} \subsection{The I/O Processes} \label{io_proc} The interpreter starts up two processes in addition to those specified by the user. These support input and output via the two channels at the base of the original stack. \noindent The input process is a simple loop of the form @d Input process code @{ il = new_label(); input_process_code = build_icode( Lab,il, Input, Ldo, -1, Ldc,-CHANNEL_BLOCK_SIZE-2, Idx, Out, Ujp,il, END); @| input_process_code @} The output process is a simple loop of the form @d Output process code @{ { int l1 = new_label(), l2 = new_label(); ol = new_label(); output_process_code = build_icode( Lab,ol, Ldo,-1, Ldc,-2*CHANNEL_BLOCK_SIZE-3, Idx, Lab,l1, Ldo,-3, In, l2, Alt, 1, Lab,l2, Pop, 1, Output, Ujp,l1, END); } @| output_process_code @} To spawn them is easy. We just execute the code returned by the following function. @d Machine prototypes @{extern code io_code();@} \label{fn_io_code} @d Machine utility functions @{ code io_code() { code c, input_process_code, output_process_code; int iss, oss, il, ol; @< Input process code @> @< Output process code @> /* The occurrence of 3 in the following statements is due to the display */ iss = compute_max_stack(input_process_code) + 3; oss = compute_max_stack(output_process_code)+ 3; return( build_icode( Ujp, 0, input_process_code, output_process_code, Lab,0,Ccb, 1, /* stdin */ Ccb, 1, /* stdout */ Ldc, 2, Ldc, iss, Spawn, il, Ldc, 2, Ldc, oss, Spawn, ol, END)); } @| io_code @} \subsection{The Interpreter Loop} To run the interpreter we compute the stack required by the main process, load the code into an array, create a PCB for the main process, set up the initial display, and let it rip! After execution has finished we print out a few statistics if required. @d Machine globals @{instruction *Code; @| Code @} @d Machine prototypes @{extern void interpret(code c); @} \label{fn_interpret} @d The interpreter @{ void interpret(code program) { int stacksize; clock_t start_time; process p; stacksize = compute_max_stack(program) + 2; /* for initial display */ Code = load_code(program); p = create_process(0, stacksize); /* This next dubious line is so that the input and output processes will not be viewed as children of the main process, just in case the main process executes the wait instruction. */ p->children = -2; if (debug_flag) @< Setup initial breakpoint @> initialise_display(1, &p->SP, &p->BP); @< Schedule process \verb|p| for execution @> start_time = clock(); interpreter_loop(); if (pool_flag) { printf("CPU time used in interpreter = %d milliseconds.\n", (clock()-start_time)*10); pool_statistics(); process_statistics(); } } @| interpret @} The structure of the interpreter loop is fairly simple. We just keep executing instructions until all processes have terminated or blocked. When a process communicates it will naturally get rescheduled. However, if it does a lot of computation without any communication then the other processes will get blocked out. An efficient implementation would set a timer and then reschedule the process in an interrupt handler. However, the interpreter is not designed to be particularly efficient and so we adopt a simpler approach. The \verb|counter| variable is decremented each time an instruction is executed and when it reaches 0 the process is rescheduled. To give a bit of randomness to the system the inital value of the counter when a process starts executing is set to a random number in the range 0..\verb|max_slice| When a process is picked for execution we must set up the machine registers appropriately. When the process stops executing because it is waiting on a communication, or its timeslice has expired, we must save the current values of the registers in the processor block. Because instruction execution is nested inside a \verb|for| loop and a \verb|switch| it is convenient to use a \verb|goto| when a process is rescheduled. Hopefully the resulting control flow isn't too obscure. @d Machine prototypes @{extern void interpreter_loop(); @} \label{fn_interpreter_loop} @d The interpreter @{ void interpreter_loop() { while (1) { @< Machine state @> register int counter; register instruction curr_inst; load_state: PC = active_process->PC; SP = active_process->SP; BP = active_process->BP; for (counter = rand() % max_slice; counter--; ) { if (active_process->stack_overflow_check != OVERFLOW_CHECKSUM) error("Fatal error", "Stack overflow for process %d with stack size = %d.\n", active_process, BP-SP+1); if (tracing_flag || active_process->trace_process) @< Display machine state @> @< Check for breakpoint @> switch (curr_inst = Code[PC++], curr_inst.op) { @< Execute instruction \verb|curr_inst| @> } } save_state: active_process->PC = PC; active_process->SP = SP; active_process->BP = BP; prev_process = active_process; active_process = active_process->next; } } @| interpreter_loop Code curr_inst counter load_state: save_state: @} Finally, a few code fragments for process manipulation. When we schedule a process for execution we put it at the end of the queue so that we don't overtake any waiting processes. We need to treat the case where the queue is empty specially. @d Schedule process \verb|p| for execution @{ if (active_process) { prev_process->next = p; p->next = active_process; prev_process = p; } else { prev_process = active_process = p; p->next = p; } p->status = ACTIVE; @} To suspend a process we just remove it from the queue, save its state, and then schedule the next process (if any). We presume its status has already been changed. @d Suspend process \verb|p| @{ @< Remove \verb|p| from active process queue @> p->SP = SP; p->PC = PC; p->BP = BP; @< Schedule next process or \verb|return| if none @> @} @d Suspend if children and schedule next process @{ if (active_process->children) { process p = active_process; p->status = WAITING; @< Suspend process \verb|p| @> } else break; @} @d Schedule next process or \verb|return| if none @{ if (active_process == NULL) return; else goto load_state; @} \section{Breakpoints} When debugging a sequential program it is often useful to be able to set a breakpoint at a particular instruction. The machine will then continue executing instructions until a breakpoint is reached, or execution terminates. A trivial way of supporting such a behaviour in an interpreter is to use one bit in the instruction to indicate the presence or absence of a breakpoint. In a parallel setting such breakpoints are less useful. It would be neater if we could stop execution when a particular process reaches the breakpoint. As space and speed are not really an issue in such an interpreter, we associate a linked list of process IDs (i.e. addresses) with each instruction in the code array. If the active process is contained in this list then we breakpoint. If the list contains an entry with a \verb|NULL| process ID then we stop no matter which process is running. \label{td_process_list} @d Machine {\tt typedef}s @{ typedef struct process_list { process p; struct process_list *next; } *process_list; @| process_list @} @d Machine globals @{process_list *Bpt; @| Bpt @} @d Create and initialise breakpoint ... @{ { int i = n-1; Bpt = malloc(n * sizeof(process_list)); while (i >= 0) Bpt[i--] = NULL; } @} @d Setup initial breakpoint @{ { Bpt[0] = new(struct process_list); Bpt[0]->p = p; Bpt[0]->next = NULL; } @} We enter the debugger if the current process has the \verb|step_process| flag set to \verb|TRUE| (clearing it in the process), the current instruction's breakpoint list mentions this process, or this list contains the null process. @d Check for breakpoint @{ if (active_process->step_process) { active_process->step_process = FALSE; debugger(PC,SP,BP); } else if (Bpt[PC]) { /* Check list for this process, or the NULL process */ process_list l = Bpt[PC]; while (l && l->p != NULL && l->p != active_process) l = l->next; if (l) debugger(PC,SP,BP); } @} When a breakpoint is encountered the user is presented with a number of alternatives. The language accepted by the system at this point is shown below. We could use bison to parse this language, but it's a bit tedious supporting two bison parsers in the same program, and the language is quite simple. We therefore do it by hand. Here are the statements accepted by the debugger (optional items are enclosed in square brackets): \begin{description} \item[{\tt i [$m$] [+ $n$]}] Display $n$ instructions starting at instruction $m$. If $n$ is omitted it defaults to the current PC, and $n$ defaults to 1. \item[{\tt s [$n$]}] Print the top $n$ values on the stack. If $n$ is omitted then the stack entries between \verb|SP| and \verb|BP| are printed. \item[{\tt a $a$ [+ $n$]}] Print the contents of $n$ addresses starting at hex address $a$. \item[{\tt b [$i$ [$a$]]}] Toggle the breakpoint at instruction $i$. If $a$ is specified then toggle the breakpoint for the process with this hex address. If no arguments are specified then display a list of breakpoints. \item[{\tt n}] Step to the next instruction. Note that if the instruction involves a communication then the process may be suspended, and other breakpoints might be responded to first. \item[{\tt g}] Continue execution until the next breakpoint is reached (not necessarily in this process), or until execution terminates. \item[{\tt t [$a$]}] Toggle tracing for process with hex address $a$, or for all processes if no argument is present. \end{description} The main structure of the debugger is fairly obvious. The main arkwardness arises because we must reenable input buffering before interacting with the user. We disable it again when we exit from the debugger. @d Machine prototypes @{extern void debugger(int PC, stack_slot *SP, stack_slot *BP);@} @D Machine utility functions @{ void debugger(int PC, stack_slot *SP, stack_slot *BP) { char buffer[256], command[256]; int offset; printf("%X @@ %3d: ", active_process, PC); print_instruction(Code[PC]); printf("\n"); reset_terminal(); while (TRUE) { printf("? "); fflush(stdout); if (gets(buffer) == NULL) exit(EXIT_SUCCESS); if (sscanf(buffer, "%s%n", command, &offset) != 1) { printf("%X @@ %3d: ", active_process, PC); print_instruction(Code[PC]); printf("\n"); } else if (strlen(command) != 1) printf("Illegal command. Type '?' for help.\n"); else switch (command[0]) { case 'i': @< Handle debugger instruction 'i' and break @> case 's': @< Handle debugger instruction 's' and break @> case 'a': @< Handle debugger instruction 'a' and break @> case 't': @< Handle debugger instruction 't' and break @> case 'b': @< Handle debugger instruction 'b' and break @> case 'n': active_process->step_process = TRUE; case 'g': disable_buffering(); return; default : printf("Illegal command.\n\n"); case '?': @< Display debugger help @> } } } @| debugger @} \label{fn_debugger} Now let's deal with each debugger instruction in turn. To process the optional arguments to the instructions we make multiple calls to \verb|scanf| until one succeeds. This is clearly non-optimal, but this is not important in the current context. In the case of the '{\tt i}' instruction we just need to call \verb|print_instruction| on the appropriate range of instructions. We associate the debugging help information with each entry to ease maintenance. @d Debugger help information @{ printf("i [m] [+ n]\n"); printf(" Displays n instructions starting at instruction m.\n"); printf(" If m is omitted it defaults to the current PC, and n defaults to 1.\n\n"); @} @d Handle debugger instruction 'i' and break @{ { int m, n, i; if (sscanf(&(buffer[offset]), "%d +%d", &m, &n) == 2) ; else if (sscanf(&(buffer[offset]), " +%d", &n) == 1) m = PC; else if (sscanf(&(buffer[offset]), "%d", &m) == 1) n = 1; else { m = PC; n = 1; } for (i = 1; i <= n && m < code_size; i++) { printf("%3d: ", m); print_instruction(Code[m++]); printf("\n"); } break; } @} The '{\tt s}' and '{\tt a}' instructions are similar. Note that the world can end if we provide bogus hex addresses. This is particularly important for those instructions that require a valid process address. @d Debugger help information @{ printf("s [n]\n"); printf(" Display the top n entries on the stack. \n"); printf(" If n is omitted then the stack entries between SP and BP are displayed.\n\n"); @} @d Handle debugger instruction 's' and break @{ { stack_slot *m = BP, *p; int i; if (sscanf(&(buffer[offset]), "%d", &i) == 1) m = SP+(i-1); for (p = SP; p <= m; p++) printf("%X: %X\n", p, p->as_addr); break; } @} @d Debugger help information @{ printf("a m [+ n]\n"); printf(" Print the contents of n addresses starting at hex address m.\n\n"); @} @d Handle debugger instruction 'a' and break @{ { void *p; int i, n = 1; if (sscanf(&(buffer[offset]), "%x +%d", &p, &n) == 2) ; else if (sscanf(&(buffer[offset]), "%x", &p) == 1) ; else p = SP; for (i = 0; i < n; ((int *)p)++, i++) printf("%X: %X\n", p, *(int *)p); break; } @} If an argument is provided to the '{\tt t}' instruction then we assume that this is the hex address of a process. We toggle the \verb|trace_process| flag of this process. If no argument is provided we toggle the global \verb|tracing_flag|. @d Debugger help information @{ printf("t [a]\n"); printf(" Toggle tracing for process with hex address $a$,\n"); printf(" or for all processes if no argument is present.\n\n"); @} @d Handle debugger instruction 't' and break @{ { process p = NULL; sscanf(&(buffer[offset]), "%x", &p); if (p) p->trace_process = !p->trace_process; else tracing_flag = !tracing_flag; break; } @} The breakpoint instruction is the trickiest one to process. If no arguments are provided we just traverse and display the contents of the \verb|Bpt| array. Otherwise we traverse the appropriate entry in this list to see if there is already an appropriate breakpoint. If there is we remove it, and if not we add it. @d Debugger help information @{ printf("b [i [a]]\n"); printf(" Toggle the breakpoint at instruction i. If a is specified then\n"); printf(" toggle the breakpoint for the process with this hex address.\n"); printf(" If no arguments then display a list of breakpoints.\n\n");@} @d Handle debugger instruction 'b' and break @{ { int pc = PC; void *p = NULL; process_list l; if (sscanf(&(buffer[offset]), "%d %x", &pc, &p) == 2) ; else if (sscanf(&(buffer[offset]), "%d", &pc) == 1) ; else @< Display breakpoints and \verb|continue| with debugger loop @> if (Bpt[pc]) if (Bpt[pc] && (Bpt[pc]->p == p)) /* Remove it */ Bpt[pc] = Bpt[pc]->next; else { process_list prev = Bpt[pc]; l = prev->next; while (l && l->p != p) { prev = l; l = l->next; } if (l) /* Remove it */ { prev->next = l->next; dispose(l); } else @< Add breakpoint @> } else @< Add breakpoint @> break; } @} @d Add breakpoint @{ { l = new(struct process_list); l->p = (process)p; l->next = Bpt[pc]; Bpt[pc] = l; } @} @d Display breakpoints and \verb|continue| with debugger loop @{ { int i; printf("Breakpoints:\n"); for (i = 0; i < code_size; i++) { if (Bpt[i]) { l = Bpt[i]; printf("%3d: ", i); while (l) { if (l->p == NULL) printf("*"); else printf("%X", l->p); if (l = l->next) printf(", "); else printf("\n"); } } } continue; } @} To step past an instruction we use the \verb|step_process| flag contained in the PCB of a process. When set, the process will enter the debugger before executing it's next instruction. To step a process we just need to set the flag and clear it when the debugger is next entered. @d Other PCB fields @{ bool step_process; @| step_process @} @d Debugger help information @{ printf("n Step to the next instruction.\n\n"); printf("g Continue execution.\n\n");@} Finally, here is some boring code to print out the help information for the instructions. @d Display debugger help @{ printf("These are the instructions accepted by the debugger:\n\n"); @< Debugger help information @> @} When tracing the processes we need to display the state of the machine. Here is a simple bit of code to print out the current state of the active process. @d Display machine state @{ { int stack_slots = BP-SP+1; printf("Process %X: PC = %3d, instruction = ", active_process, PC); print_instruction(Code[PC]); printf(", SP = %X,\n", SP); printf(" stack depth =%3d", stack_slots); if (stack_slots > 0) { stack_slot *sp; printf(", contents = "); if (stack_slots > 10) { stack_slots = 10; printf("..., "); } sp = SP+(stack_slots-1); while (--stack_slots) printf("%X, ", (sp--)->as_int); printf("%X.\n", sp->as_int); } else printf(".\n"); fflush(stdout); } @} \section{The Interpreter Files} @o code/interpreter.h @{@< Copyright message @> #ifndef _INTERPRETER_H #define _INTERPRETER_H @< Machine opcodes {\tt typedef} @> @< Process status {\tt typedef} @> @< Process block {\tt typedef} @> @< Machine {\tt typedef}s @> @< Machine prototypes @> #endif @| interpreter.h @} @d Occam object files @{ interpreter.o @} @d Makefile dependencies @{ interpreter.c: occam.h pool.h interpreter.h @} @o code/interpreter.c @{@< Standard \OCCAM\ includes @> #include #include #include "occam.h" #include "pool.h" #include "interpreter.h" @< Machine globals @> @< Machine utility functions @> @< The interpreter @> @| interpreter.c @} % Following code defines table of functions for preceeding chapter. \newpage \begin{table}[t] \begin{center} \begin{tabular}{||l|l|l|c||} \hline Function name&Return value type&Argument types&Page\\ \hline build\_ccode&code&code, ...&\pageref{fn_build_ccode}\\ build\_icode&code&int, ...&\pageref{fn_build_icode}\\ build\_jump\_table&jump\_table&code&\pageref{fn_build_jump_table}\\ code\_args&code&int, va\_list&\pageref{fn_code_args}\\ compute\_max\_stack&int&code&\pageref{fn_compute_max_stack}\\ constant\_expression&int&code, int *&\pageref{fn_constant_expression}\\ create\_process&process&int, int&\pageref{fn_create_process}\\ debugger&void&int, stack\_slot *, stack\_slot *&\pageref{fn_debugger}\\ dispose\_code&void&code&\pageref{fn_dispose_code}\\ dispose\_jump\_table&void&jump\_table&\pageref{fn_dispose_jump_table}\\ exit\_scope\_code&code&void&\pageref{fn_exit_scope_code}\\ initialise\_display&void&int, stack\_slot **, stack\_slot **&\pageref{fn_initialise_display}\\ interpret&void&code&\pageref{fn_interpret}\\ interpreter\_loop&void&void&\pageref{fn_interpreter_loop}\\ io\_code&code&void&\pageref{fn_io_code}\\ load\_code&instruction *&code&\pageref{fn_load_code}\\ lookup\_jt&code\_entry *&int, jump\_table&\pageref{fn_lookup_jt}\\ max\_stack&int&code, code\_entry *,int, int jump\_table&\pageref{fn_max_stack}\\ print\_code&void&code&\pageref{fn_print_code}\\ print\_instruction&void&instruction&\pageref{fn_print_instruction}\\ process\_statistics&void&void&\pageref{fn_process_statistics}\\ stack\_effect&int&instruction&\pageref{fn_stack_effect}\\ \hline \end{tabular} \caption{Functions defined in this chapter.} \end{center} % perhaps put typedefs here.... \end{table}