Instruction buffering in the CDC 7600

The CDC 7600 [1] was designed to be upward compatible with the 6600 at the machine code level, but to provide a substantial increase in performance. The organisation of the 7600 central processor is very similar to that of the 6600. It contains nine parallel functional units, a scratch-pad of eight X registers, eight A registers and eight B registers, and an Instruction Stack. The 7600 Instruction Word Stack, however, is made up of twelve 60-bit registers, compared with the eight used in the 6600, and each register also has its own 18-bit associative address register in an Instruction Address Stack (see figure).

The Instruction Stack is filled two words ahead of the instruction currently being executed, thus giving a greater degree of pre-fetching than was possible in the 6600, and hence overcoming the storage access delay for sequential instructions. Furthermore, instructions are obeyed from a Current Instruction Word register, rather than from the bottom stack register, and a complete 60-bit word is transferred from the Instruction Stack into this register whenever the word address changes in the program address counter. This transfer can be made from any of the twelve registers in the Instruction Word Stack, allowing a considerable degree of flexibility in pre-fetching and loop catching. Whenever a new word is required in the Current Instruction Word register, the address in the program address counter is compared with the entries in the Instruction Address Stack, and if a coincidence occurs for any of these entries, the contents of the corresponding register in the Instruction Word Stack are transferred into the Current Instruction Word register.
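
The address comparison described above can be illustrated with a minimal sketch. It is a software analogy only, not a description of the 7600 logic: the names lookup, Entry and STACK_DEPTH are hypothetical, and the loop is a sequential stand-in for a comparison which the hardware makes against all twelve address registers at once.

```python
from typing import List, Optional, Tuple

STACK_DEPTH = 12  # twelve registers in the Instruction Word Stack

# Hypothetical representation: each entry pairs a word address (18 bits in
# the machine) with the instruction word (60 bits) held at that address.
Entry = Tuple[int, int]


def lookup(stack: List[Optional[Entry]], program_address: int) -> Optional[int]:
    """Return the instruction word whose address matches, or None on a miss.

    A None entry stands for a register holding no valid word.
    """
    for entry in stack:
        if entry is not None and entry[0] == program_address:
            # Coincidence: this word would go to the Current Instruction
            # Word register.
            return entry[1]
    return None
```

A miss (a result of None) corresponds to the case in which the word must be requested from store.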

When obeying sequential code the required word will normally be in one of the bottom two registers. When a branch instruction is executed and the branch taken, the required word may already be in one of the top ten registers, obviating the need for a store access, and giving improved performance. If the required word is not in the stack, the first two words at the target address are immediately requested from store and instruction execution continues when the first of these is received. Whenever an instruction word is received from store all the entries in the Instruction Word Stack and the Instruction Address Stack are simultaneously moved up one position, with the new address and instruction word being entered at the bottom of the stack and the oldest entry being lost. Entries in the stack are only invalidated by the execution of a subroutine call or Exchange Jump, and not by normal branch instructions, so that a program may branch back and forth between short sequences of non-contiguous code held in the stack.
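
The filling, shifting and invalidation behaviour described above can be modelled with a short sketch, which also shows why a small loop runs without further store accesses once it has been caught. The class and its names (InstructionStack, fetch, invalidate, store_requests) are hypothetical, store latency is ignored, and the exact policy used here to keep the stack two words ahead is an assumption made for illustration rather than a statement of the 7600 control logic.

```python
from typing import Dict, List, Optional, Tuple


class InstructionStack:
    """Simplified model of a twelve-entry instruction stack."""

    def __init__(self, store: Dict[int, int], depth: int = 12) -> None:
        self.store = store              # word address -> instruction word
        self.depth = depth
        # Index 0 is the bottom of the stack (newest); the last is the oldest.
        self.entries: List[Tuple[int, int]] = []
        self.store_requests = 0         # words requested from store so far

    def _find(self, address: int) -> Optional[int]:
        # Stand-in for the parallel comparison with every address register.
        for entry_address, word in self.entries:
            if entry_address == address:
                return word
        return None

    def _enter(self, address: int) -> None:
        # A word arriving from store moves every entry up one position;
        # the new word enters at the bottom and the oldest entry is lost.
        self.store_requests += 1
        self.entries.insert(0, (address, self.store[address]))
        del self.entries[self.depth:]

    def fetch(self, address: int) -> Tuple[int, bool]:
        """Return (instruction word, True if it was already in the stack)."""
        found = self._find(address)
        if found is not None:
            # Hit: top the stack up so that it stays two words ahead.
            for ahead in (address + 1, address + 2):
                if ahead in self.store and self._find(ahead) is None:
                    self._enter(ahead)
            return found, True
        # Miss: request the first two words at the target address.
        for target in (address, address + 1):
            if target in self.store and self._find(target) is None:
                self._enter(target)
        return self.store[address], False

    def invalidate(self) -> None:
        # Models a subroutine call or Exchange Jump; ordinary branches
        # leave the entries valid.
        self.entries.clear()
```

In this model a six-word loop misses only on its first word during the first pass, and every word is found in the stack on later passes, so store_requests stops increasing. A loop of twenty words, by contrast, has lost its first word from the top of the stack by the time the backward branch is taken, which is the limitation the scheme suffers from.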

Although this stack is larger than that of the 6600, it is still relatively small, and considerable effort is frequently required to reduce the amount of code in program loops in order that they may fit into it. A quite different scheme from any of those considered so far is required if loops of unrestricted size are to be accommodated. Such a scheme was invented as part of the MU5 project.