Three-address systems:
CDC 6600 and 7600

The Control Data Corporation 6600 computer first appeared in the early 1960s, and was superseded in the late 1960s by the 7600 system. The 7600 is upward machine code compatible with the 6600, so that the basic instruction formats of the two systems are identical, but the 7600 is about four times faster than the 6600. The 6600 remains an important system for students of computer architecture, however, since it has been particularly well documented [1]. An understanding of its organisation and operation provides a proper background not only for an appreciation of the design of the 7600 system, but also for that of the Cray-1, which can be seen as a logical extension of the 6600/7600 concepts from scalar to vector operation. Further details of the design of the 6600 and 7600 can be found under Parallel Units, and of the Cray-1 under Vectors.

The CDC 6600 was designed to solve problems substantially beyond contemporary computer capability and, in order to achieve this end, a high degree of functional parallelism was introduced into the design of the central processor. This in turn required an instruction set and processor organisation which could exploit this parallelism, while at the same time maintaining at least the illusion of strictly sequential execution of instructions. A three-address instruction format provides this possibility, since successive instructions can refer to totally independent input and result operands. This would be quite impossible with a one-address instruction format, for example, where one of the inputs for an arithmetic operation is normally taken from, and the result returned to, a single implicit accumulator. Despite the potential for instruction overlap, dependencies between instructions can still occur in a three-address system, for example where one instruction requires as its input the result of an immediately preceding instruction, and the hardware must ensure that such dependencies are strictly respected. Detecting them would be difficult if full store addresses were involved, but the use of three full store addresses would, in any case, have made 6600 instructions prohibitively long. There were, in addition, strong arguments in favour of having a scratch-pad of fast registers in the 6600 which could match the operand processing rate of the functional units.
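The kind of dependency described above can be illustrated with a minimal Python sketch. It is not a model of the 6600 hardware: the Instr type and the reads_result_of function are hypothetical names introduced here, and only the case mentioned in the text (a later instruction reading the result of an earlier one) is checked.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    """Hypothetical three-address register instruction: X[i] = X[j] op X[k]."""
    i: int  # result register number
    j: int  # first input register number
    k: int  # second input register number


def reads_result_of(later: Instr, earlier: Instr) -> bool:
    """Return True if `later` takes the result of `earlier` as an input."""
    return earlier.i in (later.j, later.k)


a = Instr(i=6, j=1, k=2)  # X6 = X1 op X2
b = Instr(i=7, j=3, k=4)  # X7 = X3 op X4, shares no register with a
c = Instr(i=0, j=6, k=5)  # X0 = X6 op X5, needs the result of a

print(reads_result_of(b, a))  # b could overlap with a
print(reads_result_of(c, a))  # c must wait for a to deliver X6
```

Because each operand is named by a small register number, the comparison is a check on a few bits per instruction, which is one reason a register scratch-pad makes such checks practical where full store addresses would not.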

Thus the overall design of the 6600 processor and its instruction formats are as shown in the figure. There are three groups of scratch-pad registers: eight 60-bit operand registers (X), eight 18-bit address registers (A), and eight 18-bit index registers (B). The 15-bit computational instructions take operands from the two X registers identified by j and k, and return a result to the X register identified by i. Operands are transferred between X registers and Central Storage by instructions which load an address into one of registers A1-A5 (thus causing the corresponding operand to be read from that address in store and loaded into the matching register X1-X5), or into register A6 or A7 (thus causing the contents of X6 or X7 to be transferred to that address in store). The contents of an A register can also be indexed by the contents of a selected B register, and the B registers can also be used to hold fixed-point integers, floating-point exponents, shift counts, etc. The 30-bit instruction format (identified by particular combinations of F and m, as are the register groups) is used where a literal (immediate) operand is to be used in an operation involving an A or B register, or in control transfer (branch) instructions, where the value in the K field is used as the jump-to, or destination, address.
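The coupling between the A and X registers described above can be sketched as a toy Python model. The class name Registers and the method set_a are hypothetical, Central Storage is represented by a plain list, the B registers are omitted, and the sketch assumes that loading A0 causes no transfer, since the text mentions only A1-A7.

```python
class Registers:
    """Toy model of the 6600 X and A register groups (B registers omitted)."""

    ADDRESS_MASK = (1 << 18) - 1  # A registers are 18 bits wide

    def __init__(self, store):
        self.store = store   # stands in for Central Storage
        self.X = [0] * 8     # operand registers
        self.A = [0] * 8     # address registers

    def set_a(self, n, address):
        """Load an address into A[n] and perform the implied transfer."""
        address &= self.ADDRESS_MASK
        self.A[n] = address
        if 1 <= n <= 5:
            # A1-A5: read the operand at that address into X1-X5
            self.X[n] = self.store[address]
        elif n in (6, 7):
            # A6, A7: write X6 or X7 to that address
            self.store[address] = self.X[n]
        # Assumption: A0 triggers no transfer


store = [0] * 16
store[3] = 41
regs = Registers(store)

regs.set_a(1, 3)              # X1 is loaded from store[3]
regs.X[6] = regs.X[1] + 1     # a computation on X registers
regs.set_a(6, 4)              # X6 is written to store[4]
print(regs.X[1], store[4])
```

In this arrangement no separate load or store operation is needed: setting an address register is itself the request for the transfer.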

The issuing of instructions to the functional units and the subsequent updating of result registers is controlled by the Scoreboard, which takes instructions in sequence from the Instruction Stack (more details of which can be found under Instruction Buffers). The Instruction Stack in turn receives instructions from Central Storage. The order code of the central processor does not include any peripheral handling instructions, since all peripheral operations are independently controlled by the ten Peripheral Processors, which organise data transfers between Central Storage and the input/output and bulk storage devices attached to the peripheral channels. This relieves the central processor of the burden of dealing with input/output, a further contribution towards high performance.