Chaining

This section is easier to follow if you have read about Chaining in the CRAY-1 first.

Apart from the use of multiple memory ports and a doubling of the instruction buffer size, the architecture of each processor in a CRAY X-MP system is basically the same as that of the CRAY-1. At the implementation level, however, the processor design is quite different. The X-MP uses the same 16-gate ECL gate arrays as the CRAY-2, and this, together with improved packaging density over the CRAY-1, means that four X-MP processors occupy the same physical space as a single CRAY-1. Increased packaging density implies shorter distances between components, which in turn makes a shorter clock period possible. The common clock, which provides synchronous control for all the processors in an X-MP system, had a period of 9.5 ns in early versions of the X-MP (compared with 12.5 ns for the CRAY-1) and 8.5 ns in later versions. The performance of the X-MP processor is also improved by an important change in the way chaining is implemented.
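As a quick check on the clock figures quoted above, the following Python sketch converts each period to a frequency and compares it with the CRAY-1. The function and dictionary names are illustrative only, and clock rate by itself says nothing about delivered performance, which also depends on memory ports, chaining and the workload.

```python
# Clock periods in nanoseconds, as quoted in the text above.
CLOCK_PERIOD_NS = {
    "CRAY-1": 12.5,
    "CRAY X-MP (early)": 9.5,
    "CRAY X-MP (later)": 8.5,
}


def frequency_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def clock_ratio(period_ns, baseline_ns=12.5):
    """Ratio of clock rates relative to a baseline period (CRAY-1 by default)."""
    return baseline_ns / period_ns


if __name__ == "__main__":
    for name, period in CLOCK_PERIOD_NS.items():
        print(f"{name}: {frequency_mhz(period):.1f} MHz, "
              f"{clock_ratio(period):.2f}x the CRAY-1 clock rate")
```

The three periods correspond to 80 MHz, roughly 105 MHz and roughly 118 MHz respectively.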

In the CRAY-1 two instructions can be chained if the second requires the result of the first as one of its inputs and if it is ready to start at the moment the first result element of the first instruction emerges from the relevant arithmetic pipeline. This moment is known as chain slot time. If chain slot time is missed, the second instruction must wait until all the elements of the first instruction have been written into the result vector register. This is because a CRAY-1 vector register can handle only one access, either a write or a read, in any one clock period. In the CRAY X-MP one element can be written into a vector register while another element is read from the same register in the same clock period. Thus if the second instruction is not ready to start at chain slot time, but becomes ready shortly afterwards, it does not have to wait for the first instruction to complete, but can start straightaway.
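The difference between the two chaining rules can be illustrated with a small timing model. This is a deliberately simplified sketch, not a description of the real hardware: it assumes the first instruction delivers one result element per clock period starting at chain slot time, so that the result register is completely written after `vector_length` periods, and it ignores functional unit availability, issue delays and memory conflicts. All function and parameter names are hypothetical.

```python
def chained_start_cray1(chain_slot, ready, vector_length):
    """Earliest clock period at which the second instruction can start (CRAY-1 rule).

    chain_slot:    period in which the first result element appears
    ready:         period in which the second instruction is otherwise ready
    vector_length: number of elements produced by the first instruction
    """
    if ready <= chain_slot:
        # The second instruction is waiting at chain slot time, so it chains.
        return chain_slot
    # Chain slot time was missed: wait until every element has been written.
    # (Assumption: element i is written in period chain_slot + i.)
    return max(ready, chain_slot + vector_length)


def chained_start_xmp(chain_slot, ready, vector_length):
    """Earliest start under the X-MP rule.

    A register can be written and read in the same clock period, so a late
    second instruction starts as soon as it is ready. vector_length is unused
    and kept only so that both functions share a signature.
    """
    return max(ready, chain_slot)


if __name__ == "__main__":
    chain_slot, vector_length = 10, 64
    for ready in (8, 10, 12):
        c1 = chained_start_cray1(chain_slot, ready, vector_length)
        xmp = chained_start_xmp(chain_slot, ready, vector_length)
        print(f"ready={ready:2d}  CRAY-1 start={c1:2d}  X-MP start={xmp:2d}")
```

Under these assumptions, a second instruction that becomes ready two periods after chain slot time (period 12 with a chain slot at period 10 and 64 elements) starts at period 74 under the CRAY-1 rule but at period 12 under the X-MP rule. When the second instruction is ready at or before chain slot time, both rules give the same start.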