Performance

Since the real work of the computer is carried out on floating-point numbers, with the fixed-point numbers being involved mainly in "house-keeping" operations, performance is often measured in terms of the rate of execution of floating-point operations, normally expressed in millions of such operations per second (MFLOPS). In order to compare the architectures of the 6600 and 7600, however, it is more convenient to consider the number of floating-point operations that each can execute in one clock period (FLOPS/CLOCK). In the case of the 6600, floating-point addition/subtraction requires 4 clock periods, multiplication 10 and division 29. (Division may be ignored in this comparison; it always takes a long time, and computer designers have usually assumed that users will have the good sense to avoid using it as much as possible.) Thus, neglecting division, and allowing for the fact that there are two multiply units, the 6600 can perform 9 floating-point operations (5 additions and 4 multiplications) in 20 clock periods, with all three of the corresponding functional units being fully occupied. This corresponds to 0.45 FLOPS/CLOCK or, at the 6600's clock period of 100 ns, 4.5 MFLOPS (see Figure). Since the Scoreboard is capable of issuing up to 20 instructions in this time, the right mix of instructions would allow this rate to be achieved. The rate is also the sum of the maximum rates of execution of long sequences of additions (0.25 FLOPS/CLOCK) and multiplications (0.2 FLOPS/CLOCK) occurring separately.
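
The 6600 arithmetic above can be checked with a short calculation. The following is only a sketch: the latencies and unit counts are those quoted in the text, while the function name ops_in_window and the 10 MHz clock rate (inferred from 0.45 FLOPS/CLOCK being equated with 4.5 MFLOPS) are assumptions introduced for illustration.

```python
from fractions import Fraction

# Latencies in clock periods, as quoted in the text. The 6600 units are
# not pipelined, so each unit starts a new operation only when the
# previous one has finished.
ADD_LATENCY = 4
MULTIPLY_LATENCY = 10
ADD_UNITS = 1
MULTIPLY_UNITS = 2

# Assumption: 10 MHz clock, inferred from 0.45 FLOPS/CLOCK = 4.5 MFLOPS.
CLOCK_MHZ = 10


def ops_in_window(latency, units, window):
    """Operations completed by `units` non-pipelined units in `window` clocks."""
    return units * (window // latency)


WINDOW = 20  # clock periods, the interval used in the text

adds = ops_in_window(ADD_LATENCY, ADD_UNITS, WINDOW)
muls = ops_in_window(MULTIPLY_LATENCY, MULTIPLY_UNITS, WINDOW)

flops_per_clock = Fraction(adds + muls, WINDOW)
mflops = float(flops_per_clock * CLOCK_MHZ)

# The same figure obtained as the sum of the separate maximum rates.
separate_sum = Fraction(ADD_UNITS, ADD_LATENCY) + Fraction(MULTIPLY_UNITS, MULTIPLY_LATENCY)

print(adds, muls, flops_per_clock, mflops, separate_sum)
```

Exact fractions are used so that the equality between the combined rate and the sum of the separate rates is not obscured by floating-point rounding.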

Maintaining these rates requires that the appropriate operands be available in the X registers, of course. Where these have first to be fetched from store, additional delays are incurred. However, the separation of the operand accessing and function execution facilities in the instruction set allows the possibility, at least, of programs (or compilers) organising appropriate pre-fetching of operands.

In the 7600 an individual floating-point addition or subtraction still takes 4 clock periods, as in the 6600, but because the units are pipelined, the maximum execution rate for these operations is 1 per clock period (1 FLOP/CLOCK). Individual floating-point multiplications take 5 clock periods but, as we have seen, the multiply unit is also pipelined and can perform multiplications at the rate of 1 every 2 clock periods (0.5 FLOPS/CLOCK).
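
The difference between the time for one operation and the rate for a long sequence can be illustrated with a simple timing model. This is a sketch under stated assumptions, not a description of the actual hardware timing: it assumes a run of independent operations with operands always ready and no issue stalls, so that the first result appears after the unit's latency and subsequent results follow at the unit's start interval. The function names are hypothetical.

```python
def clocks_for_run(n, latency, interval):
    """Clock periods needed to complete n independent operations.

    Assumed model: the first result is ready after `latency` clocks and
    each later result follows `interval` clocks after the previous one.
    For a non-pipelined unit, interval equals latency.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return latency + (n - 1) * interval


def average_rate(n, latency, interval):
    """Average operations per clock period over a run of n operations."""
    return n / clocks_for_run(n, latency, interval)


# Figures from the text: 7600 add (4 clocks, 1 per clock),
# 7600 multiply (5 clocks, 1 every 2 clocks), 6600 add (4 clocks, not pipelined).
for n in (1, 10, 100, 1000):
    print(
        n,
        round(average_rate(n, latency=4, interval=1), 3),  # 7600 add
        round(average_rate(n, latency=5, interval=2), 3),  # 7600 multiply
        round(average_rate(n, latency=4, interval=4), 3),  # 6600 add
    )
```

Under this model the average rates approach 1, 0.5 and 0.25 FLOPS/CLOCK respectively as the run length grows, which is why the quoted maxima apply to long sequences rather than to individual operations.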

The sum of these two rates produces a total maximum execution rate of 1.5 FLOPS/CLOCK. This rate cannot be achieved, however, since instructions can only be issued, and results entered into the X registers, at a rate of 1 per clock period at most. Thus the maximum floating-point execution rate is 1 FLOP/CLOCK, equivalent to 36.4 MFLOPS at the 7600's clock period of 27.5 ns. It can be made up, for example, of a sequence of additions (in which case the add unit is fully occupied and the multiply unit idle) or a sequence of alternate multiplications and additions (in which case the multiply unit is fully occupied and the add unit 50 per cent occupied). Even sustaining this rate for any length of time is virtually impossible, of course, since it does not allow for the execution of other instructions such as operand accesses and control transfers. However, being able to sustain the maximum rates for addition and/or multiplication for any period of time gives a performance bonus over the 6600 additional to the improvement in clock rate, and CDC claim that the overall performance of the 7600 is 15 million instructions per second.
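
The issue-rate limit described above amounts to taking the smaller of two bounds, and the occupancy figures follow from how a stream of one instruction per clock period is divided between the two units. The sketch below illustrates this; the rates are those given in the text, the 36.4 MHz clock rate is inferred from 1 FLOP/CLOCK being equated with 36.4 MFLOPS, and the function name occupancy is hypothetical.

```python
from fractions import Fraction

# Maximum rates in operations per clock period, as given in the text.
ADD_RATE = Fraction(1, 1)       # pipelined add unit
MULTIPLY_RATE = Fraction(1, 2)  # pipelined multiply unit
ISSUE_RATE = Fraction(1, 1)     # at most one instruction issued per clock

# Assumption: clock rate inferred from 1 FLOP/CLOCK = 36.4 MFLOPS.
CLOCK_MHZ = 36.4

unit_limit = ADD_RATE + MULTIPLY_RATE  # what the units alone could sustain
peak = min(unit_limit, ISSUE_RATE)     # what instruction issue permits
peak_mflops = float(peak) * CLOCK_MHZ


def occupancy(mul_fraction):
    """Return (add, multiply) unit occupancy at full issue rate.

    `mul_fraction` is the fraction of issued instructions that are
    multiplications; the remainder are taken to be additions.
    """
    mul_fraction = Fraction(mul_fraction)
    if not 0 <= mul_fraction <= 1:
        raise ValueError("mul_fraction must lie between 0 and 1")
    add_occ = ISSUE_RATE * (1 - mul_fraction) / ADD_RATE
    mul_occ = ISSUE_RATE * mul_fraction / MULTIPLY_RATE
    if mul_occ > 1:
        raise ValueError("multiply unit cannot accept this instruction mix")
    return add_occ, mul_occ


print(unit_limit, peak, peak_mflops)
print(occupancy(0))               # additions only
print(occupancy(Fraction(1, 2)))  # alternate multiplications and additions
```

A mix with more than half multiplications is rejected because the multiply unit would then be asked to start operations faster than 1 every 2 clock periods, so the full issue rate could not be maintained.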

The first CDC 7600 was delivered in 1969. By the mid-1970s technological advances offered the possibility of increasing the clock rate by a factor of about two. In seeking an increase in performance over the 7600 comparable with that which the 7600 had offered over the 6600, however, the designers (principally Seymour Cray) had to overcome the instruction issuing bottleneck in the 7600 design. The solution was found in vector processing, and the resulting architecture appeared commercially as the CRAY-1. This machine and its successors are described under Vector Processing.