Where access patterns involving non-unit increments (strides) are required, performance is reduced because the vector arithmetic instruction must be preceded by a gather instruction or followed by a scatter instruction. These instructions themselves run at less than the full arithmetic rate and have been measured to proceed at an average of 40 million operations per second. This figure is an average over many operations running at 50 million per second, with occasional periods at 12.5 million per second.
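To see how "occasional" the slow periods must be, the figures can be reconciled with a short calculation. This is a sketch that assumes the quoted 40 million is an operation-weighted average (total operations divided by total elapsed time), so it combines the two rates harmonically. The function name `slow_fraction` is hypothetical.

```python
def slow_fraction(avg_rate, fast_rate, slow_rate):
    """Fraction p of operations executed at slow_rate such that the
    operation-weighted (harmonic) average rate equals avg_rate.

    Assumption: avg = total operations / total time, so
    1/avg = (1 - p)/fast + p/slow, solved here for p.
    """
    return (1 / avg_rate - 1 / fast_rate) / (1 / slow_rate - 1 / fast_rate)


def slow_time_share(avg_rate, slow_rate, p):
    """Share of elapsed time spent in the slow periods."""
    return (p / slow_rate) / (1 / avg_rate)


# Rates in millions of operations per second, taken from the text.
p = slow_fraction(40, 50, 12.5)
share = slow_time_share(40, 12.5, p)
print(f"slow operations: {p:.4f}, slow share of time: {share:.4f}")
```

Under this assumption, roughly one gather/scatter operation in twelve (p = 1/12) runs at the slow rate. Because those operations are four times slower, they still account for about 27% of the elapsed time (4/15).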
Perhaps a more serious criticism of the CYBER 205, from the point of view of performance on general problems, is its long vector start-up time. Whereas an execution rate of 100 MFLOPS for 64-bit numbers can in principle be obtained with a two-pipeline configuration, this can be achieved in practice only when very long vectors are used. As the vectors become shorter, the start-up time has a progressively more serious effect, and Hockney and Jesshope [1] have used the vector length at which performance is halved as a measure of vector efficiency (see under Performance of Vector Processors).
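The half-performance length can be made concrete with the simple timing model usually associated with Hockney and Jesshope. Here $t_0$ is the start-up time, $r_\infty$ is the asymptotic result rate and $n$ is the vector length. The symbols are introduced for illustration and are not defined in the text above.

```latex
t(n) = t_0 + \frac{n}{r_\infty}, \qquad
r(n) = \frac{n}{t(n)} = \frac{r_\infty \, n}{n + n_{1/2}}, \qquad
n_{1/2} = r_\infty \, t_0 .
```

Setting $n = n_{1/2}$ gives $r(n_{1/2}) = r_\infty / 2$, which is why $n_{1/2}$ is the length at which performance is halved.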
For the CYBER 205 the nominal start-up time is 1 µs, so with a nominal result rate of 100 MFLOPS the half-performance length is 100: it is the number of results that could have been produced during the start-up time (10^8 results/s × 10^-6 s). For the CRAY-1, Hockney and Jesshope quote values in the range 10 to 20, though even lower values are possible. However, the CYBER 205 was intended for problems involving long vectors, and part of the start-up time can be eliminated for successive vector operations, since processing of one instruction can often begin before the last few elements of the previous instruction have been fully processed. Any attempt at a generalised comparison between the CYBER 205 and the CRAY-1 is largely irrelevant, since the performance of each depends critically both on the application being run and on the way it is mapped onto the hardware.
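The effect of these half-performance lengths can be tabulated directly. This is a minimal sketch assuming the model r(n) = r_inf · n / (n + n_1/2). Since the text gives no asymptotic rate for the CRAY-1, the comparison is expressed as a fraction of each machine's own peak rate rather than in MFLOPS. The value 15 for the CRAY-1 is an assumed midpoint of the quoted 10 to 20 range. The function names are hypothetical.

```python
def n_half(r_inf, t0):
    """Half-performance length: results that could be produced during start-up."""
    return r_inf * t0


def fraction_of_peak(n, n_half_length):
    """Achieved rate as a fraction of the asymptotic rate for vector length n."""
    return n / (n + n_half_length)


# CYBER 205: 100 MFLOPS nominal result rate, 1 microsecond start-up (from the text).
cyber_205 = n_half(100e6, 1e-6)
# CRAY-1: assumed midpoint of the 10-20 range quoted by Hockney and Jesshope.
cray_1 = 15

print(f"CYBER 205 n_1/2 = {cyber_205:.0f}")
for n in (10, 100, 1000, 10000):
    print(
        f"n = {n:>5}: CYBER 205 {fraction_of_peak(n, cyber_205):.2f}, "
        f"CRAY-1 {fraction_of_peak(n, cray_1):.2f} of peak"
    )
```

At n = 100 the CYBER 205 reaches only half its peak rate, while a machine with n_1/2 = 15 is already at about 87% of its own peak. Only for vectors of several thousand elements does the CYBER 205 approach its nominal rate. This illustrates why it suits long-vector problems, although the model ignores the overlap of successive instructions mentioned above.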