Perfect Club Performance


The execution time $T_p$ for a program $p$ on the ACRI-1 system is modelled relatively accurately by equation 1,

$$T_p = (C_p + N_p L)\,\tau \qquad (1)$$

where $C_p$ is the number of execution cycles of the DU, $N_p$ is the number of LODs executed, $L$ is the nominal penalty induced by each LOD, and $\tau$ is the clock period of the machine. Without actually executing a program it is hard to predict values for $C_p$, but it is possible to define a lower bound as $C_p \ge F_p/2$, where $F_p$ is the number of floating point operations executed by program $p$. This must be an absolute minimum for any processor with two floating point pipelines, since at most two floating point operations can complete per cycle. This permits us to bound execution time as follows:

$$T_p \ge \left(\frac{F_p}{2} + N_p L\right)\tau \qquad (2)$$

In fact, in the ACRI-1 architecture only pipeline startup and shutdown delays on the DU will introduce any discrepancy between the actual DU cycle count and this floating point lower bound. As the DU contains hardware support for modulo-scheduled software pipelining, we expect startup and shutdown costs to be relatively low, though it is still too early to present definite figures. Figure 2 compares the lower bound on ACRI-1 cycles to completion, for each program in the Perfect Club, with the measured cycles to completion on the Cray Y-MP C90 (from [11]).
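The lower bound described above can be sketched in a few lines of Python. This is only an illustration: the function names and the input values (operation count, LOD count, per-LOD penalty) are hypothetical and are not measurements from the Perfect Club programs. The sketch assumes the LOD penalty is expressed in cycles and that cycles to completion are bounded by half the floating point operation count plus the total LOD penalty.

```python
def lower_bound_cycles(flops, lods, lod_penalty):
    """Lower bound on cycles to completion.

    flops       -- floating point operations executed by the program
    lods        -- number of LODs executed
    lod_penalty -- nominal penalty per LOD, in cycles (assumed unit)
    """
    # Two floating point pipelines: at most two operations per cycle.
    return flops / 2 + lods * lod_penalty


def lower_bound_time(flops, lods, lod_penalty, clock_period_s):
    """Lower bound on execution time in seconds."""
    return lower_bound_cycles(flops, lods, lod_penalty) * clock_period_s


if __name__ == "__main__":
    # Hypothetical program: 1e9 operations, 1e4 LODs, 50 cycles per LOD.
    cycles = lower_bound_cycles(1_000_000_000, 10_000, 50)
    seconds = lower_bound_time(1_000_000_000, 10_000, 50, 6e-9)
    print(f"cycles >= {cycles:.0f}, time >= {seconds:.3f} s")
```

With these made-up inputs the LOD term contributes 500,000 of roughly 500.5 million cycles, which shows how the bound is dominated by the floating point term when LODs are infrequent.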

Figure 3 illustrates how the lower bound on cycles to completion translates into MFLOPS execution rates for each of the Perfect Club programs, given a target clock period of 6ns.
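As a rough illustration of the conversion from cycles to MFLOPS, the following Python sketch divides the operation count by the execution time implied by a cycle count and the 6 ns target clock period. The function name and the sample inputs are hypothetical; only the 6 ns clock period comes from the text.

```python
def mflops_rate(flops, cycles, clock_period_s=6e-9):
    """MFLOPS rate for a program executing `flops` operations in `cycles` cycles."""
    seconds = cycles * clock_period_s
    return flops / seconds / 1e6


if __name__ == "__main__":
    # Hypothetical LOD-free case: cycles equal half the operation count,
    # i.e. both floating point pipelines are busy on every cycle.
    flops = 2_000_000
    print(f"{mflops_rate(flops, flops / 2):.1f} MFLOPS")
```

In the LOD-free case the rate depends only on the clock period: two operations per 6 ns cycle gives about 333 MFLOPS, so any LOD penalty added to the cycle count lowers a program's rate below that ceiling.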

ships@dcs.ed.ac.uk
Wed Mar 1 16:43:22 GMT 1995