The execution time $T_p$ for program $p$
on the ACRI-1 system is modelled
relatively accurately by equation 1,

$$T_p = (C_p + N_p \, \delta) \, \tau \qquad (1)$$

where $C_p$ is the number of execution cycles of the DU,
$N_p$ is the number of LODs executed,
$\delta$ is the nominal penalty induced by
each LOD, and
$\tau$ is the clock period of the machine.
Without actually executing a program it is hard to predict
values for $C_p$, but it is possible to define a lower bound
as $C_p \ge F_p / 2$, where
$F_p$ is the number of floating point operations
executed by program $p$.
This must be an absolute minimum number of execution cycles for
any processor with two floating point pipelines. This permits us to bound
execution time from below as follows:

$$T_p \ge (F_p / 2 + N_p \, \delta) \, \tau \qquad (2)$$

In fact, in the ACRI-1 architecture only pipeline startup and shutdown delays
on the DU will introduce any discrepancy between the actual DU cycle count and
this lower bound. As the DU contains hardware support for modulo-scheduled
software pipelining, we expect startup and shutdown costs to be relatively
low, though it is still too early to present definite figures.
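
The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, the LOD penalty is assumed to be expressed in cycles so that it can be added to the DU cycle count, and the workload figures in the usage lines are invented for the example rather than taken from any measurement.

```python
def lower_bound_cycles(flops: int) -> float:
    """Minimum DU cycles for a machine with two floating point pipelines:
    half the number of floating point operations executed."""
    return flops / 2


def execution_time(du_cycles: float, lods: int,
                   lod_penalty: float, clock_period: float) -> float:
    """Modelled execution time in seconds.

    Assumption: lod_penalty is in cycles, so the total cycle count is
    du_cycles + lods * lod_penalty, scaled by the clock period in seconds.
    """
    return (du_cycles + lods * lod_penalty) * clock_period


# Hypothetical workload: 1e6 flops, 1000 LODs, 20-cycle penalty per LOD.
# The 6 ns clock period is the target figure quoted in the text.
cycles = lower_bound_cycles(1_000_000)
bound_seconds = execution_time(cycles, lods=1000, lod_penalty=20.0,
                               clock_period=6e-9)
```

Because the DU cycle count is replaced by its lower bound, `bound_seconds` is a lower bound on the modelled execution time, not a prediction of it.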
Figure 2 compares the lower bound on ACRI-1 cycles to completion,
for each program in the Perfect Club, with the measured cycles to
completion on the Cray Y-MP C90 (from [11]).
Figure 3 illustrates how the lower bound on cycles to
completion translates into MFLOPS execution rates for each of the
Perfect Club programs, given a target clock period of 6 ns.
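
As a rough check on the scale of these rates, the conversion from a cycle bound to MFLOPS can be written out. The symbol names are assumptions made for this sketch: $F_p$ is the number of floating point operations in program $p$, $N_p$ the number of LODs, $\delta$ the penalty per LOD in cycles, and $\tau$ the clock period. Whether the plotted bound includes the LOD term is also an assumption here.

$$\mathrm{MFLOPS}_p = \frac{F_p}{10^{6} \, (F_p / 2 + N_p \, \delta) \, \tau} \;\le\; \frac{2}{10^{6} \, \tau} = \frac{2}{10^{6} \times 6 \times 10^{-9}} \approx 333.3$$

Under these assumptions no program can exceed roughly 333 MFLOPS at a 6 ns clock, and a program falls below that ceiling in proportion to its LOD penalty cycles.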