4.5 Speculative loop dispatch

Next: 5. Conclusions Up: 4. Idiomatic Transformations Previous: 4.4 Subroutine inlining

4.5 Speculative loop dispatch

Loops with an indeterminate loop count, and DO loops with premature exits, cause LOD penalties which are rather different examples of the general Control Transfer LOD.

From the point of view of decoupling, it is instructive to differentiate WHILE loops from loops with a premature exit. In a WHILE loop, the only event causing the loop to be terminated is a loop-variant test, whereas a DO loop with premature exit may terminate either because its loop count is exhausted, or because the termination condition is met. An example of a WHILE loop is shown below:

 10   CONTINUE
        A(I) = A(I) * FACTOR
        DIF = A(I) - EXACT
        IF (ABS(DIF) .GT. EPSILON) GOTO 10
 20   CONTINUE

In the former case, a naive compiler would insert an LOD event on every loop iteration, to enable or suppress the execution of the next iteration on the AU, depending on the evaluation of the continuation condition from the current iteration within the DU. This is undesirable, and a better strategy is to allow the AU to execute speculatively, ahead of the DU, possibly by many iterations. This avoids the loss of decoupling on every iteration, but means that the AU will have over-speculated when the terminating condition finally arrives from the DU.

This optimization is possible if the architecture is capable of discarding queue entries in both the Load Data Queue and the Store Address Queue. When the loop has terminated, a single LOD penalty is incurred, since the AU must be restarted at the instructions following the loop that has just terminated.

The situation is more complicated in the case of a DO loop with a premature exit: consider the loop below.

      DO 10 I = 1, 20
         C(I) = A(I)*B(I)
         IF (C(I) .LT. 0.0) GOTO 20
 10   CONTINUE
 20   CONTINUE

As in the previous example, a naive compilation algorithm will impose an LOD penalty on every iteration, as it prevents the AU from issuing iteration i+1 until the DU has evaluated the condition in iteration i. An optimization would be to issue all iterations on the AU, and arrange to discard the side-effects of the unwanted iterations when the exit condition evaluates true.

In this case, three possibilities arise: the exit from the loop may take place on an early iteration, in which case the AU may be executing a later iteration. This is similar to the situation arising when a WHILE loop is terminated, described above. A second possibility is that the loop is terminated by loop count expiry, that is the GOTO out of the loop is never activated. In this case, the AU can proceed to further code with no delay, and no LOD penalty is required. The most awkward case occurs when the loop termination condition evaluates true during a late iteration of the loop, at which time the AU may have finished all of the intended iterations. If the AU has proceeded to further instructions, subsequent to the loop, the operand queues may contain some items belonging to loop iterations that are to be suppressed, followed by some live values that must be retained for the DU to use in code following the loop. This implies that the architecture must provide a selective flush capability in the operand queues, and that an architecture without such a capability must not permit the AU to proceed past the final loop iteration until the DU has determined whether any operands must be flushed. This further implies that an architecture without a selective flush feature must accept a LOD penalty following every loop that contains a premature exit.

Next: 5. Conclusions Up: 4. Idiomatic Transformations Previous: 4.4 Subroutine inlining

npt@dcs.ed.ac.uk