A.3 Potential for improved decoupling

Figure 13 below illustrates where the decoupling algorithm could be improved. YR is computed on the CU at the LOD point, yet within the enclosing loop there is no reaching use of YR other than its use within the scalar recurrence itself, so the CU need not compute YR inside the loop. However, because YR is needed on the CU later in the routine, the decoupling algorithm determines that the CU should compute it, and this forces a transfer from the DU within the loop. If the transfer were delayed until after the enclosing loop has completed, the frequency of transfers (and hence of LODs) would be minimized.
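The effect of delaying the transfer can be sketched with a toy model. This is not the DYFSEM code from Figure 13: the function names, the additive form of the recurrence, and the convention of counting one LOD per DU-to-CU transfer are all assumptions made purely for illustration.

```python
# Toy model only. All names are hypothetical, and the recurrence
# yr = yr + v is an assumed stand-in for the scalar recurrence on YR.

def run_eager(values):
    """Transfer YR from the DU to the CU inside the loop (one per iteration)."""
    yr = 0.0          # value held on the DU
    cu_yr = 0.0       # copy held on the CU
    transfers = 0     # each transfer is counted as one LOD in this model
    for v in values:
        yr = yr + v   # scalar recurrence
        cu_yr = yr    # DU -> CU transfer, although the CU does not use it yet
        transfers += 1
    return cu_yr, transfers


def run_delayed(values):
    """Run the recurrence to completion, then transfer YR to the CU once."""
    yr = 0.0
    for v in values:
        yr = yr + v   # no CU involvement inside the loop
    cu_yr = yr        # single DU -> CU transfer after the loop
    transfers = 1
    return cu_yr, transfers


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(run_eager(data))    # same final YR, one transfer per iteration
    print(run_delayed(data))  # same final YR, a single transfer
```

In this model both variants leave the same value of YR on the CU for later use in the routine; only the number of transfers differs, growing with the trip count in the first case and remaining constant in the second.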

Figure 13: An LOD from DYFSEM which can be propagated to the enclosing loop nest by improved decoupling

npt@dcs.ed.ac.uk