4.4 Subroutine inlining

Given the large number of C-LODs and R-LODs in some programs, a natural transformation for reducing overall LOD counts is subroutine inlining. To examine the effectiveness of inlining, the programs with large numbers of C-LODs and R-LODs were inlined using KAP. The transformed programs were then analyzed by the experimental compiler. The results are shown in Table 4, and the improvement in decoupling efficiency is illustrated graphically in Figure 10.

Table 4: Optimized LOD frequencies after inlining

We see that this transformation can give significant overall reductions in LODs, but we also note that, even after inlining, a significant fraction of the LODs in MG3D and QCD are still due to calls and returns. This is because some routines cannot be inlined: they contain either an ENTRY statement or a SAVE statement. Inter-procedural analysis could be particularly useful in such circumstances.
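One plausible reason a SAVE statement obstructs inlining is that the saved variable must remain a single copy shared by every call site. The sketch below illustrates this with a Python analogue. It is not Fortran and does not describe how KAP actually decides what to inline. All names (`make_next_id`, `original_pair`, `naively_inlined_pair`, `hoisted_inlined_pair`) are hypothetical.

```python
def make_next_id():
    """Build a routine whose counter persists across calls, loosely
    mimicking a Fortran local variable named in a SAVE statement."""
    calls = 0  # persistent state: one copy shared by every call site

    def next_id():
        nonlocal calls
        calls += 1
        return calls

    return next_id


def original_pair():
    """Two call sites invoke the same routine and share its saved counter."""
    next_id = make_next_id()
    first = next_id()   # call site A
    second = next_id()  # call site B
    return first, second


def naively_inlined_pair():
    """Hypothetical naive inlining: each call site receives a private copy
    of the body and of the 'saved' variable, so the state is lost."""
    calls_a = 0         # private copy for call site A
    calls_a += 1
    first = calls_a
    calls_b = 0         # private copy for call site B
    calls_b += 1
    second = calls_b
    return first, second


def hoisted_inlined_pair():
    """Semantics-preserving inlining: the saved variable is hoisted into
    storage shared by both inlined bodies."""
    shared_calls = 0
    shared_calls += 1   # inlined body at call site A
    first = shared_calls
    shared_calls += 1   # inlined body at call site B
    second = shared_calls
    return first, second
```

Here `original_pair()` and `hoisted_inlined_pair()` both return `(1, 2)`, whereas `naively_inlined_pair()` returns `(1, 1)`. An inliner therefore has either to hoist the saved state or, as the text reports for these routines, leave the call in place.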

Figure 10: Graph showing effect of inlining on decoupling efficiency

Figure 11 compares the cycles to completion of the Perfect Club programs on a Cray C90 [14] with the lower bound on cycles to completion on a decoupled architecture.

Cycles to completion is sensitive to the number of floating point pipelines. In this comparison the C90 has four vector pipelines, whereas the decoupled (ACRI-1) architecture has two floating point DU pipelines, so the comparison requires careful interpretation. For reference we also show the minimum theoretical execution time for a machine with two floating point pipelines.
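The sensitivity to pipeline count can be sketched with a simple bound. The example below assumes that each pipeline completes at most one floating point operation per cycle, so that the minimum cycle count is the operation count divided by the number of pipelines, rounded up. That model, the function name `min_cycles` and the operation count are assumptions made for illustration, not values or methods taken from the measurements.

```python
def min_cycles(flop_count, pipelines):
    """Lower bound on cycles to completion, assuming every pipeline
    completes one floating point operation per cycle (an assumption)."""
    if flop_count < 0:
        raise ValueError("flop_count must be non-negative")
    if pipelines <= 0:
        raise ValueError("pipelines must be positive")
    # Integer ceiling division avoids floating point rounding.
    return -(-flop_count // pipelines)


if __name__ == "__main__":
    flops = 1_000_000  # hypothetical operation count
    for pipes in (2, 4):
        print(pipes, "pipelines:", min_cycles(flops, pipes), "cycles")
```

Under this model a hypothetical program of 1,000,000 floating point operations needs at least 500,000 cycles on two pipelines but only 250,000 on four, which is why a raw comparison between machines with different pipeline counts needs care.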

Figure 11: Graph comparing C90 execution time with decoupled execution time

npt@dcs.ed.ac.uk