 
    
    
    
      
In this paper we have explored some of the issues surrounding compilation for decoupled supercomputers, illustrating some of the key optimizations with quantitative performance data from the Perfect Club. Many of these optimizations have already been implemented in an experimental compiler. The compiler is an interactive tool, intended primarily to aid compiler development but also to serve as a visual aid for expert programmers. It produces annotated displays of the source code, prioritizing important synchronization events based on actual execution frequencies.
We have seen that the compiler optimizations of Hoisting, IF-conversion, and Inlining together produce significant improvements in decoupling efficiency across a wide range of programs from the Perfect Club. The two programs that steadfastly refuse to decouple, SPICE and TRACK, also perform badly on vector supercomputers. Bearing in mind that every operand obtained from memory incurs a nominal latency of 150 cycles, it is remarkable indeed that so many programs have mean perceived latencies close to a single cycle.