Standard dependence analysis is not capable of detecting a sufficient amount of parallelism in general purpose applications. The main reason for this are the limitations of static analysis - it is provably undecidable. But even when augmented with dynamic information about dependences, a parallelising compiler is not able significantly increase the performance of the majority of programs.
The problem with dependence analysis as a tool to discover parallelism is that not all dependences need to be honoured when translating a sequential program to a parallel form. Indeed, some data dependencies are an artefact of the implementation rather than essential for the underlying algorithm.
Relaxing the strict requirement for honouring all dependencies and checking instead for the necessary condition of commutativity allows a compiler to discover parallelisation opportunities that remain hidden otherwise. The following poster presents the methodology behind commutativity testing.
Q: Why do you focus on loops?
A: By Amdahl's law, one should focus on optimising the parts of a program that represent a large part of the program's run-time, or otherwise the impact would be limited. Any program that runs for a significant amount of time must be running the same part of its code repeatedly. This is usually achieved either by using loops or recursion. We have focused on the former as it is the more conventional way of expressing programs' structure in low-level languages, but our technique can easily be extended to recursion. By optimising a loop that takes 90% of the runtime, one can achieve a speedup of up to 10x, while the cost of analysing and optimising such a loop might be relatively small, compared to the cost of analysing and trying to optimise the whole program.
Q: Doesn't your search space explode very fast even for simple programs?
A: Correct! It does. But we are not really trying all permutations. We are testing a subset, say a 1000 different shuffles of the input array, and the assumption we hinge on is this: if the result is always the same, either the program (loop) is commutative for that input, or the program is so contrived, that a programmer would easily realise it is not parallel by looking at the code. And looking at the code he/she would have to do, since our system would never give a definitive answer, but only suggest regions that it has tested for and found that they probably commute.
Q: This looks like a good idea, but doesn't it take a really long time to perform this analysis in reality?
A: It is true that our approach takes ages compared to traditional compilation, but our baseline is different here. Imagine a legacy program with 1M lines of code that you want to parallelise and run on a modern system. You would have to go through the whole thing and inspect each of its functions and loops, trying to figure out which of them can benefit from parallelism. It is going to take a considerate amount of time and effort. Our system allows you to pin-point only a few places in code that are promising to be parallel and which seem to consume a significant part of the runtime. Then even if it takes it a day to perform its analysis, it is a significant improvement over the weeks or months it's going to take an expert to go through the whole program.
Q: Your approach seems to depend a lot on the input it uses for testing. Wouldn't this mean it will produce the wrong result, if your input was not representative enough?
A: Correct! It might, and this is also a reason to hand over the final decision to the expert developer who is using the system so that he/she can verify that code that has tested positive for commutativity can in fact be ran in parallel. This might seem like a serious limitation, but just like in the previous questions, the system is more similar to a debugger than a compiler in the sense that it is automating a complicated test, not drawing conclusions from it.
Q: What if it turns out that for some inputs the program is commutative and for others it isn't?
A: This is part of the output: what percentage of the inputs resulted in a commutative operation and what percentage did not. In either extreme the result is clear - the operation is either probably commutative, or definitely not commutative. However, if the result is in between, then the program might benefit by duplicating the tested code and running a parallel version depending on features of the input data. We haven't implemented input dependent code specialisation in our prototype yet, but it is something we have thought about and is part of our future research.