Comparing Prefetching Schemes

Hardware-based prefetching requires some support logic attached to the cache but little modification to the processor itself. Its main advantage is that prefetches are handled dynamically at run time, without compiler intervention. Its drawbacks are that extra hardware resources are needed, that memory references with complex access patterns are difficult to predict, and that it tends to generate more memory traffic, because mispredicted prefetches fetch data that is never used.
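To make the prediction problem concrete, here is a minimal sketch in C of a single-stream stride detector, simulated in software. It is an illustration only and is not the design of any scheme cited in this section; the names StridePredictor, stride_observe and count_covered are hypothetical, and a real hardware prefetcher would track many streams in a table indexed by instruction address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-entry stride detector (illustrative, not a real design). */
typedef struct {
    uint64_t last_addr;  /* most recent address observed */
    int64_t  stride;     /* difference between the last two addresses */
    int      valid;      /* non-zero once at least one address has been seen */
} StridePredictor;

/* Feed one address to the detector. Returns 1 and writes the predicted
   next address to *next when the same non-zero stride is seen twice in a
   row; returns 0 otherwise. */
int stride_observe(StridePredictor *p, uint64_t addr, uint64_t *next)
{
    assert(p != NULL && next != NULL);
    int issue = 0;
    if (p->valid) {
        int64_t s = (int64_t)(addr - p->last_addr);
        if (s != 0 && s == p->stride) {
            *next = addr + (uint64_t)s;  /* a prefetch would be issued here */
            issue = 1;
        }
        p->stride = s;
    }
    p->last_addr = addr;
    p->valid = 1;
    return issue;
}

/* Count the accesses whose address matched the prediction made on the
   immediately preceding access. */
size_t count_covered(const uint64_t *addrs, size_t n)
{
    StridePredictor p = {0, 0, 0};
    uint64_t predicted = 0;
    int have_prediction = 0;
    size_t covered = 0;
    for (size_t i = 0; i < n; i++) {
        if (have_prediction && predicted == addrs[i]) {
            covered++;
        }
        have_prediction = stride_observe(&p, addrs[i], &predicted);
    }
    return covered;
}
```

For the six regularly spaced addresses 0, 64, 128, 192, 256, 320, count_covered returns 3: the detector needs two matching strides before it predicts, after which every access is covered. For a sequence with no repeating stride it returns 0, which is the sense in which complex access patterns defeat this kind of predictor.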

In contrast, software-directed approaches require little hardware support. They rely on compiler technology to perform static program analysis and to insert prefetch instructions selectively. Because of this, they are less likely to prefetch unnecessary data. The disadvantages are that the extra prefetch instructions add a non-negligible overhead and that some useful prefetches cannot be uncovered at compile time.
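As a rough illustration of what an inserted prefetch instruction looks like, the following C sketch sums an array while prefetching a fixed number of elements ahead. It assumes a GCC- or Clang-compatible compiler, which provides the __builtin_prefetch builtin; the function name sum_with_prefetch and the constant PREFETCH_DISTANCE are hypothetical, and the distance of 16 is an arbitrary placeholder rather than a tuned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements. A compiler would derive
   this from the estimated memory latency and the cost of one iteration. */
#define PREFETCH_DISTANCE 16

/* Sum an array, requesting the element PREFETCH_DISTANCE iterations
   ahead so that it is likely to be in the cache when it is needed. */
long sum_with_prefetch(const long *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* GCC/Clang builtin: address, rw (0 = read), locality (0..3). */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

The bounds check and the prefetch itself execute on every iteration, which is the instruction overhead mentioned above. This naive version also requests the same cache line several times; a compiler would typically unroll the loop so that only one prefetch is issued per line.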

Hardware prefetching schemes perform best for programs in which most references are regular and sequential. Software prefetching can be more flexible: it can deal with programs that use complicated but well-organised data structures.

Chen and Baer compare Mowry et al.'s software approach with their own hardware approach in a multiprocessor environment [4], and their results confirm the above observations. They also propose a prefetching scheme that combines software and hardware techniques, having (hopefully) the advantages of both and the drawbacks of neither. The combined scheme is shown to outperform both of the above approaches, and is perhaps the next step in cache prefetching.