Cache Prefetching

Adapted from an MSc assignment written by Helen Berrington, January 1998

Cache prefetching is a technique used to improve cache performance, i.e. to increase the cache hit ratio. Caches may be either lockup-free (non-blocking) or blocking. With a blocking cache, the processor stalls on a cache miss until the required data has arrived. A lockup-free cache, however, allows execution to proceed concurrently with outstanding cache misses, to the extent permitted by data dependencies. Cache prefetching (also called non-binding prefetching) aims to reduce the cache miss rate by fetching data from main or remote memory into the cache before the processor requires it.

Cache prefetching may be software-controlled [9, 11, 7, 8], where prefetch instructions are inserted into the code by the compiler; hardware-controlled [6, 2, 3], where data is fetched automatically by the hardware; or a combination of the two [4].
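As an illustration of the software-controlled approach, the following sketch shows the kind of prefetch a compiler (or programmer) might insert into a simple reduction loop. It assumes a GCC/Clang-style compiler that provides the __builtin_prefetch() intrinsic; the prefetch distance of 16 elements is a hypothetical tuning value, not taken from the sources cited above.

#include <stddef.h>

/* A minimal sketch of software-controlled prefetching. The prefetch
   distance of 16 elements is a hypothetical tuning value; in practice
   it should roughly cover the memory latency divided by the time per
   loop iteration. */
double sum_with_prefetch(const double *a, size_t n)
{
    const size_t dist = 16;  /* hypothetical prefetch distance */
    double sum = 0.0;

    for (size_t i = 0; i < n; i++) {
        if (i + dist < n) {
            /* Non-binding hint: bring a[i + dist] into the cache for
               reading (second argument 0 = read) with high temporal
               locality (third argument 3). The hint may be ignored;
               correctness never depends on it. */
            __builtin_prefetch(&a[i + dist], 0, 3);
        }
        sum += a[i];  /* the demand access should now hit in the cache */
    }
    return sum;
}

Because the prefetch is non-binding, the hint can be dropped, or the line later evicted or invalidated, without affecting correctness; the prefetch only influences timing.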

Software-controlled prefetching requires little hardware support: usually a non-blocking cache and a processor capable of issuing prefetch instructions. Hardware-controlled prefetching, on the other hand, may require more complex hardware support: usually a history table, and sometimes a branch prediction table as well.
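To make the history table concrete, here is a simplified C model of a stride-based reference prediction table of the kind a hardware prefetcher might maintain. The table size, entry fields, and direct-mapped indexing are illustrative assumptions, not a specific published design.

#include <stdint.h>

/* A simplified sketch of a hardware prefetcher's history table (a
   stride-based reference prediction table). The table is indexed by
   the PC of the load; once the same load exhibits the same stride
   twice, the next address in the pattern is predicted. All sizes and
   field names are illustrative assumptions. */
#define TABLE_SIZE 64

struct rpt_entry {
    uint64_t pc;         /* load instruction address (tag) */
    uint64_t last_addr;  /* last data address referenced   */
    int64_t  stride;     /* last observed stride           */
    int      confirmed;  /* stride seen at least twice?    */
};

static struct rpt_entry rpt[TABLE_SIZE];

/* Called (conceptually, by the hardware) on every load. Returns the
   address to prefetch, or 0 if there is no confident prediction. */
uint64_t rpt_lookup(uint64_t pc, uint64_t addr)
{
    struct rpt_entry *e = &rpt[pc % TABLE_SIZE];

    if (e->pc != pc) {   /* new load: allocate the entry */
        e->pc = pc;
        e->last_addr = addr;
        e->stride = 0;
        e->confirmed = 0;
        return 0;
    }

    int64_t stride = (int64_t)(addr - e->last_addr);
    e->confirmed = (stride != 0 && stride == e->stride);
    e->stride = stride;
    e->last_addr = addr;

    return e->confirmed ? (uint64_t)(addr + stride) : 0;
}

On a steady access stream such as a[0], a[8], a[16], ..., the entry confirms the stride after it has been seen twice and from then on predicts each next address one access ahead of the demand stream.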

Cache prefetching does not guarantee an improvement in processor performance, particularly if the memory latency is small. Overheads due to increased cache interference and increased memory traffic can seriously degrade performance, especially on multiprocessors. On a multiprocessor, data fetched by a non-binding prefetch remains visible to the cache coherence protocol (with binding prefetching this is not the case), so a prefetched copy cannot silently become stale.

Both the software and the hardware schemes have their strengths and weaknesses, and perhaps the best solution is a combination of the two.