They later extended this method to scalar multiprocessors [6], where the stride is not explicitly known, by using a stride prediction table (SPT). With low overhead, this method can significantly reduce the number of cache misses for numerically intensive programs that can be highly vectorized. For other programs, however, especially those with frequent stride changes, the cache miss rate is still reduced but the overhead is generally more significant, and Fu and Patel conclude that stride-directed prefetching is not appropriate for such programs.
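To make this concrete, the sketch below shows one way a stride prediction table indexed by the PC of a load/store instruction might detect a repeating stride and compute a prefetch address. This is a minimal illustration rather than Fu and Patel's actual hardware; the table size, field names, and the requirement that a non-zero stride repeat before a prefetch is issued are illustrative assumptions.

```c
#include <stdint.h>

#define SPT_ENTRIES 256          /* illustrative table size */

/* One SPT entry, indexed by the low bits of the load/store PC. */
struct spt_entry {
    uint64_t tag;        /* PC of the memory instruction        */
    uint64_t prev_addr;  /* last data address it referenced     */
    int64_t  stride;     /* last observed address difference    */
    int      confirmed;  /* same stride seen twice in a row?    */
};

static struct spt_entry spt[SPT_ENTRIES];

/* Called on every load/store; returns a prefetch address or 0. */
uint64_t spt_access(uint64_t pc, uint64_t addr)
{
    struct spt_entry *e = &spt[(pc / 4) % SPT_ENTRIES];

    if (e->tag != pc) {                 /* new instruction: (re)allocate */
        e->tag = pc;
        e->prev_addr = addr;
        e->stride = 0;
        e->confirmed = 0;
        return 0;
    }

    int64_t stride = (int64_t)(addr - e->prev_addr);
    e->confirmed = (stride != 0 && stride == e->stride);
    e->stride = stride;
    e->prev_addr = addr;

    /* Only prefetch once the same non-zero stride repeats. */
    return e->confirmed ? addr + stride : 0;
}
```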
The required hardware consists of a support unit for a data cache whose design is based on the prediction of the execution of the instruction stream and the associated operand references in load/store instructions. These load/store instructions, and their referencing patterns, are kept in a reference prediction table (RPT) (see figure). "The RPT will be accessed ahead of the regular program counter (PC) by a look-ahead program counter (LA-PC). The LA-PC is incremented and maintained in the same fashion as the PC with the help of a dynamic branch prediction mechanism. The LA-PC/RPT combination is used to detect regular data accesses in the RPT and to generate prefetching requests." [2] The prefetched data blocks are placed in the data cache. Note that the cache concerned may be blocking or non-blocking.
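The following is a minimal sketch of how an RPT entry and the LA-PC-driven lookup might fit together, assuming each entry records the instruction address, its last operand address, the detected stride, and a small prediction state. The entry layout, table size, and the `issue_prefetch` helper are assumptions made for illustration, not the paper's exact design.

```c
#include <stdint.h>

/* Prediction states for one RPT entry (names are illustrative). */
enum rpt_state { INITIAL, TRANSIENT, STEADY, NO_PRED };

struct rpt_entry {
    uint64_t       tag;        /* PC of the load/store instruction  */
    uint64_t       prev_addr;  /* last operand address it generated */
    int64_t        stride;     /* predicted address increment       */
    enum rpt_state state;
};

#define RPT_ENTRIES 512
static struct rpt_entry rpt[RPT_ENTRIES];

/* Model of a memory-side prefetch request into the data cache. */
extern void issue_prefetch(uint64_t addr);

static struct rpt_entry *rpt_lookup(uint64_t pc)
{
    struct rpt_entry *e = &rpt[(pc / 4) % RPT_ENTRIES];
    return (e->tag == pc) ? e : 0;
}

/*
 * Driven by the LA-PC: when the look-ahead stream reaches a
 * load/store whose RPT entry is predicting reliably, issue a
 * prefetch for the next expected operand address.
 */
void la_pc_step(uint64_t la_pc)
{
    struct rpt_entry *e = rpt_lookup(la_pc);
    if (e && e->state == STEADY)
        issue_prefetch(e->prev_addr + e->stride);
}
```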
The idea is to keep enough distance between the PC and the LA-PC that prefetched data arrives "just before" it is needed. The LA-PC initially points to the instruction following the current PC and moves ahead of the PC when execution stalls on real misses. Incorrect branch predictions, together with other mechanisms, prevent the lookahead distance from growing too large. Since the support unit is not on the critical path, its presence should not increase the processor cycle time.
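A simple way to picture the distance control is sketched below: the LA-PC only advances while it is fewer than some maximum number of dynamic instructions ahead of the PC, and it is resynchronized when a branch misprediction invalidates the look-ahead stream. The cap value and function names are illustrative assumptions, not taken from the paper.

```c
#include <stdint.h>

#define MAX_LOOKAHEAD 64        /* illustrative cap, in instructions     */

static uint64_t la_pc;          /* look-ahead program counter            */
static int      lookahead;      /* dynamic instructions LA-PC is ahead   */

/* The LA-PC follows the predicted path, but only up to the cap. */
void advance_la_pc(uint64_t predicted_next)
{
    if (lookahead < MAX_LOOKAHEAD) {
        la_pc = predicted_next;
        lookahead++;
    }
}

/* Each instruction retired by the real PC shrinks the distance. */
void retire_instruction(void)
{
    if (lookahead > 0)
        lookahead--;
}

/* A mispredicted branch makes the look-ahead stream wrong: resynchronize. */
void on_branch_mispredict(uint64_t correct_next_pc)
{
    la_pc = correct_next_pc;
    lookahead = 0;
}
```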
Chen and Baer claim that data pollution resulting from erroneous prefetches can be almost entirely avoided by fine-tuning the RPT finite state machines. However, the scheme relies heavily on good branch prediction to predict the lookahead stream.
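One way to picture the per-entry finite state machine is sketched below. The four states and their transitions follow the usual description of the scheme (prefetches are issued only for entries that are predicting reliably, which is what limits erroneous prefetches), but the state names and the exact update policy shown here are illustrative and are precisely the kind of detail the fine-tuning concerns.

```c
#include <stdint.h>
#include <stdbool.h>

/* States of the per-entry finite state machine (names illustrative). */
enum rpt_state { INITIAL, TRANSIENT, STEADY, NO_PRED };

struct rpt_entry {
    uint64_t       prev_addr;
    int64_t        stride;
    enum rpt_state state;
};

/*
 * Update one entry when the real PC executes its load/store.
 * "correct" means the new address matched prev_addr + stride.
 */
void rpt_update(struct rpt_entry *e, uint64_t addr)
{
    bool correct = (addr == e->prev_addr + (uint64_t)e->stride);

    switch (e->state) {
    case INITIAL:                       /* first re-execution            */
        e->state = correct ? STEADY : TRANSIENT;
        break;
    case TRANSIENT:                     /* stride just changed           */
        e->state = correct ? STEADY : NO_PRED;
        break;
    case STEADY:                        /* regular pattern confirmed     */
        if (!correct)
            e->state = INITIAL;
        break;
    case NO_PRED:                       /* irregular: no prefetching     */
        if (correct)
            e->state = TRANSIENT;
        break;
    }

    if (!correct)
        e->stride = (int64_t)(addr - e->prev_addr);
    e->prev_addr = addr;
}
```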