Improving the performance of such a computing structure means increasing the throughput. One way to do this is to reduce the latency, usually by implementing F using faster logic circuits. However, there are limitations on the performance improvements attainable in this way, and so other techniques have been sought. If the function F consists of a sequence of sub-functions, for example a, b and c, such that F(xi)= a(b(c(xi))), then it is possible to implement F using pipelined logic. The decomposition of F into three sub-functions is illustrated in figure 1.
Each stage in the pipeline contains some function evaluation logic and between the stages there are latches for storing the intermediate values. Since the latency of each sub-function is less than the latency of F the rate at which new input values can be presented to each stage is greater than the rate at which they can be presented to a non-pipelined implementation of F. If a common clock is used to latch the intermediate values at all stages in the pipeline, and the logic at stage i has a latency of τ, then assuming a latch delay of φ, the clock period, τ, of a k-stage pipeline can be defined as
The activity within a pipeline can be represented pictorially using a space-time diagram, as shown in figure 2 where the space-time behaviour of a four-stage pipeline is depicted for a sequence of six input values x1 ... x6, From figure 2 it can be seen that for a k-stage pipeline a time equivalent to k clock periods elapses before the pipeline is full, after which one result appears at the output every clock period. Therefore, if a sequence of n data items is processed by a k-stage pipeline, it will take a time of kτ to produce the first result, and a time (n-1)τ to produce the remaining (n-1) results. Hence the total time to process n data items in a k-stage pipeline is
If we now make the assumption that a non-pipelined implementation of F has a latency of kτ, then the same n data items will take a time T1 = nkτ when processed on a non-pipelined processor, effectively a one-stage pipeline, which evaluates an equivalent function.
The speedup resulting from the use of a k-stage pipelined implementation is defined as Sk= T1/Tk, and this is given by
| Sk | = | T1 Tk | = | nk k + (n - 1) |
| lim n→∞ | T1 Tk | = | k |
| η | = | speedup number of stages | = | n k + (n - 1) |
| r | = | efficiency | x | peak processing rate | = | η τ |
When the number of data items being processed in a pipeline approaches infinity, η → 1 and r → r∞ = τ-1.
Given that a pipeline cannot be maintained in a continuously busy state, there are a number of conditions which the designer should attempt to satisfy in order to maintain throughput as far as possible. These are
In considering the design of real pipelines we shall see not only how the designers have attempted to satisfy these conditions, but also how a variety of problems arises to thwart their best intentions.