
Introduction

This research was supported by ESPRIT project P6253 and by UK EPSRC research grant number GR/K19723.

A computer's instruction set may be conceptually partitioned into subsets, each of which may then be assigned to specialised hardware for execution. Because the aim of such partitioning is to maximise the concurrent utilisation of the hardware, many different partitionings have been tried. Early high-performance computers such as the Cray-1 overlapped memory access and arithmetic operations to an extent, but the trend towards more explicit separation of these subsets has led to architectures such as PIPE [1], the Astronautics ZS-1 [2] and, more recently, the ACRI-1, which is the subject of this paper.

Smith [3] originally proposed a partitioning of memory access and execution instructions onto what he termed the A-processor and X-processor respectively. Wulf's WM machine [4] used four main processing units, three assigned to arithmetic operations and one to instruction fetch. When an architecture implements such processing units so that they operate cooperatively but essentially independently of each other, it is termed decoupled. To meet this independence requirement, the units act on separate instruction streams and do not, in general, stall when other units are stopped. This temporal independence is achieved by giving each unit its own program counter and by having the units communicate through queues. A simulation study of the ZS-1 is presented in [12].
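The effect of communicating through queues can be illustrated with a minimal sketch. The toy model below is not the ACRI-1, the ZS-1, or any simulator used in this paper; the function name simulate, its parameters, and the one-operation-per-cycle timing are all assumptions made purely for illustration. An access unit issues loads under its own counter, an execute unit consumes one operand per cycle when one is available, and the only coupling between them is a bounded queue.

```python
from collections import deque


def simulate(n_items, mem_latency, queue_depth):
    """Cycle-stepped toy model of an access unit feeding an execute unit.

    Hypothetical parameters (not taken from any real machine):
      n_items     - number of operands the execute unit must consume
      mem_latency - cycles between issuing a load and its data arriving
      queue_depth - maximum number of loads outstanding or queued
    Returns the cycle in which the last operand is consumed.
    """
    in_flight = deque()   # arrival cycle of each outstanding load
    load_queue = deque()  # operands that have arrived and await execution
    issued = consumed = cycle = 0
    while consumed < n_items:
        cycle += 1
        # Memory delivers every load whose latency has elapsed.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
            load_queue.append(1)
        # Execute unit: consume one operand if present, otherwise wait.
        if load_queue:
            load_queue.popleft()
            consumed += 1
        # Access unit: advance its own counter and issue one load per
        # cycle, stalling only when the queue capacity is exhausted.
        if issued < n_items and len(in_flight) + len(load_queue) < queue_depth:
            in_flight.append(cycle + mem_latency)
            issued += 1
    return cycle


# With a single queue slot each load must be consumed before the next is
# issued, so the memory latency is exposed on every operand.
coupled = simulate(n_items=4, mem_latency=3, queue_depth=1)
# With a deeper queue the access unit runs ahead and the latency is
# largely hidden behind useful work.
decoupled = simulate(n_items=4, mem_latency=3, queue_depth=8)
```

In this sketch a queue depth of one behaves like a tightly coupled machine, while a deeper queue lets the access unit slip ahead of the execute unit. Real decoupled machines have several queues and richer timing, so the numbers here indicate only the direction of the effect.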

The ZS-1, PIPE and Wulf's WM architecture are termed access decoupled. In such architectures, control transfers often require the synchronisation of the processing units. This occurs, for example, when a branch is dependent upon a comparison computed by the X-processor. A further architectural optimisation, termed control decoupling, is introduced in the Advanced Computer Research Institute's ACRI-1 architecture [5]. In a control-decoupled architecture there are three independent units, responsible respectively for control, memory access and execution. The additional benefit of control decoupling is that the majority of control transfer decisions can be pre-computed, which opens up many opportunities for preparing the access and execute pipelines for the operations that follow. Access and control decoupling are fully described elsewhere [5]; however, a brief introduction to each is provided in sections 1.1 and 1.2.

Given the high degree of decoupling, and the asynchronous nature of the decoupled units in the ACRI-1 architecture, the degree to which real-world applications can exploit decoupling is of prime importance. In common with many recent architectural innovations, the performance of the architecture is greatly influenced by the capabilities of the compiler. In that sense, the analysis in this paper should be seen as a combined evaluation of the architecture and a pre-production version of its compiler (scf90). Our analysis consists of compiler-driven measurements and profile-driven modelling of the Perfect Club suite of scientific programs [6].

As fully functional hardware for the ACRI-1 system was not available at the time of writing, our performance results are derived from frequency-domain profiling of the test programs on another system, combined with event-domain profiling from the pre-production compiler. This allows us to determine the position of events of interest within the source code, and to match these positions with run-time frequency information. As frequency information is architecture-independent, the actual machine on which it is gathered does not influence the outcome. The event frequency measurements are presented in section 4. The dynamic event counts produced by this method then drive a simple linear model of execution time, which yields tight bounds on execution time. These execution time bounds are presented in section 4.2.
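The way static event positions and run-time frequencies combine can be sketched as follows. This is only an illustration of the general idea of weighting per-block event counts by block execution frequency and feeding the totals into a linear cost model; the event names, block names, frequencies and per-event cycle costs below are all hypothetical, and the model actually used in this paper is the one described in the later sections.

```python
def dynamic_counts(static_events, block_freq):
    """Weight per-block static event counts by how often each block ran.

    static_events: block name -> {event name -> occurrences in that block}
    block_freq:    block name -> number of times the block was executed
    """
    totals = {}
    for block, events in static_events.items():
        runs = block_freq.get(block, 0)
        for name, count in events.items():
            totals[name] = totals.get(name, 0) + count * runs
    return totals


def time_bounds(counts, cost_bounds):
    """Linear model: sum of (event count x per-event cost), evaluated
    once with the lowest and once with the highest assumed cost."""
    lower = sum(n * cost_bounds[e][0] for e, n in counts.items())
    upper = sum(n * cost_bounds[e][1] for e, n in counts.items())
    return lower, upper


# Hypothetical compiler output: events found in each basic block.
static_events = {
    "loop_body": {"load": 2, "flop": 1},
    "loop_exit": {"branch_sync": 1},
}
# Hypothetical profile gathered on another machine.
block_freq = {"loop_body": 1000, "loop_exit": 10}
# Hypothetical (best, worst) cost in cycles for each event type.
cost_bounds = {"load": (1, 2), "flop": (1, 1), "branch_sync": (5, 20)}

counts = dynamic_counts(static_events, block_freq)
lower, upper = time_bounds(counts, cost_bounds)
```

Because the block frequencies depend only on the program and its input, they can be collected on any machine; only the per-event costs are specific to the target architecture.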






ships@dcs.ed.ac.uk
Wed Mar 1 16:43:22 GMT 1995