Compiling and Optimizing for Decoupled Architectures
- Authors:
- Nigel Topham
- Department of Computer Science
The University of Edinburgh
Mayfield Road
Edinburgh, UK
e-mail: npt@dcs.ed.ac.uk
- Alasdair Rawsthorne
- Department of Computer Science
The University of Manchester
Oxford Road
Manchester, UK
e-mail: alasdair@cs.man.ac.uk
- Callum McLean
- Department of Computer Science
The University of Edinburgh
Mayfield Road
Edinburgh, UK
e-mail: callum@quadstone.co.uk
- Muriel Mewissen
- Department of Computer Science
The University of Edinburgh
Mayfield Road
Edinburgh, UK
e-mail: mu@quadstone.co.uk
- Peter Bird
- e-mail: plbird@aol.com
- Keywords:
- Decoupled architecture, Compiling, Performance, Benchmarks,
Optimization, Quantitative analysis.
Abstract
Decoupled architectures provide a key to the problem of
sustained supercomputer performance through their ability to
hide large memory latencies.
When a program executes in decoupled mode, the perceived
memory latency at the processor is zero; effectively the entire physical memory
has an access time equivalent to the processor's register file,
and latency is completely hidden. However, the asynchronous
functional units within a decoupled architecture must occasionally
synchronize, incurring a high penalty. The goal of compiling and
optimizing for decoupled architectures is to partition the
program between the asynchronous functional units in such a
way that latencies are hidden but synchronization events are
executed infrequently.
This paper describes a model for decoupled compilation, and
explains the effectiveness of compilation for decoupled systems.
A number of new compiler optimizations are introduced and
evaluated quantitatively using the Perfect Club scientific benchmarks.
We show that with a suitable repertoire of optimizations, it is possible
to hide large latencies most of the time for most
of the programs in the Perfect Club.
Copyright (C) ACM, 1995.