@SkeLCore: Algorithmic Skeletons for Reconfigurable Computing
Carlos Acosta-Leon
Dr Murray Cole (Supervisor)
Institute for Computing Systems Architecture
University of Edinburgh
Introduction.
The fixed hardware, sequential control and general-purpose instructions set architecture of conventional microprocessors have affected absolute performance. Although generic programmability has allowed some degree of flexibility, microprocessors are still unsuitable for high performance applications.
In contrast, Field-Programmable Gate Arrays or FPGAs allow end-users to customize algorithms on hardware to realise specialised tasks with high performance. FPGAs are reusable VLSI templates, made up of a regular computation and interconnection network whose functionality is reprogrammable by software. Additionally, they support embedded complex components such as µP and µC cores, DSPs, high-speed I/O interfaces, RAM memory, etc.
This complexity, along with inherent parallelism, dynamically changing functionality, and coupling the work between FPGA and microprocessor makes a powerful Reconfigurable Computing System.
However, this brings a high programming complexity that must be managed by using suitable design techniques at the highest possible abstraction level.
This involves the design of new abstraction techniques. A common approach to general-purpose parallel computation is based on packaging complex operations as templates, parallel patterns or algorithmic skeletons, which encapsulate the necessary control and data flow. This allows software to be written in a way that is independent of particular architectures and hence portable.
We seek to address the challenge of providing programming models on FPGA and support our thesis that a skeletal programming methodology, borrowed and extended from work in the field of parallel programming, can offer benefits to the programmer of reconfigurable systems. We will pursue this research in the context of an application suite drawn from the image processing domain. These can be provided though sequential programming interfaces to parallel (and dynamically reconfigured) skeletons and used as a tool for exploring FPGA/microprocessor design spaces.
Overview:
Algorithmic Skeletons:
The concept of the algorithmic skeleton was originally formulated in the late
1980’s in a PhD thesis by Murray Cole [1] conducted at the University of Edinburgh under the supervision of Gordon Brebner, now of Xilinx Labs. Seeking to address the complexities of parallel programming, it observes that many parallel algorithms can be characterised and classified by their adherence to one or more of a number of generic patterns of computation and interaction. Skeletal programming proposes that such patterns be abstracted and provided as a programmer’s toolkit, with specifications which transcend architectural variations but implementations which recognise these to enhance performance. In this way, it promises to address many of the traditional issues within the parallel software engineering process:
- it will simplify programming by raising the level of abstraction;
- it will enhance portability and re-use by absolving the programmer of responsibility for detailed realisation of the underlying patterns;
- it will improve performance by providing access to carefully optimised, architecture specific implementations of the patterns;
- it will offer scope for static and dynamic optimisation, by explicitly documenting information on algorithmic structure (e.g. sharing and dependencies) which would often be impossible to extract from equivalent unstructured programs.
Cole has continued to research in the area, which now boasts a healthy number of interested and related research groups [2]. An interesting divergence has emerged between systems which attempt to provide generic skeletons, and those which focus in skeletons in an application specific area (for example, image processing).
Skeletons and Reconfigurable Computing:
Field-Programmable Gate Arrays or FPGAs allow end-users to customize algorithms on hardware to realise specialised tasks with high performance. FPGAs are reusable VLSI templates, made up of a regular computation and interconnection network whose functionality is reprogrammable by software. Currently, they also support embedded complex components such as µP and µC cores, DSPs, high-speed I/O interfaces, RAM memory, etc [see Figure 0]. But, this involves a high programming complexity that must be managed by using suitable design techniques at the highest possible abstraction level.
Figure 1. FPGA template
Our project is motivated by the following observations on the analogy between the challenges of traditional parallel and contemporary reconfigurable programming:
- Correct expression of complex concurrent activities and their interaction is difficult, labourintensive and error-prone;
- Overhead resource use must be kept to a minimum, since high absolute performance is a fundamental goal;
- Many existing hand-coded applications conform, more or less, to a number of standard patterns.
We hope that it will be possible to support the programmer of reconfigurable systems by providing a skeletal methodology (and of course, library) in this new context. In addition, we note that reconfigurable computing adds (at least) two new dimensions: reconfiguration itself and the choice between hard and soft implementation platforms [3].
The opportunities afforded by dynamic reconfiguration are well documented [4], as are the extra complexities presented to the programmer. It is our intention to explore the extent to which reconfiguration can also be encompassed by the skeletal methodology. We observe that, in the abstract, reconfiguration is simply another implementation option, with its own costs and benefits.
We expect that it will be possible to harness this, from within our skeletons, just as we already harness parallelism. As with skeletal parallelism, the exploitation of skeletal reconfiguration will be guided by analysis of the customising code, selecting from a variety of implementation patterns
according to the requirements of each application.
Similarly, the programmer must decide which parts of an overall system to implement in reconfigurable logic and which to implement in traditional software, whether with an on-chip core or traditional co-processor. Again, we hope to provide assistance from within our skeleton library, by embedding good options, and an understanding of the grounds upon which they must be distinguished, within the skeleton library.
The aSkeLCore Project:
Framework.
A full investigation of the area outlined above would be the subject of a major programme of research. It is our intention to begin in a relatively modest way, with a PhD project which is called "aSkeLCore: Algorithmic Skeletons for Reconfigurable Computing". Carlos Acosta has made preliminary progress in this direction, during the first year of his PhD under the
supervision of Murray Cole. Acosta’s aims are to explore the thesis that a skeletal programming methodology can be beneficial to the programming of reconfigurable systems. This will be achieved by the design, implementation and evaluation of a small suite of skeletons with the following characteristics [see Figure 1]:
- Programmed in Handel-C, with an API which allows programmers to exploit one or more skeletons by providing self-contained code, customising each skeleton to the application;
- internal implementations of skeletons which exploit parallelism and reconfiguration transparently;
- A focus on image processing applications as a test application suite;
- Experiments comparing resulting programs with conventional directly programmed alternatives on objective grounds of performance and resource utilisation, and more subjectively, on grounds of programmability and maintainability.
Figure 2. Architecture of the proposed Framework
Preliminary has been undertaken on a skeleton for pipelined application structures. This has been conducted entirely within SystemC, and tested by simulation. For the on-board working prototype we plan to move to Handel-C since is better supported and provides a route to the timing control we need for dynamic reconfiguration. Figure 1 illustrates the structure of our planned experimental system.
In Figure 2 we illustrate our generic skeletal programming methodology. Here the programmer is responsible for using skeletons (1) and providing application specific customisations. The skeleton library (2) is the repository of skeleton implementations. We use a high level language for FPGA programming. Its compiler is standard, with no understanding of skeletons. The FPGA design suite will take (3) decisions to select between competing exploitations of parallelism/reconfiguration and about the mapping to FPGA on-chip core or traditional co-processor as generated from the library code.
Figure 3. Architecture of the proposed Framework
Application example.
As our application area we have chosen image processing (double Thresholding edge detection algorithm, as shown in Figure 3) because this kind of application has the followings characteristics:
- Compositional development and generalization for testing different image algorithms in each stage,
- Parallelism as Pipeline and Tasks Farm can be used, and
- Different levels of granularity and possibility of exploiting polymorphism.
Figure 4. Generic Methodology for the Skeletons Framework
Conclusions and further work:
In line with the usual plan for PhD study, the practical aspects of the work are currently at a preliminary stage. Exploratory development of the pipeline skeleton is being undertaken in SystemC, with testing by simulation using OSCI SystemC 2.1 v1.
We will provide the reconfigurable computing systems programmer with support from within our skeleton library, by embedding good options, to take advantage of reconfiguration, parallelism and decide which parts of an overall system to implement in reconfigurable logic and which to implement in traditional software.
So far our current work has focused on a skeleton for pipelined processing, expressed in SystemC RTL for simulation on computer or synthesis on hardware. For future work we will use a real FPGA design suite, Celoxica DK4/Handel-C and Xilinx ISE 7.1i on a Virtex-II Pro or RC10 FPGA board, and will integrate SystemC and Handel-C hardware/software processes.
References:
[1] Murray Cole, “Bringing Skeletons out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming”. Parallel Computing, Vol. 30, Issue 3, March 2004, Pages 389-406.
[2] M. Cole, The Skeletal Parallelism Homepage, http://www.inf.ed.ac.uk/home/mic/Skeletons.
[3] Clive Maxfield, “The Design Warrior’s Guide to FPGAs: Devices, Tools and Flows”. Elsevier, 2004.
[4] Maya B. Gokhale & Paul S. Graham, “Reconfigurable Computing: Accelerating Computation with FPGA”. Springer 2005.
[Return to top]
[Return to Homepage]