PATUS: A Code Generation and Auto-Tuning Framework for Stencil Kernels on Modern Microarchitectures
Matthias Christen
(Universität Basel)
PATUS is a code generation and auto-tuning framework for the class of stencil computations,
targeted at modern multi- and many-core processors such as multicore CPUs and graphics processing units.
The ultimate goals of the framework are productivity, portability (of both the code and its performance),
and high performance on the target platform.
The key ingredients for achieving productivity, portability, and performance are domain-specific
languages (DSLs) and the auto-tuning methodology.
The PATUS stencil specification DSL lets the programmer express a stencil computation concisely,
independently of hardware architecture-specific details. It thus increases programmer productivity by
relieving him or her of low-level programming-model issues and of manually applying hardware platform-specific
code optimization techniques. The use of domain-specific languages also implies code reusability: once
written, the same stencil specification can be reused on different hardware platforms, i.e., the
specification code is portable across hardware architectures. Gearing the language
towards a special purpose makes it amenable to more aggressive optimizations and therefore to
potentially higher performance.
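To convey what a stencil computation looks like (in plain Python as an illustration; this is not PATUS's actual DSL syntax), a five-point Jacobi stencil updates each interior grid point from itself and its four neighbors:

```python
def jacobi_step(u):
    """One sweep of a 2D five-point Jacobi stencil (illustrative
    sketch, not PATUS DSL syntax): every interior point becomes
    the average of itself and its four neighbors."""
    rows, cols = len(u), len(u[0])
    v = [row[:] for row in u]  # copy: the stencil reads u, writes v
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            v[i][j] = 0.2 * (u[i][j] + u[i - 1][j] + u[i + 1][j]
                             + u[i][j - 1] + u[i][j + 1])
    return v
```

A PATUS stencil specification captures exactly this per-point update rule, leaving loop structure, parallelization, and blocking to the code generator.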
Auto-tuning provides performance and performance portability by automatically adapting implementation-specific
parameters to the characteristics of the hardware on which the code will run. Automating parameter
tuning also makes the system more productive to use than manually fine-tuning the code.
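The core of the auto-tuning idea can be sketched in a few lines: benchmark each point of a parameter space and keep the fastest configuration. (A minimal, hypothetical sketch; real tuners such as PATUS's also use search heuristics rather than exhaustive enumeration.)

```python
import itertools
import timeit

def autotune(kernel, param_space, data):
    """Benchmark every parameter combination and return the fastest
    one -- a minimal sketch of exhaustive auto-tuning."""
    best_cfg, best_time = None, float("inf")
    for values in itertools.product(*param_space.values()):
        cfg = dict(zip(param_space.keys(), values))
        t = min(timeit.repeat(lambda: kernel(data, **cfg),
                              number=3, repeat=3))
        if t < best_time:
            best_cfg, best_time = cfg, t
    return best_cfg

# Hypothetical tunable kernel: summation with a block-size parameter.
def sum_blocked(xs, block):
    total = 0.0
    for i in range(0, len(xs), block):
        total += sum(xs[i:i + block])
    return total

best = autotune(sum_blocked, {"block": [8, 64, 512]}, list(range(4096)))
```

In practice the tuned parameters are things like cache-block sizes, unrolling factors, or thread counts, and the winning configuration differs from machine to machine, which is precisely what makes the automation worthwhile.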
A Skeleton Library for Heterogeneous Multi-/Many-Core Systems
Michel Steuwer,
Philipp Kegel,
Sergei Gorlatch
(Universität Münster)
Modern parallel systems are becoming increasingly heterogeneous. Even desktop
PCs usually comprise GPUs (Graphics Processing Units) and multi-core
CPUs. Application programming for such heterogeneous parallel systems
is complex and error-prone, because no suitable high-level models exist.
Existing programming models either:
- target only a homogeneous subset of the available compute devices,
e.g., only multi-core CPUs or only GPUs, but not both,
- or target all available compute devices, but are intrinsically
low-level, like OpenCL.
Using multiple compute devices at the same time, e.g., multiple GPUs,
brings additional challenges such as work and data distribution,
for which state-of-the-art programming models
like OpenCL provide no special assistance.
In this talk, we present the SkelCL library, being developed
in Münster, which aims to replace these low-level approaches.
The use of high-level algorithmic skeletons greatly
simplifies programming for systems comprising multiple compute
devices. SkelCL is based on OpenCL and allows for ad-hoc
parallelism, as its skeletons can be mixed with low-level OpenCL code.
SkelCL also provides an abstract vector data type and a high-level data
(re)distribution mechanism to free the programmer from
organizing the low-level data transfers between a system's main memory
and multiple compute devices.
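The essence of an algorithmic skeleton is that the user supplies only the per-element computation while the skeleton hides parallel execution and work distribution. A minimal sketch in Python (purely illustrative; SkelCL's actual API is C++ and generates OpenCL kernels for the devices):

```python
from multiprocessing.dummy import Pool  # thread pool stands in for compute devices

class Map:
    """Minimal map skeleton: the caller provides only the per-element
    function; the skeleton hides how work is split across workers.
    (Illustrative sketch, not SkelCL's C++/OpenCL API.)"""
    def __init__(self, f):
        self.f = f

    def __call__(self, vec, workers=4):
        with Pool(workers) as pool:
            return pool.map(self.f, vec)

square = Map(lambda x: x * x)
result = square([1, 2, 3, 4])  # distribution across workers is transparent
```

In SkelCL the analogous abstraction additionally manages the transfers between host memory and device memories, which is what the vector data type and (re)distribution mechanism provide.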
We will describe some implementation details, and in particular
we will discuss the problem of wrapping OpenCL into skeletons,
since OpenCL uses runtime compilation.
Finally, we will give a brief overview of our dOpenCL project.
In this project, we develop a middleware that is used together
with SkelCL in order to extend it to distributed systems.
We use a real-world application study from the area of medical imaging
to demonstrate the reduced programming effort and competitive
performance of SkelCL compared to OpenCL and CUDA. In addition, we
illustrate how SkelCL adapts to large-scale, distributed heterogeneous
systems in order to simplify their programming.
Delite: A Framework For High Performance Embedded Domain-Specific Languages
HyoukJoong Lee,
Arvind Sujeeth,
Kevin Brown,
Kunle Olukotun (Stanford University)
Fully utilizing heterogeneous systems has been a challenging problem for application programmers, especially with an ever-increasing number of architecture-specific programming models (e.g., Pthreads, OpenCL, CUDA). Domain-specific languages (DSLs) are a potential solution to this problem, as they can provide productivity, performance, and portability within the confines of a specific domain. However, the cost of developing such DSLs needs to be lowered to make the DSL approach useful on a large scale. We implemented the Delite compiler framework, a reusable compiler infrastructure for the rapid development of performance-oriented DSLs. Using the concept of a multi-view intermediate representation (IR), the Delite framework provides static optimizations and code generation for heterogeneous hardware, so that DSL developers can easily implement DSL operations by extending the framework. We also implemented the Delite runtime, which automatically schedules and executes DSL operations on heterogeneous hardware. In this talk, we will present the internals of the framework and show how DSLs can extend and interoperate with one another. We will also explain how the framework and runtime efficiently target heterogeneous hardware, and walk through the process of implementing a new DSL as a DSL developer.
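The idea of a shared compiler infrastructure for DSLs can be illustrated with a drastically simplified sketch (hypothetical and in Python, whereas Delite itself is implemented in Scala): DSL operations build IR nodes, and generic passes provided by the framework optimize them before code generation, so each DSL only has to define its node types.

```python
# Miniature illustration of the shared-IR idea (not Delite's actual
# implementation): DSL ops construct IR nodes, and a framework-provided
# pass optimizes them independently of any particular DSL.
class Const:
    def __init__(self, v):
        self.v = v

class Add:
    def __init__(self, a, b):
        self.a, self.b = a, b

def fold(node):
    """Generic constant-folding pass, reusable by every DSL that
    builds its operations on these IR nodes."""
    if isinstance(node, Add):
        a, b = fold(node.a), fold(node.b)
        if isinstance(a, Const) and isinstance(b, Const):
            return Const(a.v + b.v)
        return Add(a, b)
    return node

expr = Add(Const(2), Add(Const(3), Const(4)))  # built by DSL front end
folded = fold(expr)                            # collapses to Const(9)
```

Delite's multi-view IR generalizes this: the same nodes can be viewed generically (for framework-wide optimizations such as fusion) or domain-specifically (for DSL-level rewrites), and a separate backend generates code for each target device.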
The ParaPhrase Project
Horacio Gonzalez-Velez (Robert Gordon University)
Chris Brown (University of St Andrews)
The ParaPhrase project aims to produce a new structured design and implementation process for heterogeneous parallel architectures, in which developers exploit a variety of parallel patterns to develop component-based applications that can be mapped to the available hardware resources and then dynamically re-mapped to meet application needs and hardware availability. Key features are sustainable parallel computing through enhanced programmability and lower power consumption, reduced cost of programming and implementing parallel systems, and better resource utilisation of heterogeneous parallel CPU/GPU architectures. This work will enable major progress in programming both current and future (parallel) computer systems. Using ParaPhrase technologies, we anticipate achieving significant parallel speedups for realistic applications, and that these results will scale with larger systems.
Skeletons and Autotuning at Edinburgh
Murray Cole,
Chris Fensch,
Alex Collins,
Fabricio Goes,
Zoe Leiper,
Thibaut Lutz,
Siddharth Mohanty
(University of Edinburgh)
We present an overview of recent and ongoing projects within the skeletons group at Edinburgh. These include work on transactional worklists, wavefronts, stencils, map-reduce, divide-and-conquer, and image-processing skeleton hierarchies.