ADAPT'13
The 3rd International Workshop on Adaptive Self-tuning Computing Systems
January 22nd, 2013, Berlin, Germany
(co-located with HiPEAC 2013)
ADAPT'13 Program
This is the full program (acceptance rate: 54%). All papers can be downloaded from the ACM Digital Library.
Congratulations to the two winners of the Nvidia best paper/presentation award.
Session 1: 14:00-14:55
14:00 ▶ Keynote Talk: Autotuning Recursive Functions (sponsored by Microsoft Research)
Markus Pueschel (ETHZ, Switzerland)
▶ slides
Automatic performance tuning often involves searching a set of alternative implementations to find the fastest one. The search space is usually huge and hence the search is costly. This may be bearable in offline tuning (e.g., ATLAS), performed during installation, but becomes cumbersome in online tuning (e.g., FFTW), performed at runtime since the input size is required. We argue that machine learning can solve this problem and should be added to the portfolio of performance tuning tools. As examples we show two machine learning techniques, one known and one novel, applied to tuning in recursive search spaces.
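The costly exhaustive search that the talk takes as its starting point can be sketched as below. The candidate functions, the recursion cutoff, and the `autotune` helper are illustrative assumptions, not material from the talk; a real autotuner would search over generated code variants.

```python
import timeit

# Hypothetical alternative implementations of the same kernel: here,
# three ways to sum a list. In a real autotuner the candidates would be
# generated code variants (e.g. different recursion cutoffs).
def sum_loop(xs):
    total = 0
    for x in xs:
        total += x
    return total

def sum_builtin(xs):
    return sum(xs)

def sum_recursive(xs):
    if len(xs) <= 64:          # recursion cutoff: itself a tunable parameter
        return sum(xs)
    mid = len(xs) // 2
    return sum_recursive(xs[:mid]) + sum_recursive(xs[mid:])

def autotune(candidates, data, repeats=5):
    """Offline tuning: time every candidate and return the fastest."""
    best, best_time = None, float("inf")
    for f in candidates:
        t = min(timeit.repeat(lambda: f(data), number=10, repeat=repeats))
        if t < best_time:
            best, best_time = f, t
    return best

data = list(range(10_000))
fastest = autotune([sum_loop, sum_builtin, sum_recursive], data)
assert fastest(data) == sum(data)   # all variants agree on the result
```

The cost of this search grows with the number of candidates, which is exactly what a learned model of the search space is meant to avoid.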
Session 2: 14:55-16:00
14:55 ▶ Application-Level Voltage and Frequency Tuning Of Multi-Phase Program on the SCC
With advancing technology, we are quickly progressing towards the many-core era, and with this shift, the techniques we use to program such chips are beginning to change. The Single-Chip Cloud Computer (SCC) is an experimental processor created by Intel Labs. When the programmer is given direct control over the frequency and voltage of the cores, ideally we want to identify the phases of a program based on their computation intensity and associate a frequency and voltage configuration with each. To achieve power and energy savings this way, however, we would need to search the entire space of voltage and frequency combinations supported by the chip, which is a daunting task. In this study, we propose to employ two popular optimization algorithms, Differential Evolution and Nelder-Mead Simplex, to help identify the best configuration for various metrics: execution time, power, energy, and energy-delay product (EDP). Our experimental evaluation shows that, despite the large search space of possible combinations, we can identify the configuration that provides the best result for each metric, which aids the tuning of individual phases.
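The contrast between exhaustive and guided search over voltage/frequency configurations can be sketched as follows. The power/time model, the level tables, and the greedy hill climb (standing in for the paper's Nelder-Mead and Differential Evolution) are all illustrative assumptions, not SCC measurements.

```python
import itertools

# Synthetic analytical model of one program phase (NOT real SCC data):
# time falls with frequency, power grows with f * V^2, EDP = energy * time.
def metrics(freq_mhz, volt):
    time = 1e6 / freq_mhz                 # seconds per phase (toy model)
    power = 1e-3 * freq_mhz * volt ** 2   # watts (toy model)
    energy = power * time
    return {"time": time, "power": power, "energy": energy,
            "edp": energy * time}

FREQS = [100, 200, 400, 533, 800]         # MHz levels (illustrative)
VOLTS = [0.7, 0.8, 0.9, 1.1, 1.3]         # volts (illustrative)

def exhaustive_best(metric):
    """Baseline: scan the whole voltage/frequency grid for one metric."""
    return min(itertools.product(FREQS, VOLTS),
               key=lambda fv: metrics(*fv)[metric])

def hill_climb_best(metric, start=(400, 0.9)):
    """Cheap stand-in for Nelder-Mead: greedy moves to neighbouring levels."""
    f, v = start
    while True:
        fi, vi = FREQS.index(f), VOLTS.index(v)
        neighbours = [(FREQS[i], VOLTS[j])
                      for i in (fi - 1, fi, fi + 1) if 0 <= i < len(FREQS)
                      for j in (vi - 1, vi, vi + 1) if 0 <= j < len(VOLTS)]
        best = min(neighbours, key=lambda fv: metrics(*fv)[metric])
        if best == (f, v):
            return f, v
        f, v = best

for m in ("time", "power", "energy", "edp"):
    print(m, exhaustive_best(m), hill_climb_best(m))
```

On this toy model the guided search reaches the same optima as the full scan while probing only a handful of configurations per phase, which is the saving the paper targets on real hardware.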
15:20 ▶ Bio-inspired Self-Tuning Mechanism for Distributed Computing
A self-tuning approach for adaptive distributed systems is presented. It is unique among existing approaches to software adaptation in that it introduces the notion of differentiation found in cellular slime molds, e.g., Dictyostelium discoideum, into real distributed systems. When an agent delegates a function to another agent it coordinates with, and the former has the function, that function becomes less developed in the former and better developed in the latter, in the sense that the computational resources assigned to functions change according to the undertaking and delegation of functions from and to other agents. The approach is useful for managing and adapting distributed systems in a self-organizing and scalable manner.
15:45 ▶ Position Paper: Evolving Advanced Neural Networks on Run-Time Reconfigurable Digital Hardware Platform
This paper describes the development of a framework for prototyping evolvable-hardware Spiking Neural Networks using run-time reconfigurable systems on Xilinx FPGAs. Practical implementations focus on the classification of acquired EEG signals that are processed using the wavelet transform. The dynamic run-time partial reconfiguration (PR) capability of the Virtex FPGA is used to interchange those units of the system that do not run their algorithms in parallel, thereby saving precious resources.
Session 3: 16:30-18:00
16:30 ▶ Adaptive OpenCL (ACL) Execution in GPU Architectures
Winner of the Nvidia best paper/presentation award
Open Computing Language (OpenCL) has been proposed as a platform-independent, parallel execution model to target heterogeneous systems, including multiple central processing units, graphics processing units (GPUs), and digital signal processors (DSPs). OpenCL parallelism scales with the available resources and hardware generational improvements due to the data-parallel nature of its kernels. Such parallel expressions must adhere to a rigid execution model, essentially forcing the run-time system to behave as a batch-scheduler for small, local workgroups of a larger global problem. In many scenarios, especially in the real-time computing environments of mobile computing, a mobile system must adapt to system constraints and problem characteristics. This paper investigates the concept of Adaptive OpenCL (ACL) to explore algorithm support for dynamically adapting data-model properties and runtime machine characteristics. We show that certain algorithms can be structured to dynamically balance problem correctness and performance.
16:55 ▶ Position Paper: Weak Heterogeneity as a way of Adapting Multicores to Real Workloads
Winner of the Nvidia best paper/presentation award
There is a growing consensus that heterogeneous multicores are the future of CPUs. These processors would be composed of cores that are specifically adapted or tuned to particular types of applications and use cases, thereby increasing performance. The move from homogeneous to heterogeneous multicores causes the design space to explode, however. An architect of a heterogeneous processor must make design decisions per processor core rather than once for the entire processor as before. Currently, there are no methods for handling this design complexity to yield a processor that performs well for real workloads. As a step forward, we propose weak heterogeneity. A weakly heterogeneous processor is one whose cores are different, but not significantly so. The cores share an ISA and major microarchitectural features, differing only in minor details. Limiting the design space in this way allows us to explore the heterogeneous space without becoming overwhelmed by its size. We show preliminary results suggesting that a design space so constrained still has interesting trade-offs among performance, power consumption, and area.
17:10 ▶ Position Paper: Code Specialization For Red-Black Tree Management Algorithms
Much work has been spent on low-level optimization of regular computations, from instruction scheduling and cache-aware design to intensive use of SIMD instructions. Meanwhile, irregular applications, especially pointer-intensive ones, are often optimized only at the algorithm or compilation level, since little dedicated hardware or instruction support is available for this kind of code. In this paper, we investigate a low-level optimization, using self-modifying code, of the associative arrays used intensively in complex applications such as dynamic compilers. We base our work on Red-Black tree algorithms, which are widely used to implement associative arrays. To accelerate Red-Black tree algorithms, we propose to transform the tree data structure into executable code. With Red-Black trees encoded as specialized binary code rather than data, we intend to accelerate tree traversal by taking advantage of the underlying hardware: the program cache and processor fetch and decode. Our experiments show a gain of 45% on an ARM Cortex-A9 processor. We also show that we transfer most of the data-cache pressure to the program cache, motivating future work on dedicated hardware.
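The core idea of encoding a search tree as code rather than data can be sketched in a few lines. This is a hypothetical simplification of the paper's technique: it generates Python source with `exec` from a sorted key/value table instead of emitting ARM binary code from a Red-Black tree, and the function names are illustrative.

```python
# Sketch: compile a balanced search tree into straight-line comparison
# code instead of chasing pointers at lookup time.

def gen_lookup(table):
    """Emit a nested if/elif/else lookup function for a key/value dict."""
    items = sorted(table.items())

    def emit(lo, hi, depth):
        ind = "    " * depth
        if lo > hi:
            return ind + "return None\n"
        mid = (lo + hi) // 2
        k, v = items[mid]
        return (ind + f"if key == {k!r}: return {v!r}\n"
                + ind + f"elif key < {k!r}:\n" + emit(lo, mid - 1, depth + 1)
                + ind + "else:\n" + emit(mid + 1, hi, depth + 1))

    src = "def lookup(key):\n" + emit(0, len(items) - 1, 1)
    namespace = {}
    exec(src, namespace)       # "self-modifying code", Python style
    return namespace["lookup"]

table = {10: "a", 20: "b", 30: "c", 40: "d", 50: "e"}
lookup = gen_lookup(table)
assert lookup(30) == "c" and lookup(35) is None
```

Each lookup now runs a fixed sequence of compares with no node dereferences, which is the same trade the paper makes: traversal pressure moves from the data cache to the instruction stream.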
17:25 ▶ Sambamba: Runtime Adaptive Parallel Execution
How can we exploit a microprocessor as efficiently as possible? The “classic” approach is static optimization at compile-time, conservatively optimizing a program while keeping all possible uses in mind. Further optimization can only be achieved by anticipating the actual usage profile: If we know, for instance, that two computations will be independent, we can run them in parallel. However, brute force parallelization may slow down execution due to its large overhead. But as this depends on runtime features, such as structure and size of input data, parallel execution needs to dynamically adapt to the runtime situation at hand.
Our SAMBAMBA framework implements such a dynamic adaptation for regular sequential C programs through adaptive dispatch between sequential and parallel function instances. In an evaluation of 14 programs, we show that automatic parallelization in combination with adaptive dispatch can lead to speed-ups of up to 5.2 fold on a quad-core machine with hyperthreading. At this point, we rely on programmer annotations but will get rid of this requirement as the platform evolves to support efficient speculative optimizations.
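The mechanism of adaptive dispatch between a sequential and a parallel instance of a function can be sketched as below. The class and function names are illustrative, not Sambamba's API, and threads stand in for its speculative parallel code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Sketch of adaptive dispatch: keep a sequential and a parallel instance
# of the same function, time each on real calls, then route subsequent
# calls to whichever instance was faster.
class AdaptiveDispatch:
    def __init__(self, sequential, parallel):
        self.variants = {"seq": sequential, "par": parallel}
        self.timings = {}

    def __call__(self, *args):
        # Calibration phase: run each untimed variant once, recording its runtime.
        for name, fn in self.variants.items():
            if name not in self.timings:
                t0 = time.perf_counter()
                result = fn(*args)
                self.timings[name] = time.perf_counter() - t0
                return result
        # Steady state: dispatch to the faster instance.
        best = min(self.timings, key=self.timings.get)
        return self.variants[best](*args)

def work(chunk):
    return sum(i * i for i in chunk)

def seq_sum(chunks):
    return sum(work(c) for c in chunks)

def par_sum(chunks):
    with ThreadPoolExecutor() as pool:   # threads: modest gains in CPython
        return sum(pool.map(work, chunks))

chunks = [range(50_000)] * 8
adaptive = AdaptiveDispatch(seq_sum, par_sum)
assert adaptive(chunks) == seq_sum(chunks)
```

A production system would re-calibrate as input characteristics change, since the winning variant depends on runtime features such as input size.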
17:50 ▶ Concluding remarks and Nvidia best paper/presentation award.
Registration, Accommodation and Travel
HiPEAC/ADAPT registration (make sure to tick the box for the ADAPT workshop)
Local arrangements (HiPEAC website)
Call for papers
Computing systems are rapidly evolving into heterogeneous machines featuring many processor cores. This leads to tremendous complexity, with an unprecedented number of available design and optimization choices for architectures, applications, compilers and run-time systems. Using outdated, non-adaptive technology results in an enormous waste of expensive computing resources and energy, while slowing down time to market.

The 3rd International Workshop on Adaptive Self-tuning Computing Systems is an interdisciplinary forum for researchers, practitioners, developers and application writers to discuss ideas, experience, methodology, applications, practical techniques and tools to improve or change current and future computing systems using self-tuning technology. Such systems should be able to automatically adjust their behavior to multi-objective usage scenarios at all levels (hardware and software) based on empirical, dynamic, iterative, statistical, collective, bio-inspired, machine learning and alternative techniques while fully utilizing available resources.
All papers will be peer-reviewed including short position papers and should include ideas on how to simplify, automate and standardize the design, programming, optimization and adaptation of large-scale computing systems for multiple objectives to improve performance, power consumption, utilization, reliability and scalability.
Submission guidelines
We invite papers in two categories:
- Full papers should be at most 6 pages long (excluding bibliography). Papers in this category are expected to have relatively mature content.
- Position papers should be 1-2 pages long (excluding bibliography). Preliminary and exploratory work is welcome in this category, including wild & crazy ideas. Authors submitting papers in this category must prepend "Position Paper:" to the title of the submitted paper.
For any questions or problems related to the submission, please email christophe.dubach (_@_) ed.ac.uk.
IMPORTANT DATES:
- Abstract: October 15, 2012
- Paper submission: October 22, 2012 (no extension)
- Notification: November 26, 2012
- Final version: December 10, 2012
- Workshop: January 22, 2013
Organisers / Program chairs
- Christophe Dubach (University of Edinburgh, UK)
- Grigori Fursin (INRIA, France)
Program committee
- Erik Altman (IBM TJ Watson, USA)
- Marisa Gil (UPC, Spain)
- Vijay Janapa Reddi (UT Austin, USA)
- Timothy Jones (University of Cambridge, UK)
- Jaejin Lee (Seoul National University, Korea)
- Anton Lokhmotov (ARM, UK)
- Chi-Keung Luk (Intel, USA)
- Tipp Moseley (Google, USA)
- Lasse Natvig (NTNU, Norway)
- David Padua (UIUC, USA)
- Markus Pueschel (ETH Zurich, Switzerland)
- Juergen Teich (University of Erlangen-Nuremberg, Germany)
- Chengyong Wu (ICT, China)