I'm Christophe Dubach, a lecturer (Assistant Professor) at the University of Edinburgh.

I'm part of the CArD group in the School of Informatics. My research interests include architecture design space exploration, optimisation for GPUs and other parallel devices, high-level code generation, and the application of machine-learning techniques to all of these topics.


  • PhD studentships available for UK/European students through the EPSRC Centre for Doctoral Training in Pervasive Parallelism. If you are interested, please contact me with your CV, transcripts and a short description of your research interests.
  • Just published a technical report on our latest work on high performance OpenCL code generation using patterns and rewrite rules.

Research Team


Michel Steuwer
Thibaut Lutz

PhD Students

Juan Jose Fumero
Toomas Remmelg
Paul-Jules Micolet
Adam Harries
Alberto Magni
Erik Tomusk

Research Highlights

High-Performance Code Generation for Parallel Processors using Functional Patterns

Project supported by Google.

  • Michel Steuwer
  • Adam Harries

Computing systems have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems offer tremendous computational power at the cost of increased programming effort, which creates a tension between performance and code portability. This project investigates a novel approach that aims to combine high-level programming, code portability, and high performance. Starting from a high-level functional expression, we apply a simple set of rewrite rules to transform it into a low-level functional representation that is close to the OpenCL programming model and from which OpenCL code is generated. Our rewrite rules define a space of possible implementations, which we automatically explore to generate hardware-specific OpenCL implementations.
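As a rough illustration of how rewrite rules can span a space of equivalent implementations, here is a minimal Python sketch. It is not the project's actual system: the pipeline representation and the names `run`, `fuse_maps` and `split_join` are assumptions made for this example. It shows two semantics-preserving rules, map fusion and split/join tiling, applied to a small functional program.

```python
# Toy pipeline: a list of stages applied left to right.
# Stage kinds (hypothetical, for illustration only):
#   ("map", f)    apply f to every element
#   ("split", n)  cut the list into chunks of at most n elements
#   ("join", None) flatten a list of chunks back into one list

def run(pipeline, xs):
    """Interpret a pipeline on a Python list."""
    for kind, arg in pipeline:
        if kind == "map":
            xs = [arg(x) for x in xs]
        elif kind == "split":
            xs = [xs[i:i + arg] for i in range(0, len(xs), arg)]
        elif kind == "join":
            xs = [x for chunk in xs for x in chunk]
        else:
            raise ValueError(f"unknown stage: {kind}")
    return xs

def fuse_maps(pipeline):
    """Rule 1: map(f) followed by map(g) becomes a single map(g . f)."""
    out = []
    for kind, arg in pipeline:
        if kind == "map" and out and out[-1][0] == "map":
            f, g = out[-1][1], arg
            out[-1] = ("map", lambda x, f=f, g=g: g(f(x)))
        else:
            out.append((kind, arg))
    return out

def split_join(pipeline, n):
    """Rule 2: map(f) becomes split(n), map(map(f)), join."""
    out = []
    for kind, arg in pipeline:
        if kind == "map":
            inner = lambda chunk, f=arg: [f(x) for x in chunk]
            out += [("split", n), ("map", inner), ("join", None)]
        else:
            out.append((kind, arg))
    return out

# Three equivalent variants of the same high-level program.
program = [("map", lambda x: x + 1), ("map", lambda x: x * 2)]
fused = fuse_maps(program)
tiled = split_join(fused, 2)

data = [1, 2, 3]
results = [run(p, data) for p in (program, fused, tiled)]
```

Each variant computes the same result but has a different structure, so a search procedure could pick whichever one maps best onto a given device.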

GPU-Acceleration for the Graal VM

Project supported by Oracle Labs.

  • Michel Steuwer
  • Juan Fumero
  • Toomas Remmelg

This project aims to automatically accelerate applications running on top of the Graal VM, an open-source Java VM. To achieve this goal, we first define a high-level API inspired by functional programming and data-flow programming concepts. At runtime, our system recognises calls to our API and automatically generates OpenCL kernels. The runtime then manages the execution of these OpenCL kernels on multiple devices (e.g. GPUs), so the application is transparently accelerated.
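To give a feel for what generating a kernel from a high-level call can look like, the following Python sketch emits OpenCL C source for an element-wise map. It is only an illustration under stated assumptions: the helper `map_kernel_source` and its parameters are hypothetical, and the real system works on Java code inside the Graal VM rather than on strings.

```python
def map_kernel_source(name: str, c_expr: str, ctype: str = "float") -> str:
    """Return OpenCL C source for out[i] = c_expr, where x stands for in[i].

    Hypothetical helper for illustration; c_expr is pasted in verbatim.
    """
    return (
        f"__kernel void {name}(__global const {ctype}* in,\n"
        f"                     __global {ctype}* out) {{\n"
        f"    size_t i = get_global_id(0);  /* one work-item per element */\n"
        f"    {ctype} x = in[i];\n"
        f"    out[i] = {c_expr};\n"
        f"}}\n"
    )

# A call such as map(x -> x * x) could be lowered to this kernel text.
square_src = map_kernel_source("square", "x * x")
```

The generated string would then be handed to an OpenCL runtime for compilation and execution, which is the part the project's runtime manages automatically.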

Design Space Exploration for Dynamically Reconfigurable Multicore

Project supported by Microsoft Research.

  • Paul Micolet

Dynamically reconfigurable multicore processors, such as the E2 architecture, offer the ability to merge simple cores into larger ones in order to increase performance. However, the problem of deciding how to aggregate these cores is non-trivial and is highly dependent on the application. In this project, we investigate the problem of mapping multi-threaded applications written in a data-flow language to this type of architecture.

Design Space Exploration for Heterogeneous Multicore

Project supported by ARM.

  • Erik Tomusk

Single-ISA heterogeneous processors have the potential to maximise performance in power-constrained mobile devices by running each job on the best available CPU core given a power budget. This research project focuses on developing new techniques to select the best set of cores in order to maximise runtime flexibility in terms of power and performance.
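The idea of running each job on the best available core under a power budget can be sketched in a few lines of Python. The `Core` type, the `best_core` function and all power and performance numbers below are made up for illustration and do not come from the project.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Core:
    name: str
    power_w: float  # power drawn while running the job (illustrative)
    perf: float     # relative performance on the job (illustrative)

def best_core(cores: Sequence[Core], budget_w: float) -> Optional[Core]:
    """Return the fastest core whose power fits the budget, or None."""
    feasible = [c for c in cores if c.power_w <= budget_w]
    return max(feasible, key=lambda c: c.perf, default=None)

# Hypothetical single-ISA system with three core types.
cores = [
    Core("little", power_w=0.5, perf=1.0),
    Core("mid", power_w=1.2, perf=2.1),
    Core("big", power_w=3.0, perf=3.5),
]

choice = best_core(cores, budget_w=1.5)
```

A more diverse set of cores gives this selection step more useful options across different budgets, which is the kind of runtime flexibility the project aims to maximise.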

GPU Optimisation & Machine-Learning

Project supported by ARM.

  • Alberto Magni

Programming models such as OpenCL have been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimising compiler severely limits the performance portability of OpenCL programs. Programmers need to manually tune applications for each specific device, preventing effective portability. In this project, we target compiler transformations specific to data-parallel languages, such as thread-coarsening. We address the problem of finding the best parameters to control these transformations in order to reach maximum performance. We propose a solution based on a machine-learning model that predicts the best optimisation parameters using static code features. The model automatically specialises to the different architectures considered.
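Thread-coarsening merges the work of several work-items into one, and the coarsening factor is the kind of parameter that has to be tuned per device. The Python sketch below simulates the transformation on the CPU. The names `launch`, `saxpy_kernel` and `coarsen` are assumptions for this example, each coarsened work-item handles a contiguous block of elements (a real compiler may use a stride instead), and the input size is assumed to be divisible by the factor.

```python
def launch(kernel, num_items, *args):
    """Simulate a 1D kernel launch: call kernel once per work-item id."""
    for gid in range(num_items):
        kernel(gid, *args)

def saxpy_kernel(gid, a, x, y, out):
    """Original kernel: one output element per work-item."""
    out[gid] = a * x[gid] + y[gid]

def coarsen(kernel, factor):
    """Return a kernel in which one work-item does the work of `factor`."""
    def coarsened(gid, *args):
        for k in range(factor):
            kernel(gid * factor + k, *args)
    return coarsened

n, factor = 8, 4
a = 2.0
x = [float(i) for i in range(n)]
y = [1.0] * n

out_ref = [0.0] * n
launch(saxpy_kernel, n, a, x, y, out_ref)

# The coarsened kernel is launched with factor-times fewer work-items.
out_coarse = [0.0] * n
launch(coarsen(saxpy_kernel, factor), n // factor, a, x, y, out_coarse)
```

Both launches produce the same output. On real hardware the best factor depends on the device, which is why predicting it from static code features is useful.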



Program committee member

  • Symposium on Principles and Practice of Parallel Programming (PPoPP), 2016
  • International Symposium on Code Generation and Optimization (CGO), 2015
  • International Conference on Compiler Construction (CC), 2015
  • International Workshop on Adaptive Self-tuning Computing Systems (ADAPT), co-located with HiPEAC, 2013, 2014, 2015
  • International Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures (ADMS), co-located with VLDB, 2011, 2012, 2013, 2014, 2015
  • Workshop on Interaction between Compilers and Computer Architectures (INTERACT), co-located with HPCA, 2011, 2012, co-located with ASPLOS, 2010
  • International Conference on Parallel and Distributed Systems (ICPADS), Multicore Computing and Parallel/Distributed Architecture track, 2011, 2013
  • Symposium on High Level Languages for Parallel Computing on FPGAs (HLFPGA), co-located with ParCo, 2015

Organising committee

  • Finance chair, International Symposium on Code Generation and Optimization (CGO) 2016
  • Workshop/Tutorial chair, International Symposium on Code Generation and Optimization (CGO) 2015
  • General co-chair, International Workshop on Adaptive Self-tuning Computing Systems (ADAPT), co-located with HiPEAC, 2013, 2014, 2015
  • Local chair, International Conference on Parallel Architectures and Compilation Techniques (PACT), 2013


[1] P.-J. Micolet, A. Smith, and C. Dubach. A machine learning approach to mapping streaming workloads to dynamic multicore processors. In Proceedings of the 17th ACM SIGPLAN/SIGBED conference on Languages, Compilers and Tools for Embedded Systems, LCTES, 2016. [ bib ]
[2] T. Remmelg, T. Lutz, M. Steuwer, and C. Dubach. Performance portable GPU code generation for matrix multiplication. In Proceedings of the 2016 Workshop on General Purpose Processing on Graphics Processing Units, GPGPU, 2016. [ bib ]
[3] E. Tomusk, C. Dubach, and M. O'Boyle. Four metrics to evaluate heterogeneous multicores. In International Conference on High Performance Embedded Architectures & Compilers, HiPEAC, 2016. [ bib ]
[4] A. Harries, M. Steuwer, M. Cole, A. Gray, and C. Dubach. Compositional compilation for sparse, irregular data parallelism. In Proceedings of the 2016 Workshop on High-Level Programming for Heterogeneous and Hierarchical Parallel Systems, HLPGPU, 2016. [ bib ]
[5] E. Tomusk, C. Dubach, and M. O'Boyle. Four metrics to evaluate heterogeneous multicores. ACM Transactions on Architecture and Code Optimization, ACM TACO, 12(4), 2015. [ bib ]
[6] E. Tomusk, C. Dubach, and M. O'Boyle. Diversity: A design goal for heterogeneous processors. IEEE Computer Architecture Letters, IEEE CAL, PP, 2015. [ bib ]
[7] M. Miller, D. Holden, R. Al-Ashqar, C. Dubach, K. Mitchell, and T. Komura. Carpet unrolling descriptors for character control on uneven terrain. In Proceedings of the ACM SIGGRAPH Motion in Games Conference, MIG, 2015. [ bib ]
[8] M. Steuwer, C. Fensch, S. Lindley, and C. Dubach. Generating performance portable code using rewrite rules: From high-level functional expressions to high-performance OpenCL code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, ICFP, 2015. [ bib ]
[9] J. J. Fumero, T. Remmelg, M. Steuwer, and C. Dubach. Runtime code generation and data management for heterogeneous computing in Java. In Proceedings of the 12th International Conference on Principles and Practice of Programming on the Java Platform: Virtual machines, languages, and tools, PPPJ, 2015. [ bib ]
[10] M. Steuwer, C. Fensch, and C. Dubach. Patterns and rewrite rules for systematic code generation (from high-level functional patterns to high-performance OpenCL code). arXiv Technical Report arXiv:1502.02389, 2015. [ bib | .pdf ]
[11] A. Magni, C. Dubach, and M. O'Boyle. Automatic optimization of thread-coarsening for graphics processors. In Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT, 2014. [ bib | .pdf ]
[12] J. J. Fumero, M. Steuwer, and C. Dubach. A composable array function interface for heterogeneous computing in Java. In Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, ARRAY, 2014. [ bib | .pdf ]
[13] G. Fursin and C. Dubach. Community-driven reviewing and validation of publications. In Proceedings of the 1st ACM SIGPLAN Workshop on Reproducible Research Methodologies and New Publication Models in Computer Engineering, TRUST, 2014. [ bib | .pdf ]
[14] A. Magni, C. Dubach, and M. O'Boyle. Exploiting GPU hardware saturation for fast compiler optimization. In Proceedings of Workshop on General Purpose Processing Using GPUs, GPGPU, 2014. [ bib | .pdf ]
[15] A. Magni, C. Dubach, and M. F. P. O'Boyle. A large-scale cross-architecture evaluation of thread-coarsening. In Proceedings of the 2013 Conference on High Performance Computing Networking, Storage and Analysis, SC, 2013. [ bib | .pdf ]
[16] C. Dubach, T. M. Jones, and E. V. Bonilla. Dynamic microarchitectural adaptation using machine learning. ACM Transactions on Architecture and Code Optimization, ACM TACO, 10(4):31, 2013. [ bib | .pdf ]
[17] C. Dubach, P. Cheng, R. Rabbah, D. Bacon, and S. Fink. Compiling a high-level language for GPUs (via language support for architectures and compilers). In Proceedings of the 33rd ACM SIGPLAN Symposium on Programming Language Design and Implementation, PLDI, 2012. [ bib | .pdf ]
[18] C. Dubach, T. M. Jones, and M. F. O'Boyle. Exploring and predicting the effects of microarchitectural parameters and compiler optimizations on performance and energy. ACM Transactions on Embedded Computing Systems, ACM TECS, 11(1):24, 2012. [ bib | .pdf ]
[19] C. Dubach, T. M. Jones, and M. F. O'Boyle. An empirical architecture-centric approach to microarchitectural design space exploration. IEEE Transactions on Computers, IEEE TC, 60(10):1445-1458, 2011. [ bib | .pdf ]
[20] C. Dubach, T. M. Jones, E. V. Bonilla, and M. F. P. O'Boyle. A predictive model for dynamic microarchitectural adaptivity control. In Proceedings of the 43rd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2010. [ bib | .pdf ]
[21] C. Dubach, T. M. Jones, E. V. Bonilla, G. Fursin, and M. F. P. O'Boyle. Portable compiler optimisation across embedded programs and microarchitectures using machine learning. In Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2009. [ bib | .pdf ]
[22] C. Dubach, T. M. Jones, and M. F. O'Boyle. Rapid early-stage microarchitecture design using predictive models. In Proceedings of the 2009 IEEE International Conference on Computer Design, ICCD, 2009. [ bib | .pdf ]
[23] C. Dubach. Using machine-learning to efficiently explore the architecture/compiler co-design space. PhD Thesis, 2009. [ bib | .pdf ]
[24] C. Dubach, T. M. Jones, and M. F. O'Boyle. Exploring and predicting the architecture/optimising compiler co-design space. In Proceedings of the 2008 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES, 2008. [ bib | .pdf ]
[25] C. Dubach, J. Cavazos, B. Franke, G. Fursin, M. F. O'Boyle, and O. Temam. Fast compiler optimisation evaluation using code-feature based performance prediction. In Proceedings of the 4th International Conference on Computing Frontiers, CF, 2007. [ bib | .pdf ]
[26] C. Dubach, T. M. Jones, and M. F. P. O'Boyle. Microarchitectural design space exploration using an architecture-centric approach. In Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO, 2007. [ bib | .pdf ]
[27] J. Cavazos, C. Dubach, F. Agakov, E. Bonilla, M. F. P. O'Boyle, G. Fursin, and O. Temam. Automatic performance model construction for the fast software exploration of new hardware designs. In Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, CASES, 2006. [ bib | .pdf ]
[28] M. Vuletic, C. Dubach, L. Pozzi, and P. Ienne. Enabling unrestricted automated synthesis of portable hardware accelerators for virtual machines. In Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS, 2005. [ bib | .pdf ]
[29] C. Dubach. Java byte code synthesis for reconfigurable computing platforms. Master's thesis, 2005. [ bib | .pdf ]


email : christophe.dubach (AT) ed.ac.uk
phone : +44 (0) 131 650 3092

Christophe Dubach
University of Edinburgh
Informatics Forum - 1.12
10 Crichton Street
Edinburgh EH8 9AB
United Kingdom