I'm a final-year PhD student at the School of Informatics, University of Edinburgh, supervised by Iain Murray. I'm a member of the Centre for Doctoral Training in Data Science and co-funded by Microsoft Research.

I'm interested in probabilistic approaches to machine learning. Currently my work focuses on deep learning methods for density estimation and Bayesian inference. Previously, I studied Electrical and Computer Engineering at the Aristotle University of Thessaloniki and Advanced Computing at Imperial College London.

**MSc by Research in Data Science, University of Edinburgh.**

Grade 92%, with Distinction. Won the *MSc by Research in Data Science Class Prize*.

**MSc in Advanced Computing, Imperial College London.**

Grade 90%, with Distinction. Won the *Corporate Partnership Programme Award for Academic Excellence* and the *Winton Capital Applied Computing MSc Project Prize*.

**MEng in Electrical and Computer Engineering, Aristotle University of Thessaloniki.**

Grade 89.6%, with Distinction.

**Research intern, Microsoft Research Cambridge.**

I worked on performing Bayesian inference in computer vision models using *Infer.NET*. My supervisor was John Winn.

**Teaching assistant, University of Edinburgh.**

I've tutored and/or marked the following courses:
*Machine Learning & Pattern Recognition*;
*Introductory Applied Machine Learning*;
*Probabilistic Modelling & Reasoning*;
*Informatics 2B - Algorithms, Data Structures & Learning*;
*Introduction to Theoretical Computer Science*.

**Research assistant, Information Technologies Institute, Centre for Research & Technology Hellas.**

I've participated in the EU-funded project *Adapt4EE* and the Greek-funded project *EnNoisis*. Most of my work focused on automatic activity recognition in smart homes with ambient sensors and Kinect cameras, which involved a fair amount of machine learning and computer vision.

**Research assistant, Aristotle University of Thessaloniki.**

I've participated in the EU-funded project *AutoGPU*, where I developed software for fast parallel low-level image processing on GPUs. I was writing a lot of CUDA back then.

Sequential Neural Likelihood is a fast and robust algorithm for inference in simulator models, which are models we can simulate but whose likelihood we can't compute. SNL works by training a Masked Autoregressive Flow on simulated data to learn the simulator model's intractable likelihood. During training, preliminary fits to the likelihood are used to suggest what simulations to run next, which reduces the total number of simulations dramatically. SNL brings together ideas from likelihood-free inference and neural density estimation, and it's a more robust alternative to related methods that learn the posterior directly.
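The sequential simulate-fit-propose loop can be illustrated with a toy numpy sketch. Everything here is made up for illustration: a one-dimensional black-box simulator, a uniform prior on a grid, and a linear-Gaussian surrogate for the likelihood standing in for the Masked Autoregressive Flow.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta):
    # Hypothetical black-box simulator: we can sample from it,
    # but we pretend its likelihood is unknown.
    return theta + 0.5 * rng.normal()

x_obs = 2.0
grid = np.linspace(-5.0, 5.0, 201)  # uniform prior over this grid

# Round 0: propose parameters from the prior.
proposal = lambda n: rng.uniform(-5.0, 5.0, size=n)
thetas, xs = np.array([]), np.array([])

for round_ in range(3):
    t = proposal(100)
    x = np.array([simulator(ti) for ti in t])
    thetas = np.concatenate([thetas, t])
    xs = np.concatenate([xs, x])
    # Fit a simple conditional density q(x | theta) to ALL simulations so far
    # (a linear-Gaussian surrogate in place of a neural density estimator).
    design = np.vstack([thetas, np.ones_like(thetas)]).T
    a, b = np.linalg.lstsq(design, xs, rcond=None)[0]
    sigma = np.sqrt(np.mean((xs - (a * thetas + b)) ** 2))
    # With a uniform prior, the approximate posterior is just q(x_obs | theta).
    logpost = -0.5 * ((x_obs - (a * grid + b)) / sigma) ** 2
    post = np.exp(logpost - logpost.max())
    post /= post.sum()
    # Next round: propose from the current posterior estimate, so new
    # simulations are spent where they are most informative.
    proposal = lambda n: rng.choice(grid, size=n, p=post)

theta_map = grid[np.argmax(post)]
```

Since the toy simulator is `x = theta + noise` and `x_obs = 2`, the posterior concentrates around `theta ≈ 2` after a few rounds.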

For more details you can have a look at the paper. Code that reproduces the experiments is here.

Autoregressive models and normalizing flows are types of neural networks that achieve state-of-the-art performance in density estimation. We developed Masked Autoregressive Flow, which is a normalizing flow whose layers are autoregressive models. MAF is obtained by stacking together a number of MADEs, such that each MADE models the random numbers that drive the next MADE in the stack. MAF has close connections to Inverse Autoregressive Flow and RealNVP, and yields state-of-the-art performance in several general-purpose density estimation tasks.
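As a minimal sketch of the stacking idea, here's how density evaluation works under stacked affine autoregressive layers. The masked linear conditioners below are toy stand-ins for full MADEs, and all weights are made up; the point is that each layer is inverted in a single pass and contributes its log-determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 3
MASK = np.tril(np.ones((D, D)), k=-1)  # strictly lower-triangular

def conditioner(x, W_mu, W_alpha):
    # Masked linear maps: mean mu_i and log-scale alpha_i of dimension i
    # depend only on x_{<i}, the autoregressive property MADE enforces.
    return (W_mu * MASK) @ x, (W_alpha * MASK) @ x

def maf_logprob(x, layers):
    # Each layer maps data towards the random numbers that drive the
    # next layer; -sum(alpha) is the log-determinant of the Jacobian.
    u, logdet = x, 0.0
    for W_mu, W_alpha in layers:
        mu, alpha = conditioner(u, W_mu, W_alpha)
        u = (u - mu) * np.exp(-alpha)
        logdet -= alpha.sum()
    # Standard-normal base density on the final noise variables.
    return -0.5 * (u @ u) - 0.5 * D * np.log(2 * np.pi) + logdet

layers = [(rng.normal(size=(D, D)), 0.1 * rng.normal(size=(D, D)))
          for _ in range(2)]
print(maf_logprob(rng.normal(size=D), layers))
```

With zero weights, every layer is the identity and the flow reduces to a standard normal, which is a handy sanity check.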

For more details you can have a look at the paper. Code that reproduces the experiments can be found here. This work was presented as an oral at NIPS 2017, video here.

Suppose we have a probabilistic model which we can simulate forward to generate data from, but whose likelihood we can't evaluate. How can we do Bayesian inference in such a model? We propose using simulated data from the model to train a Bayesian neural network to return the intractable posterior. By using preliminary fits to the posterior to guide future simulations, we can dramatically speed up the process. Our approach improves over the state-of-the-art in likelihood-free inference in three ways: (a) it targets the exact posterior, (b) it represents the posterior parametrically, and (c) it significantly reduces the number of required simulations.

For more details you can have a look at the paper. Code that reproduces the experiments can be found here. Dennis Prangle wrote a very nice blog post about our work.

In machine learning, many good models are large, expensive or intractable. *Knowledge distillation* is the idea of training a convenient model to mimic a good but cumbersome model. This way we obtain the good performance of the original model in a much more convenient compact form. We apply this idea in: (a) *model compression*, where we compress large discriminative models, such as ensembles of neural nets, into models of much smaller size; (b) *Bayesian inference*, where we distil streams of MCMC samples into closed-form predictive distributions; (c) *intractable generative models*, where we distil unnormalizable models such as RBMs into tractable models such as NADEs.

You can read more in the relevant MSc thesis. Code can be found here.

When represented as matrices, real-world data often have a low-rank structure, whereas corruptions are often sparse. Based on this observation, several optimization-based algorithms that aim to separate the low-rank component from the sparse component have been developed. In this work, we make three contributions in the area of robust low-rank modelling: (a) we review and compare existing matrix-based methods; (b) we extend matrix-based methods to tensors, introducing several tensor-based algorithms; (c) we apply both matrix-based and tensor-based algorithms in practical computer vision tasks.
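A representative matrix-based method is Principal Component Pursuit, which splits a matrix into low-rank plus sparse parts. Below is a rough numpy sketch using an inexact augmented Lagrangian; the parameter choices (`rho`, the initial penalty `mu`) are common heuristics, not tuned values from the thesis.

```python
import numpy as np

def shrink(X, tau):
    # Soft-thresholding: proximal operator of the l1 norm.
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    # Singular value thresholding: proximal operator of the nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca(M, n_iter=200, rho=1.05):
    # Principal Component Pursuit: min ||L||_* + lam * ||S||_1  s.t. L + S = M,
    # solved by alternating the L and S updates with a dual ascent step.
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))
    mu = 0.25 * m * n / np.abs(M).sum()
    L, S, Y = np.zeros_like(M), np.zeros_like(M), np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        Y += mu * (M - L - S)
        mu *= rho  # slowly tighten the penalty to enforce L + S = M
    return L, S
```

On a synthetic rank-one matrix with a few large sparse corruptions, this recovers both components to good accuracy.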

You can read more in the relevant MSc thesis. This thesis won the *Winton Capital Applied Computing MSc Project Prize*.

Stochastic gradient-based optimization algorithms have become the standard method for training machine learning models such as neural nets, due to their good scalability to large datasets. Nevertheless, standard stochastic gradient descent has a slower theoretical convergence rate compared to batch gradient descent. Semi-stochastic algorithms, such as S2GD and SAG, combine fast convergence with scalability. In this project, we compare the performance of semi-stochastic methods to standard stochastic and batch methods in convex machine learning problems. We find that semi-stochastic methods indeed converge to the optimum much faster, but this doesn't necessarily translate to better generalization performance.
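The semi-stochastic recipe is easy to state in code: each epoch computes one full (batch) gradient at a snapshot, then takes many cheap stochastic steps whose noise is reduced by the snapshot correction. Here's a sketch in that spirit on a made-up regularized least-squares problem; the step size and epoch counts are illustrative, not the settings from the report.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, lr=0.02, n_epochs=30, seed=0):
    # Semi-stochastic gradient descent (in the spirit of SVRG/S2GD):
    # one full gradient per epoch, then n variance-reduced stochastic steps.
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(n_epochs):
        w_snap = w.copy()
        g_snap = full_grad(w_snap)
        for _ in range(n):
            i = rng.integers(n)
            # The correction term cancels the stochastic gradient's bias
            # relative to the snapshot, shrinking the variance near optima.
            w -= lr * (grad_i(w, i) - grad_i(w_snap, i) + g_snap)
    return w

# Hypothetical toy problem: l2-regularized least squares.
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 0.1
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
grad_i = lambda w, i: A[i] * (A[i] @ w - b[i]) + lam * w
full_grad = lambda w: A.T @ (A @ w - b) / n + lam * w
w_hat = svrg(grad_i, full_grad, np.zeros(d), n)
```

Because the problem is strongly convex, the iterates converge to the closed-form ridge solution, which makes the linear convergence easy to verify.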

This is a small project I did with Peter Richtárik. You can read more in the technical report. MATLAB code with my implementation of the algorithms and scripts to reproduce the experiments can be found here.

Convolution and correlation, with or without local normalization, are fundamental low-level operations in image processing applications. In this project, we develop algorithms and software for their fast computation, based on (a) use of the Fourier domain for large templates, and (b) parallelization on GPUs. We've developed the *FLCC Library*, a software tool which automatically determines which algorithm/platform combination works the fastest for the particular problem at hand and then executes it appropriately.
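The Fourier-domain trick is that spatial correlation becomes pointwise multiplication with the conjugate spectrum, turning an O(N·K) sliding-window computation into O(N log N). A minimal numpy sketch (not the FLCC implementation, which is C/CUDA):

```python
import numpy as np

def xcorr_fft(image, template):
    # Cross-correlation via the Fourier domain: zero-pad both inputs to
    # the full linear-correlation size, multiply the image spectrum by
    # the conjugate template spectrum, and transform back.
    H, W = image.shape
    h, w = template.shape
    shape = (H + h - 1, W + w - 1)  # avoids circular wrap-around
    F = np.fft.rfft2(image, s=shape)
    G = np.fft.rfft2(template, s=shape)
    full = np.fft.irfft2(F * np.conj(G), s=shape)
    # Keep only the 'valid' lags, where the template fits inside the image.
    return full[:H - h + 1, :W - w + 1]
```

For small templates the direct sliding-window sum is faster, which is exactly the kind of crossover an auto-tuning layer has to detect.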

For more information see the relevant MEng thesis. The latest version of the code is hosted on the FLCC Library website and the AutoGPU website.

G. Papamakarios,
D. C. Sterratt,
I. Murray.
Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows.
arXiv:1805.07226. 2018.

pdf
bibtex
code

G. Papamakarios,
T. Pavlakou,
I. Murray.
Masked Autoregressive Flow for Density Estimation.
Advances in Neural Information Processing Systems. 2017.
**(oral presentation)**

pdf
bibtex
code

N. Xue,
G. Papamakarios,
M. Bahri,
Y. Panagakis,
S. Zafeiriou.
Robust low-rank tensor modelling using Tucker and CP decomposition.
In Proceedings of the 25th European Signal Processing Conference, pages 1185–1189. 2017.

pdf
bibtex

G. Papamakarios,
I. Murray.
Fast ε-free Inference of Simulation Models with Bayesian Conditional Density Estimation.
Advances in Neural Information Processing Systems. 2016.

pdf
bibtex
code

G. Papamakarios,
Y. Panagakis,
S. Zafeiriou.
Generalised Scalable Robust Principal Component Analysis.
In Proceedings of the British Machine Vision Conference. 2014.

pdf
bibtex
code

G. Papamakarios,
D. Giakoumis,
K. Votis,
S. Segouli,
D. Tzovaras,
C. Karagiannidis.
Synthetic Ground Truth Data Generation for Automatic Trajectory-based ADL Detection.
In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, pages 33–36. 2014.

web
bibtex

G. Papamakarios,
G. Rizos,
N. P. Pitsianis,
X. Sun.
Fast Computation of Local Correlation Coefficients on Graphics Processing Units.
In Proceedings of SPIE, volume 7444, pages 744412–744412-8. 2009.

pdf
bibtex

G. Papamakarios,
I. Murray.
Distilling Intractable Generative Models.
Probabilistic Integration Workshop at the Neural Information Processing Systems Conference. 2015.

pdf
bibtex
code

G. Papamakarios,
D. Giakoumis,
M. Vasileiadis,
K. Votis,
D. Tzovaras,
S. Segouli,
C. Karagiannidis.
A Tool to Monitor and Support Physical Exercise Interventions for MCI and AD Patients.
2nd Patient Rehabilitation Techniques Workshop at the 8th International Conference on Pervasive Computing Technologies for Healthcare, 2014.

web
bibtex

G. Papamakarios,
D. Giakoumis,
M. Vasileiadis,
A. Drosou,
D. Tzovaras.
Human Computer Confluence in the Smart Home Paradigm: Detecting Human States and Behaviours for 24/7 Support of Mild-Cognitive Impairments.
In Human Computer Confluence: Transforming Human Experience Through Symbiotic Technologies, chapter 16, pages 275–293, De Gruyter Open, 2016.

pdf
bibtex

G. Papamakarios.
Distilling Model Knowledge.
MSc by Research Thesis, Centre for Doctoral Training in Data Science, University of Edinburgh. 2015.

pdf
bibtex
code

G. Papamakarios.
Robust Low-Rank Modelling on Matrices and Tensors.
MSc Thesis, Department of Computing, Imperial College London. 2014.

pdf
bibtex

G. Papamakarios,
G. Rizos.
FLCC: A Library for Fast Computation of Convolution and Local Correlation Coefficients.
MEng Thesis, Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki. 2011.

pdf
bibtex
code

Room 2.25, Informatics Forum

University of Edinburgh

10 Crichton Street, EH8 9AB

Edinburgh, UK