Asa Cooper Stickland


I'm a PhD student in the EPSRC Centre for Doctoral Training in Data Science at the University of Edinburgh. I'm supervised by Iain Murray, with Ivan Titov as my second supervisor.

My PhD has focused on transfer learning and robustness, particularly for multilingual models, and I'm interested in parameter-efficient ways to use and train large language models. I'm persuaded by arguments that powerful AI systems could be misaligned with human preferences, and want to work on solving this problem. I did internships at Facebook AI and NAVER Labs Europe, mainly working on using pre-trained models for machine translation. I've previously worked on approximate inference (e.g. variational inference, MCMC, ABC), and remain a big fan of Bayes' rule.

I did my undergrad in Durham (MPhys in Physics), where my master's project involved using Bayesian linear regression to find which properties of proteins were most effective for killing bacteria. I did a research internship in Durham doing fluid dynamics simulations, and in summer 2017 I did an internship at a startup called Five AI, which makes autonomous vehicles.




When does Parameter-Efficient Transfer Learning Work for Machine Translation?

Ahmet Üstün, Asa Cooper Stickland

A comprehensive study of parameter-efficient fine-tuning of pre-trained models for MT, evaluating 1) various parameter budgets, 2) a diverse set of language pairs, and 3) different pre-trained model scales and pre-training objectives: link. arXiv preprint.


Regularising Fisher Information Improves Cross-lingual Generalisation

Asa Cooper Stickland, Iain Murray

Short paper examining the link between consistency losses, the Fisher information matrix, and cross-lingual generalisation: link. Multilingual Representation Learning workshop at EMNLP, 2021.

Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation

Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad

The result of my Facebook AI internship: we examine which parameters to leave frozen when fine-tuning large pre-trained sequence-to-sequence models on machine translation, for both monolingual and multilingual pre-trained models: link. EACL, 2021.


Deep Transformers with Latent Depth

Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong

We model the choice of which transformer layer to use as a latent variable, allowing us to train deeper models and, e.g., learn which layers to share between languages for multilingual machine translation: arXiv link. NeurIPS, 2020.

Diverse Ensembles Improve Calibration

Asa Cooper Stickland, Iain Murray

Short paper on whether calibration and accuracy improve when each member of an ensemble is trained with a different data augmentation: arXiv link. ICML Workshop on Uncertainty and Robustness in Deep Learning, 2020.


BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Asa Cooper Stickland, Iain Murray

This project was about finding an efficient way to add parameters to a large pre-trained model, BERT, to get good performance on tasks in the GLUE benchmark: arXiv link. In ICML 2019, and featured in the NAACL 2019 transfer learning tutorial.



Intern at Five AI

During my summer internship with Five AI I was tasked with correcting for the motion of the car during a LIDAR sweep. My method was validated by testing in the real world with Five AI's prototype car. During this project I used C++ and ROS.
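The idea behind this kind of motion correction ("de-skewing") can be sketched as follows. This is an illustrative toy in NumPy, not the original C++/ROS code, and it assumes constant planar velocity over the sweep: each point is shifted into the car's frame at the start of the sweep according to when it was captured.

```python
import numpy as np

def deskew_sweep(points, timestamps, velocity, t0):
    """Shift each LIDAR point into the car's frame at time t0, assuming
    the car moves with constant velocity during the sweep.
    points: (N, 3) xyz measurements in the car frame at each point's capture time.
    timestamps: (N,) capture time of each point (seconds).
    velocity: (3,) car velocity in its own frame (m/s)."""
    dt = (timestamps - t0)[:, None]   # time elapsed since the sweep started
    # A point measured later is expressed relative to a car that has moved
    # forward by velocity * dt, so add that displacement back.
    return points + dt * velocity

# Toy example: a 0.1 s sweep while the car drives forward at 10 m/s,
# repeatedly measuring a wall that is 20 m ahead at the start of the sweep.
vel = np.array([10.0, 0.0, 0.0])
ts = np.linspace(0.0, 0.1, 5)
raw = np.array([[20.0, 0.0, 0.0]]) - ts[:, None] * vel  # car closes in over the sweep
fixed = deskew_sweep(raw, ts, vel, t0=0.0)
print(fixed[:, 0])  # all 20.0 after correction
```

In practice the ego-motion comes from odometry and involves rotation as well as translation, but the constant-velocity version shows the core idea.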

I also worked on a side project attempting to predict depth from pairs of stereo images using convolutional neural networks. I used ROS and OpenCV to preprocess the data, and TensorFlow to predict depth.

Durham Masters Project

The project was to build an algorithm that could predict the ability of a peptide-like molecule to fight bacteria. I used sparse Bayesian linear regression to deal with a problem where the data are high-dimensional and there are only a small number of training instances.