
Asa Cooper Stickland

About


As of August 2023 I am joining the Alignment Research Group at NYU as a postdoc, working with Sam Bowman. I'll be broadly working on aligning language models. More specifically: as language models become more powerful, we can't simply trust model output, since models may be aware they are being evaluated and "play nice" only to defect later. I will be working on evaluations that can get around this problem, for example by doing extra fine-tuning runs and intervening on model internals. If you are interested in collaborating on this topic or similar ones, please reach out, especially if you are at NYU.

I just finished my PhD in the EPSRC Centre for Doctoral Training in Data Science at the University of Edinburgh, supervised by Iain Murray, with Ivan Titov as my second supervisor. My PhD focused on transfer learning and robustness, particularly for multilingual models, and I'm interested in parameter-efficient ways to use and train large language models. I did internships at Facebook AI, NAVER Labs Europe, and Amazon, mainly working on using pre-trained models for machine translation. I've previously worked on approximate inference (e.g. variational inference, MCMC, ABC), and am still a big fan of Bayes' rule.

I did my undergrad at Durham (MPhys in Physics), where my master's project involved using Bayesian linear regression to find which properties of proteins were most effective at killing bacteria. I did a research internship at Durham running fluid dynamics simulations, and in summer 2017 I interned at a startup called Five AI, which is building autonomous vehicles.

CV

Publications


2023

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

This work grew out of our project measuring situational awareness: we found that models cannot generalize from "A is B" to "B is A". For example, when trained on "Olaf Scholz was the ninth Chancellor of Germany", they will not automatically be able to answer the question "Who was the ninth Chancellor of Germany?": link. Arxiv.

Taken out of context: On measuring situational awareness in LLMs

Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

We measure a proxy for "situational awareness" in LLMs, i.e. their ability to reason about the fact that they are machine learning models, whether they are being evaluated, etc.: link. Arxiv.

Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining

Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour, He He

We evaluate models on "noisy" data (e.g. data with typos) in multiple languages, and propose a new pretraining objective that improves robustness to noise: link. EACL, 2023.

2022

When does Parameter-Efficient Transfer Learning Work for Machine Translation?

Ahmet Üstün, Asa Cooper Stickland

A comprehensive study of parameter-efficient fine-tuning of pre-trained models for MT, evaluating 1) various parameter budgets, 2) a diverse set of language pairs, and 3) different pre-trained model scales and pre-training objectives: link. EMNLP, 2022.
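To give a flavour of the kind of parameter-efficient methods studied, here is a generic bottleneck-adapter sketch in PyTorch; it is not the exact setup from the paper, and the layer sizes and names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Small bottleneck MLP added around a frozen pre-trained sub-layer.
    Only these few parameters are trained, keeping the fine-tuning budget small."""
    def __init__(self, d_model: int = 512, d_bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)
        self.act = nn.ReLU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual connection so the adapter starts close to the identity.
        return hidden + self.up(self.act(self.down(hidden)))

# Toy usage: wrap a frozen layer's output with a trainable adapter.
frozen_layer = nn.Linear(512, 512)          # stand-in for a pre-trained sub-layer
for p in frozen_layer.parameters():
    p.requires_grad = False

adapter = BottleneckAdapter(d_model=512, d_bottleneck=64)
x = torch.randn(8, 10, 512)                 # (batch, sequence, hidden)
out = adapter(frozen_layer(x))              # only adapter parameters receive gradients
```

Varying the bottleneck size is one way to sweep the kind of parameter budgets mentioned above.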

2021

Regularising Fisher Information Improves Cross-lingual Generalisation

Asa Cooper Stickland, Iain Murray

Short paper examining the link between consistency losses, the Fisher information matrix, and cross-lingual generalisation: link. Multilingual Representation Learning workshop at EMNLP, 2021.
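For background (a standard identity, not the paper's specific result): the Fisher information matrix governs how much a model's predictive distribution changes under a small parameter perturbation, which is one way consistency-style KL penalties and Fisher information end up connected.

```latex
\[
  \mathrm{KL}\!\left(p_\theta(y \mid x)\,\|\,p_{\theta+\delta}(y \mid x)\right)
  \approx \tfrac{1}{2}\,\delta^\top F(\theta)\,\delta,
  \qquad
  F(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[
    \nabla_\theta \log p_\theta(y \mid x)\,\nabla_\theta \log p_\theta(y \mid x)^\top
  \right].
\]
```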

Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation

Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad

The result of my Facebook AI internship: we examine which parameters to leave frozen when fine-tuning large pre-trained sequence-to-sequence models on machine translation, for both monolingual and multilingual pre-trained models: link. EACL, 2021.
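A generic illustration of this kind of freezing recipe in PyTorch (not the paper's exact configuration; the module-name substrings below are placeholders):

```python
import torch.nn as nn

def freeze_all_but(model: nn.Module, trainable_substrings: tuple) -> None:
    """Freeze every parameter except those whose name contains one of the given
    substrings, e.g. only fine-tune cross-attention or layer-norm parameters."""
    for name, param in model.named_parameters():
        param.requires_grad = any(s in name for s in trainable_substrings)

# Hypothetical usage on a pre-trained sequence-to-sequence model:
# freeze_all_but(model, ("encoder_attn", "layer_norm"))
```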

2020

Deep Transformers with Latent Depth

Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong

We model the choice of which transformer layer to use as a latent variable, allowing us to train deeper models and, e.g., learn which layers to share between languages in multilingual machine translation: arxiv link. NeurIPS, 2020.
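A crude continuous relaxation of the idea (illustrative only; the paper's parameterization and training procedure differ): give each layer a learned gate that plays the role of the probability of using that layer.

```python
import torch
import torch.nn as nn

class GatedLayerStack(nn.Module):
    """Stack of transformer layers where each layer's contribution is scaled by
    a learned gate in [0, 1] -- a toy relaxation of selecting which layers to use."""
    def __init__(self, num_layers: int = 12, d_model: int = 256):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
            for _ in range(num_layers)
        )
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))  # one logit per layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gates = torch.sigmoid(self.layer_logits)
        for layer, g in zip(self.layers, gates):
            # Gate interpolates between skipping the layer (g=0) and applying it (g=1).
            x = x + g * (layer(x) - x)
        return x

model = GatedLayerStack()
out = model(torch.randn(2, 16, 256))   # (batch, sequence, d_model)
```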

Diverse Ensembles Improve Calibration

Asa Cooper Stickland, Iain Murray

Short paper on whether calibration and accuracy improve when using ensembles of models with a different data augmentation for each ensemble member: arxiv link. ICML Workshop on Uncertainty and Robustness in Deep Learning, 2020.
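A toy sketch of the setup (assumed, not the paper's experiments): each ensemble member is trained with a different data augmentation, and their predicted probabilities are averaged before measuring calibration.

```python
import torch

def ensemble_probs(members: list, x: torch.Tensor) -> torch.Tensor:
    """Average predicted class probabilities over ensemble members.
    Calibration is then assessed by comparing these probabilities to
    empirical accuracy (e.g. expected calibration error)."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in members])
    return probs.mean(dim=0)

# Hypothetical usage: `members` are classifiers trained on the same task but
# each with a different augmentation (e.g. different crops, noise, or mixup).
```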

2019

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Asa Cooper Stickland, Iain Murray

A project on finding an efficient way to add parameters to a large pre-trained model, BERT, to get good performance on the tasks in the GLUE benchmark: arxiv link. ICML, 2019; featured in the NAACL 2019 transfer learning tutorial.
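A simplified sketch of the flavour of the approach (not a faithful reimplementation; hidden sizes are illustrative): a small attention block operating in a low-dimensional projected space, intended to run in parallel with a mostly frozen pre-trained layer and be trained per task.

```python
import torch
import torch.nn as nn

class ProjectedAttentionLayer(nn.Module):
    """Simplified sketch: self-attention in a low-dimensional projected space,
    added alongside a (mostly frozen) pre-trained transformer layer."""
    def __init__(self, d_model: int = 768, d_small: int = 204, n_heads: int = 12):
        super().__init__()
        self.down = nn.Linear(d_model, d_small)
        self.attn = nn.MultiheadAttention(d_small, n_heads, batch_first=True)
        self.up = nn.Linear(d_small, d_model)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        small = self.down(hidden)
        attended, _ = self.attn(small, small, small)
        return self.up(attended)

# Usage sketch: new_hidden = frozen_bert_layer(h) + pal(h), with task-specific
# modules like this trained for each task in a multi-task setup.
```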

Projects


2017

Intern at Five AI

During my summer internship with Five AI I was tasked with correcting for the motion of the car during a LIDAR sweep. My method was validated by testing in the real world with Five AI's prototype car. I used C++ and ROS for this project.
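The actual work was in C++ with ROS; the toy NumPy sketch below only illustrates the general idea of motion compensation ("de-skewing") under an assumed constant velocity and yaw rate, with all names and conventions my own.

```python
import numpy as np

def deskew_points(points: np.ndarray, timestamps: np.ndarray,
                  velocity: np.ndarray, yaw_rate: float, t_ref: float) -> np.ndarray:
    """Toy motion compensation: express each LIDAR point in the sensor frame at
    time t_ref, assuming the car moved with constant velocity and yaw rate
    during the sweep. `points` is (N, 3); `timestamps` is (N,) in seconds."""
    corrected = np.empty_like(points)
    for i, (p, t) in enumerate(zip(points, timestamps)):
        dt = t_ref - t                       # how far to shift this point in time
        angle = yaw_rate * dt
        c, s = np.cos(angle), np.sin(angle)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # planar rotation
        corrected[i] = rot @ p + velocity * dt
    return corrected
```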

I also worked on a side project attempting to predict depth from pairs of stereo images using convolutional neural networks. I used ROS and OpenCV to preprocess the data, and TensorFlow to predict depth.

Durham Master's Project

The project was to build an algorithm that can predict the ability of a peptide-like molecule to fight bacteria. I used sparse Bayesian linear regression to handle a setting where the data are high-dimensional and there are only a small number of training instances.
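A minimal sketch of this kind of model using scikit-learn's ARD (automatic relevance determination) regression, which learns per-feature precisions that prune irrelevant features; the data below is random placeholder data, not the peptide dataset.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

# Placeholder data: many descriptors per molecule, few training instances.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))              # 40 molecules, 200 features
true_w = np.zeros(200)
true_w[:5] = rng.normal(size=5)             # only a handful of features matter
y = X @ true_w + 0.1 * rng.normal(size=40)

model = ARDRegression()                     # per-feature priors -> sparse weights
model.fit(X, y)
top_features = np.argsort(-np.abs(model.coef_))[:5]
print("Most relevant features by |weight|:", top_features)
```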