
Asa Cooper Stickland

About


As of August 2023, I am joining the Alignment Research Group at NYU, led by Sam Bowman, as a postdoc. I'll be working broadly on aligning language models. More specifically: as language models become more powerful, we can't trust model outputs, since models may be aware they are being evaluated and "play nice" only to defect later. I will be working on evaluations that get around this problem, for example with extra fine-tuning runs and interventions on model internals. If you are interested in collaborating on this or similar topics, please reach out, especially if you are at NYU.

I completed my PhD in the EPSRC Centre for Doctoral Training in Data Science at the University of Edinburgh, supervised by Iain Murray, with Ivan Titov as my second supervisor. My PhD focused on transfer learning and robustness, particularly for multilingual models, and I'm interested in parameter-efficient ways to use and train large language models. I did internships at Facebook AI, NAVER Labs Europe, and Amazon, mainly working on using pre-trained models for machine translation. I've previously worked on approximate inference (e.g. variational inference, MCMC, ABC), and am still a big fan of Bayes' rule.

I did my undergrad at Durham (MPhys in Physics), where my master's project used Bayesian linear regression to find which properties of proteins were most effective for killing bacteria. I also did a research internship at Durham running fluid dynamics simulations, and in summer 2017 I interned at Five AI, a startup building autonomous vehicles.

CV

Publications


2024

Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs

Sara Price, Arjun Panickssery, Sam Bowman, Asa Cooper Stickland

We train sleeper agent models that act maliciously if they see future (post-training-cutoff) news headlines, but act normally otherwise, and explore how this changes the effectiveness of safety training: link. arXiv.

Steering Without Side Effects: Improving Post-Deployment Control of Language Models

Asa Cooper Stickland, Alexander Lyzhov, Jacob Pfau, Salsabila Mahdi, Samuel R. Bowman

Adding steering vectors to language models is a lightweight way to modify behavior post-deployment, but it comes at a cost to capabilities. We develop a technique to reduce these capability side effects: link. arXiv.
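
As a rough illustration of the mechanism (a minimal sketch assuming a PyTorch, Hugging Face-style decoder; the layer index, vector, and scale below are placeholders, not the paper's recipe), steering amounts to adding a fixed vector to one layer's hidden states at inference time:

    import torch

    def make_steering_hook(steering_vector, scale=1.0):
        # Forward hook that adds `scale * steering_vector` to a layer's output hidden states.
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            hidden = hidden + scale * steering_vector.to(hidden.device, hidden.dtype)
            return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
        return hook

    # Hypothetical usage (layer 13 and the vector `v` are illustrative):
    # handle = model.model.layers[13].register_forward_hook(make_steering_hook(v, scale=4.0))
    # ... generate as usual, then handle.remove() to restore the unsteered model.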

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman

A challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry, designed to be used for scalable oversight research: link. COLM, 2024.

2023

The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"

Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, Owain Evans

Born out of our work on measuring situational awareness, this paper shows that models trained on "A is B" fail to generalize to "B is A"; e.g. a model trained on "Olaf Scholz was the ninth Chancellor of Germany" will not automatically be able to answer "Who was the ninth Chancellor of Germany?": link. arXiv.

Taken out of context: On measuring situational awareness in LLMs

Lukas Berglund, Asa Cooper Stickland, Mikita Balesni, Max Kaufmann, Meg Tong, Tomasz Korbak, Daniel Kokotajlo, Owain Evans

We measure a proxy for "situational awareness" in LLMs, i.e. their ability to reason about the fact that they are machine learning models, whether they are being evaluated, and so on: link. arXiv.

Robustification of Multilingual Language Models to Real-world Noise in Crosslingual Zero-shot Settings with Robust Contrastive Pretraining

Asa Cooper Stickland, Sailik Sengupta, Jason Krone, Saab Mansour, He He

We evaluate models on "noisy" data (e.g. text with typos) in multiple languages, and propose a new pretraining objective that improves robustness to noise: link. EACL, 2023.

2022

When does Parameter-Efficient Transfer Learning Work for Machine Translation?

Ahmet Üstün, Asa Cooper Stickland

Comprehensive study of parameter-efficient fine-tuning of pre-trained models for MT, evaluating 1) various parameter budgets, 2) a diverse set of language pairs, and 3) different pre-trained model scales and pre-training objectives: link. EMNLP, 2022.

2021

Regularising Fisher Information Improves Cross-lingual Generalisation

Asa Cooper Stickland, Iain Murray

Short paper examining the link between consistency losses, the Fisher information matrix, and cross-lingual generalisation: link. Multilingual Representation Learning workshop at EMNLP, 2021.
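
For intuition, a consistency loss of this kind can be sketched as penalising disagreement between a model's predictions on two views of the same input (e.g. two dropout passes, or the same sentence in two languages); the symmetric-KL form below is my own illustration, not the paper's exact objective:

    import torch.nn.functional as F

    def consistency_loss(logits_a, logits_b):
        # Symmetric KL divergence between the two predictive distributions.
        log_p = F.log_softmax(logits_a, dim=-1)
        log_q = F.log_softmax(logits_b, dim=-1)
        return 0.5 * (F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
                      + F.kl_div(log_q, log_p, reduction="batchmean", log_target=True))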

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information

Asa Cooper Stickland, Alexandre Bérard, Vassilina Nikoulina

We study parameter-efficient domain adaptation for machine translation, specifically 1) parameter-efficient adaptation to multiple domains and languages simultaneously, and 2) cross-lingual transfer in domains where parallel data is unavailable for certain language pairs: link. WMT, 2021.

Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation

Asa Cooper Stickland, Xian Li, Marjan Ghazvininejad

The result of my Facebook AI internship: we examine which parameters to leave frozen when fine-tuning large pre-trained sequence-to-sequence models on machine translation, for both monolingual and multilingual pre-trained models: link. EACL, 2021.
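
A minimal sketch of the general recipe (the substring match below is illustrative and assumes Hugging Face/fairseq-style parameter names; it is not the exact configuration from the paper):

    def freeze_most_parameters(model, trainable_keywords=("encoder_attn", "layer_norm")):
        # Freeze everything except parameters whose names contain one of the keywords,
        # e.g. keep cross-attention and layer norms trainable while the rest stays frozen.
        for name, param in model.named_parameters():
            param.requires_grad = any(key in name for key in trainable_keywords)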

2020

Deep Transformers with Latent Depth

Xian Li, Asa Cooper Stickland, Yuqing Tang, Xiang Kong

We model the choice of which transformer layers to use as a latent variable, allowing us to train deeper models and e.g. learn which layers to share between languages for multilingual machine translation: arxiv link. NeurIPS, 2020.
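
A toy version of the idea (a simplified sketch, not the paper's implementation, which treats layer selection probabilistically) is to attach a learnable gate to each layer, so that training can effectively decide which layers to use:

    import torch
    import torch.nn as nn

    class GatedStack(nn.Module):
        def __init__(self, layers):
            super().__init__()
            self.layers = nn.ModuleList(layers)
            # One learnable logit per layer; sigmoid(logit) acts like a "use this layer" probability.
            self.gate_logits = nn.Parameter(torch.zeros(len(layers)))

        def forward(self, x):
            for layer, logit in zip(self.layers, self.gate_logits):
                x = x + torch.sigmoid(logit) * layer(x)  # a gate near 0 effectively skips the layer
            return x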

Diverse Ensembles Improve Calibration

Asa Cooper Stickland, Iain Murray

Short paper on whether calibration and accuracy improve when using ensembles of models with different data augmentation for each ensemble member: arxiv link. ICML Workshop on Uncertainty and Robustness in Deep Learning, 2020.
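
The core recipe is simple (a sketch under my own assumptions: each member is trained separately with its own data augmentation, and predictions are then averaged in probability space):

    import torch

    def ensemble_predict(models, x):
        # Average softmax outputs over ensemble members; averaging probabilities
        # (rather than logits) is the standard choice when calibration matters.
        probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models])
        return probs.mean(dim=0)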

2019

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning

Asa Cooper Stickland, Iain Murray

A project on finding an efficient way to add task-specific parameters to a large pre-trained model, BERT, to get good performance on the GLUE benchmark tasks: arxiv link. ICML, 2019; also featured in the NAACL 2019 transfer learning tutorial.
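
The flavour of the approach (heavily simplified here: the actual PALs use small projected attention layers, whereas the sketch below swaps in a plain bottleneck MLP for brevity) is a small task-specific module running in parallel with each shared BERT layer, so only a few extra parameters per task are needed:

    import torch.nn as nn

    class ParallelAdapter(nn.Module):
        def __init__(self, hidden_size=768, bottleneck=128):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)  # project down to a small dimension
            self.up = nn.Linear(bottleneck, hidden_size)    # project back up
            self.act = nn.GELU()

        def forward(self, hidden_states):
            # The result is added to the output of the corresponding BERT layer (shared across tasks),
            # so each task only contributes these low-dimensional parameters.
            return self.up(self.act(self.down(hidden_states)))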

Misc.


I have recently been a mentor for MATS and helped organize a workshop on evaluations at Constellation.