"Ryght reuerent and wurschypfull and my ryght welebeloued Voluntyne, I recommande me vnto yowe full hertely, desyring to here of yowr welefare, whech I beseche Almyghty God long for to preserve vnto hys plesure and yowr hertys desyre."

Margery Paston to John Paston III, England, 1477

"Lol. Happy Valentine's Day, sweetie. 💖💋"

@cherokee_autumn to @smartassicus, Twitter, 2017


I'm a PhD candidate at the University of Edinburgh
and a member of the CDT in Data Science
where I'm supervised by Sharon Goldwater and Walid Magdy.

Working with terabytes of Twitter data and historical corpora
that can fit on a floppy disk, I investigate how language is used,
how it varies between people and how this all changes across time.


I wrote about how to plot and visualise data for Cambridge Spark.

The paper I wrote over the summer at Bell Labs, on integrative complexity, was accepted to ICWSM.

My submission to IC2S2, on emoji and identity, was accepted.

I gave some talks about my work at the Centre for English Corpus Linguistics at UCLouvain.

I was interviewed by The Verge's podcast "Why'd You Push That Button?"
and talked about my research into emoji skin tone modifiers.

I wrote some blog posts for Cambridge Spark,
showing how to extract data from websites that don't have an API for it,
and how to deploy a machine learning model to the web.

I spent the summer at Bell Labs in Cambridge,
working on machine learning for determining integrative complexity from texts.
An online demo is available here.

My paper Self-Representation on Twitter Using Emoji Skin Color Modifiers was accepted to ICWSM.
It was covered, amongst others, by the BBC, the Telegraph and National Geographic.

My paper Evaluating historical text normalization systems: How well do they generalize? was accepted to NAACL.


I completed my MSc at the University of Edinburgh.

I took part in a data study group at the Alan Turing Institute,
helping an NHS Trust to predict emergency admissions.

I worked in the developmental psychology labs at
both Edinburgh and Harvard, focusing on eye-tracking studies.

I spent a summer at TAB, working on categorisation and recommendation systems.

I completed my BA in linguistics at the University of Cambridge, where my
dissertation supervisor was Paula Buttery.


Anything to do with historical spelling variation, the psychology of language processing, and language use and variation.


Dissertations etc.

Bachelor's dissertation

Master's thesis


Self-representation on Twitter using emoji skin color modifiers

Evaluating historical text normalization systems: How well do they generalize?

How do children learn to avoid referential ambiguity? Insights from eye-tracking

The importance of awareness for understanding language

Code, posters, misc.

Emoji extractor

Poster: Evaluating historical text normalization systems: How well do they generalize?

iPad experiments

Poster: Referential ambiguity

Poster: Deep learning for historical text normalisation


A mostly up-to-date copy of my CV can be found here.