"Ryght reuerent and wurschypfull and my ryght welebeloued Voluntyne, I recommande me vnto yowe full hertely, desyring to here of yowr welefare, whech I beseche Almyghty God long for to preserve vnto hys plesure and yowr hertys desyre."

Margery Paston to John Paston III, England, 1477

"Lol. Happy Valentine's Day, sweetie. 💖💋"

@cherokee_autumn to @smartassicus, Twitter, 2017


I'm a PhD candidate at the University of Edinburgh and a member of the CDT in Data Science, where I'm supervised by Sharon Goldwater and Walid Magdy.

Working with terabytes of Twitter data and historical corpora that can fit on a floppy disk, I investigate how language is used, how it varies between people and how this all changes across time.


"Black or White but never neutral: How readers perceive identity from yellow or skin-toned emoji" was accepted for CSCW.

"Identity Signals in Emoji Do not Influence Perception of Factual Truth on Twitter" was accepted for the Emoji2021 workshop.

"Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018" was accepted for the Emoji2021 workshop. Check out this interactive dashboard to explore data on semantic change in emoji.

I built a graph-based puzzle solver for a minigame in Cyberpunk 2077. See the web-based app and the source code.

I built a dashboard for exploring some emoji semantic change data I have.

A paper I worked on as part of a collaboration with the vet school, "Scaling Systematic Literature Reviews with Machine Learning Pipelines", was presented at the Scholarly Document Processing workshop at EMNLP.

My journal article, "Emoji skin tone modifiers: Analyzing variation in usage on social media", was published in ACM Transactions on Social Computing. A non-paywall PDF version is available for download here.

I wrote about how to plot and visualise data for Cambridge Spark.

The paper I wrote over the summer at Bell Labs, "The Language of Dialogue is Complex", was accepted to ICWSM. You can try analysing texts for integrative complexity on this little webapp I made.

My submission to IC2S2, on emoji and identity, was accepted. You can view the slides here.

I gave some talks about my work at the Centre For English Corpus Linguistics at UCLouvain.

I was interviewed by The Verge's podcast "Why'd You Push That Button?" and talked about my research into emoji skin tone modifiers.

I wrote some blog posts for Cambridge Spark, showing how to extract data from websites that don't have an API for it, and how to deploy a machine learning model to the web.

I spent the summer at Bell Labs in Cambridge, working on machine learning for determining Integrative Complexity from texts. An online demo is available here.

My paper "Self-Representation on Twitter Using Emoji Skin Color Modifiers" was accepted to ICWSM. It was covered, amongst others, by the BBC, the Telegraph and National Geographic.

My paper "Evaluating historical text normalization systems: How well do they generalize?" was accepted to NAACL.


I completed my MSc at the University of Edinburgh.

I took part in a data study group at the Alan Turing Institute, helping an NHS Trust to predict emergency admissions.

I worked in the developmental psychology labs at both Edinburgh and Harvard, focusing on eye-tracking studies.

I spent a summer at TAB, working on categorisation and recommendation systems.

I completed my BA in linguistics at the University of Cambridge, where my dissertation supervisor was Paula Buttery.


Anything to do with historical spelling variation, the psychology of language processing, language use and variation.


Dissertations etc.

Thumbnail of BA dissertation

Bachelor's dissertation

Thumbnail of MSc dissertation

Master's thesis


Thumbnail of ACM TSC 2020 paper on emoji skin tone modifier variation on social media

Emoji Skin Tone Modifiers: Analyzing Variation in Usage on Social Media

Thumbnail of ICWSM 2019 paper on integrative complexity

The Language of Dialogue Is Complex

Thumbnail of ICWSM 2018 paper on emoji skin tone modifier usage

Self-representation on Twitter using emoji skin color modifiers

Thumbnail of NAACL 2018 paper on evaluating historical text normalization systems

Evaluating historical text normalization systems: How well do they generalize?

Thumbnail of paper on how children learn to avoid referential ambiguity

How do children learn to avoid referential ambiguity? Insights from eye-tracking

Thumbnail of paper on the importance of awareness for language understanding

The importance of awareness for understanding language

Code, posters, misc.

Thumbnail for github page for emoji extractor tool

Emoji extractor

Thumbnail for NAACL 2018 poster

Poster: Evaluating historical text normalization systems: How well do they generalize?

Thumbnail for github page for iPad experiments

iPad experiments

Thumbnail for poster on referential ambiguity

Poster: Referential ambiguity

Thumbnail for poster on deep learning for historical text normalization

Poster: Deep learning for historical text normalisation


A mostly up-to-date copy of my CV can be found here.