Working with terabytes of Twitter data and historical corpora that can fit on a floppy disk, I investigate how language is used, how it varies between people and how this all changes across time.
"Identity Signals in Emoji Do not Influence Perception of Factual Truth on Twitter" was accepted for the Emoji2021 workshop.
"Semantic Journeys: Quantifying Change in Emoji Meaning from 2012-2018" was accepted for the Emoji2021 workshop. Check out this interactive dashboard to explore data on semantic change in emoji.
I built a dashboard for exploring my data on emoji semantic change.
A paper I worked on as part of a collaboration with the vet school, "Scaling Systematic Literature Reviews with Machine Learning Pipelines", was presented at the Scholarly Document Processing workshop at EMNLP.
My journal article, "Emoji skin tone modifiers: Analyzing variation in usage on social media", was published in ACM Transactions on Social Computing. A non-paywall PDF version is available for download here.
I wrote about how to plot and visualise data for Cambridge Spark.
I gave some talks about my work at the Centre For English Corpus Linguistics at UCLouvain.
I was interviewed by The Verge's podcast "Why'd You Push That Button?" and talked about my research into emoji skin tone modifiers.
I wrote some blog posts for Cambridge Spark, showing how to extract data from websites that don't have an API for it, and how to deploy a machine learning model to the web.
My paper "Self-Representation on Twitter Using Emoji Skin Color Modifiers" was accepted to ICWSM. It was covered by the BBC, the Telegraph and National Geographic, amongst others.
My paper "Evaluating historical text normalization systems: How well do they generalize?" was accepted to NAACL.
I spent a summer at TAB, working on categorisation and recommendation systems.
Anything to do with historical spelling variation, the psychology of language processing, language use and variation.
A mostly up-to-date copy of my CV can be found here.