Dr. Beatrice Alex

Research Fellow in Text Mining

Contact Details:
University of Edinburgh
School of Informatics
10 Crichton Street, Room 4.38
Edinburgh, EH8 9AB, UK
Tel: +44 (131) 650 2684

My Research

I'm a Research Fellow at the Institute for Language, Cognition and Computation (ILCC) in the School of Informatics at the University of Edinburgh. My research interests are in text mining for documents from different domains as well as multi- and mixed-lingual text processing and its applications. My ambition is to make archives more accessible to users.

I'm currently involved in a collaboration with William Whiteley on text mining brain image reports for disease information (Targeted treatment for acute stroke: development of prognostic models & decision support tools, 02/2014-08/2015, MRC) and in S-Case (FP7).

Most recently I worked on the Palimpsest project, one of the Digital Transformations in the Arts and Humanities, Big Data projects funded by AHRC. Palimpsest's main goal was to mine and geo-reference Edinburgh's literature. It was a collaboration with English Literature scholars and visualisation experts. Our aim was to adapt the Edinburgh Geoparser to do fine-grained geo-referencing (on the street and building level) using an Edinburgh gazetteer which we are in the process of aggregating. We also analysed the context of geo-referenced locations in text in order to visualise Edinburgh's literature in different ways. The web interface to it is called LitLong.

I also worked on the BotaniTours project on aggregating and mining botanical information (wild plants and gardens) for tourists to the Scottish Borders.

In 2012 and 2013 I spent most of my time working on historical text mining for Trading Consequences, a Digging Into Data project. It was a highly interdisciplinary project working with environmental historians at York University in Canada, visualisation experts at the University of St. Andrews and database specialists at Edina. Our goal was to explore big historical text collections related to trade in the British Empire during the nineteenth century. By means of geo-grounding of place names and commodity entity and relation extraction, we tried to determine the economic trends and environmental consequences of commodity trading in the world. The outcomes of this research are accessible in our White Paper.

Previously, I worked as a researcher on SYNC3, a large European project (FP7) which developed a system that analyses news events and related blogs. I was primarily responsible for labelling of news events and extracting their relatedness in space and time. I also studied causal relations for story analysis within news as part of this project.

In the past, I also worked on the TXM and TXV projects. This work involved text mining, and specifically named entity recognition, as part of an information and relationship extraction pipeline applied to the domains of biomedicine and recruitment. I was also heavily involved in the user evaluation of an NLP-assisted curation tool and in corpus annotation and preparation.

I hold a PhD in Computational Linguistics from the University of Edinburgh (ESRC and Edinburgh-Stanford-Link-funded). My PhD thesis is on the automatic detection of foreign inclusions in text. My most recent work in this area involved interfacing an English inclusion classifier with a statistical parser in order to improve parsing performance (EMNLP 2007).

During my PhD, I was also involved in the following projects:

  • SEER - Machine learning of entity recognisers for modular retargetable natural language processing
  • SUM - The use of rhetorical and discourse structure information for the generation of flexible, high-compression summaries in the legal domain
  • CROSSMARC - CROSS-lingual Multi-Agent Retail Comparison