Dr. Beatrice Alex

Research Fellow in Text Mining

Contact Details:
University of Edinburgh
School of Informatics
10 Crichton Street, Room 4.38
Edinburgh, EH8 9AB, UK
balex@staffmail.ed.ac.uk
Tel: +44 (131) 650 2684

My Research

I'm a Research Fellow at the Institute for Language, Cognition and Computation (ILCC) in the School of Informatics at the University of Edinburgh and a Faculty Fellow of the Alan Turing Institute (ATI). My research interests are in text mining for different domains, speech analytics, and multi- and mixed-lingual text processing and its applications. I am currently pursuing research on speech analytics in collaboration with the British Library as part of my ATI fellowship, and I plan to build on this line of work with the aim of making speech archives more accessible.

I was a lead researcher on the Palimpsest project, one of the Big Data projects funded by AHRC under its Digital Transformations in the Arts and Humanities theme. Palimpsest's main goal was to mine and geo-reference Edinburgh's literature. It was a collaboration with English Literature scholars and visualisation experts. Our aim was to adapt the Edinburgh Geoparser to do fine-grained geo-referencing (at the street and building level) using an Edinburgh gazetteer which we are in the process of aggregating. We also analysed the context of geo-referenced locations in text in order to visualise Edinburgh's literature in different ways. The resulting web interface, LitLong, is available at www.litlong.org

I was also involved in a collaboration with William Whiteley on text mining brain image reports for disease information (Targeted treatment for acute stroke: development of prognostic models & decision support tools, MRC).

In the past, I worked on the following projects:

I hold a PhD in Computational Linguistics from the University of Edinburgh (funded by ESRC and the Edinburgh-Stanford Link). My PhD thesis is on the automatic detection of foreign inclusions in text. It involved interfacing an English inclusion classifier with a statistical parser in order to improve parsing performance (EMNLP 2007).

During my PhD, I was also involved in the following projects:

  • SEER - Machine learning of entity recognisers for modular retargetable natural language processing
  • SUM - The use of rhetorical and discourse structure information for the generation of flexible, high-compression summaries in the legal domain
  • CROSSMARC - CROSS-lingual Multi-Agent Retail Comparison