I'm a professor in the School of Informatics at the University of Edinburgh. I'm affiliated with the Institute for Communicating and Collaborative Systems and the Edinburgh Natural Language Processing Group.

My research focuses on probabilistic learning techniques for natural language understanding and generation. I am interested in the general problem of extracting semantic information from large volumes of text. This is a challenging task for at least three reasons. Firstly, a lot of intended meaning is implicit and must be inferred, as corpora usually do not contain much information beyond the words of a language. Secondly, ambiguity is prevalent in natural language, and although humans deal with it effectively, computational models must have specialized strategies for representing meaning distinctions and narrowing down the range of possible meanings invoked by ambiguous words. Thirdly, there are many alternative ways to convey the same information, so a method of identifying whether two words or phrases have similar meanings is essential for any model of lexical meaning.

My work primarily explores unsupervised methods that learn directly from text samples rather than from human-annotated data. Within the general area of empirical semantic processing, I have worked on the development of graph-theoretic algorithms for the assignment of semantic roles and word sense disambiguation, the creation of knowledge-lean algorithms for sentence ordering, discourse segmentation and chunking, and the automatic identification of the temporal order of events in texts.

The main theme underlying my generation work concerns the acquisition of paraphrases and their application to text rewriting. The latter is an important component in many applications that extract and synthesize information. Examples include summarization, sentence compression, text simplification and question answering. The idea is to develop a rewriting framework that is not application-specific but can be tailored to different tasks and user groups. The framework uses integer linear programming (ILP), a technique for solving discrete optimization problems that offers great modeling flexibility and allows syntactic, semantic, and discourse-based information to be incorporated.