Dr Catherine Lai
Lecturer in Speech and Language Technology
Linguistics and English Language
School of Philosophy, Psychology and Language Sciences, University of Edinburgh
I am a lecturer (~assistant professor) based in the Centre for Speech Technology Research at the University of Edinburgh. I work in Linguistics and English Language in the School of Philosophy, Psychology and Language Sciences, and am also affiliated with the Institute for Language, Cognition and Computation in the School of Informatics, where I was previously a post-doc and senior researcher.
My research focuses on how we can use the non-lexical aspects of speech (e.g. speech prosody) to get at what speakers actually mean. I'm currently working mainly on topical and affective information extraction from conversational speech. I'm interested in this for both theoretical and practical reasons. On the one hand, I'm interested in developing speech technologies that can better take account of contextual variation. I recently worked on this topic with researchers at Toyota (in Europe and Japan) on the project 'Spoken Dialogue Processing for Robot companions'. The hope is that this sort of research will help make assistive technologies more robust, as well as provide new ways for linguists and other social scientists to explore speech and video data. On the other hand, I'm interested more generally in understanding where the non-lexical aspects of speech fit into linguistic theories. This means that I'm usually, in some way or another, working on developing models of prosody in dialogue. I'm particularly interested in how we can use ideas from topic modeling to get a more robust picture of the relationship between prosody, discourse structure, and information structure. Hopefully this will build some bridges between more theoretical and empirical approaches to understanding this complicated aspect of spoken communication.
I'm currently supervising Sarenne Wallbridge (computational approaches to multimodal discourse analysis), and I am second supervisor to Pilar Oplustil (conversational text-to-speech synthesis), Emelie van der Vreken (affective speech synthesis), Nina Markl (computational sociolinguistics), and Jie Chi (automatic speech synthesis and code switching). I recently co-supervised Leimin Tian, who worked on emotion recognition in dialogue.
I originally came to Edinburgh to work on the EU FP project InEvent, which looked at how automatic processing of audiovisual data could be used to aid browsing of large video archives. I mainly worked on how features often associated with speaker affect, e.g. prosody and measures of participation, can be used for summarization and affect detection. I also ended up doing a bit of HCI evaluation. I was also a co-organizer of an IAD-funded interdisciplinary network on Speech, Image and Social Media Data for the Social Sciences.
Before that, I was a graduate student in the Department of Linguistics at the University of Pennsylvania. At Penn, I worked in the phonetics lab, where I took advice from Jiahong Yuan, Mark Liberman, Florian Schwarz, and many other people. My dissertation was about where intonational features, particularly final pitch rises, fit in with semantic and pragmatic theories. I also worked on various other topics, including iterated learning in language change, gradability and modality in semantics, tone and stress in Chinese, and the prosody of second language learners.
I'm currently part of the UKSpeech organizing committee. In the past, I have served on the Student Advisory Committee of the International Speech Communication Association (ISCA-SAC). I helped organize a workshop on New Tools and Methods for Very-Large-Scale Phonetics. I have also served on the organizing committee of the 2010 Young Researchers' Roundtable on Spoken Dialogue Systems (YRRSDS).
Quite a long time ago, I did a research master's at the University of Melbourne, Australia. I was part of the Language Technology Group, where my supervisor was Steven Bird. Back then, I researched querying and manipulating linguistically annotated structured data.
Research Interests: Prosody, discourse and dialogue structure; affective computing (emotion recognition); multimodal language processing; speech processing for social science research and assistive technologies.
Office hours by appointment
email: <c . lai at ed ac uk>