New
- G. Murray, T. Kleinbauer, P. Poller, T. Becker, S. Renals and J. Kilgour (2009). Extrinsic Summarization Evaluation: A Decision Audit Task, ACM Transactions on Speech and Language Processing, 6(2):1-29. [pdf]
- Three new PhD students: Karl Isaac (working on MultiMemoHome), Liang Lu (working on SCALE) and Erich Zwyssig (working with EADS Innovation Works)
- Two Interspeech-09 papers: Songfang Huang and Steve Renals: "A Parallel Training Algorithm for Hierarchical Pitman-Yor Process Language Models" [pdf]; Ravichander Vipperla, Maria Wolters and Steve Renals: "Age Recognition for Spoken Dialogue Systems: Do We Need It?" [pdf]
- EPSRC funded project MultiMemoHome with Glasgow and Queen Margaret University (2009-2013).
- H. Cuayahuitl, S. Renals, O. Lemon and H. Shimodaira (2009). Evaluation of a hierarchical reinforcement learning spoken dialogue system, Computer Speech and Language, 24:395-429. [pdf]
- J. Yamagishi, T. Nose, H. Zen, Z. Ling, T. Toda, K. Tokuda, S. King and S. Renals (2009). Robust Speaker-Adaptive HMM-based Text-to-Speech Synthesis, IEEE Trans. Audio, Speech and Language Processing, 17:1208-1230. [pdf]
- Juicer: a BSD-licensed WFST decoder from AMI(DA) (not that I wrote a line of the code!)
Research
I'm interested in understanding human communication using machine learning and statistical models, and constructing systems that can recognize and interpret communication scenes. My research career is grounded in speech processing, and our approaches start from the signals.
Speech Recognition and Synthesis
How can we improve large vocabulary speech recognition? We are looking at better acoustic models: models that are discriminatively trained, that are better adapted or normalised to new domains or speakers, or that use improved spectral representations. Years of research have shown that it is difficult to improve upon appropriately smoothed n-grams for language modelling in speech recognition, but we believe that non-parametric Bayesian models have new things to offer. More recently, I have become interested in models, such as trajectory HMMs, that may be used for both recognition and synthesis. Current research students in speech recognition and synthesis include Songfang Huang (Bayesian language modelling), Joao Cabral (source models for HMM speech synthesis), Ravichander Vipperla (speech recognition for voices that display ageing), Erich Zwyssig and Liang Lu.
Multimodal Interaction
Human communication spans more than one modality. The analysis and interpretation of multimodal interaction present a number of challenges, ranging from modelling multiple asynchronous streams of data to constructing systems that can interpret aspects of multiparty meetings. Much of this work is about augmenting communication in meetings (in the AMI and AMIDA Integrated Projects); we are also interested in developing systems for home care. Current students in multimodal interaction include Karl Isaac.
Projects
Current research projects include the AMI and AMIDA Integrated Projects, MATCH, EMIME, EdSST, SSPNet, SCALE, the Edinburgh Speech Production Facility and MMH.
Opportunities
We are looking for excellent research students: see the page about PhD opportunities at CSTR. I am not looking for undergraduate interns for the foreseeable future.
Teaching
This year I am teaching Informatics 2B (jointly with Kyriakos Kalorkoti) and Automatic Speech Recognition (jointly with Hiroshi Shimodaira).