Research

My research is about the development of interactive systems that can understand human communication. Much of this work is grounded in speech recognition and is based on building and applying statistical models to interpret communication signals.

Over the past decade this work has grown beyond speech transcription to include approaches for interpreting and accessing information from speech, as well as multimodal interaction. Since 2002, much of our work has focussed on recognizing and interpreting multiparty meetings, as part of the M4, AMI and AMIDA projects.

Speech Recognition and Synthesis

I am interested in developing better models for speech recognition, trainable from large amounts of data. I have worked on most aspects of speech recognition, including discriminative acoustic modelling, ongoing attempts to develop language models that work significantly better than smoothed n-grams, efficient search, and HMM-based synthesis and other models that are useful for both recognition and synthesis.

Acoustic modelling

State-of-the-art generative models of speech acoustics, based on HMMs, scale to huge amounts of training data and are surprisingly accurate in practice. However, standard HMM approaches have many drawbacks, which may be addressed by discriminative training, richer spectral representations and better dynamic models of speech. In the past I did a lot of work on connectionist/HMM hybrids, and there is still plenty of interest in such models, especially given the recent attention paid to related models such as conditional random fields. Recent work has included alternative discriminative approaches, pitch-synchronous acoustic representations and trajectory HMMs.
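To make concrete what a generative HMM computes, here is a minimal sketch of the forward algorithm, which sums over all hidden state sequences to give the likelihood of an observation sequence. It assumes a toy HMM with discrete output symbols; real acoustic models use continuous (e.g. Gaussian mixture) output densities over spectral feature vectors, and the parameter values in the usage lines are hypothetical.

```python
from typing import List, Sequence


def forward_likelihood(
    init: Sequence[float],             # init[i] = P(q_1 = i)
    trans: Sequence[Sequence[float]],  # trans[i][j] = P(q_{t+1} = j | q_t = i)
    emit: Sequence[Sequence[float]],   # emit[i][k] = P(o_t = k | q_t = i), discrete symbols
    obs: Sequence[int],
) -> float:
    """Return P(obs | HMM) by summing over all state sequences (forward algorithm)."""
    n = len(init)
    # alpha[i] = P(o_1..o_t, q_t = i) for the current frame t
    alpha: List[float] = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # propagate through the transitions, then weight by the output probability
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state left-to-right HMM over a binary symbol alphabet
init = [1.0, 0.0]
trans = [[0.6, 0.4], [0.0, 1.0]]
emit = [[0.9, 0.1], [0.2, 0.8]]
print(forward_likelihood(init, trans, emit, [0, 1]))  # ≈ 0.342
```

On long utterances these products underflow, so practical implementations run the same recursion in the log domain or with per-frame scaling.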

Language modelling

Fred Jelinek's keynote at Eurospeech '91 was entitled "Up from trigrams! The struggle for improved language models". More than 15 years later, most state-of-the-art large vocabulary speech recognition systems still use smoothed trigram or 4-gram language models. The struggle continues, however, and we are interested in hierarchical Bayesian approaches, which can provide a framework for including additional variables in language modelling.
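For reference, a smoothed trigram estimates P(w_t | w_{t-2}, w_{t-1}) by combining trigram, bigram and unigram statistics. The sketch below uses recursive linear interpolation with fixed, hand-picked weights over a closed vocabulary; the weights and toy corpus are assumptions for illustration, and production systems typically use Kneser-Ney or Katz back-off smoothing with parameters tuned on held-out data.

```python
from collections import Counter
from typing import Sequence


class InterpolatedTrigramLM:
    """Recursively interpolated trigram LM over a closed vocabulary (sketch)."""

    def __init__(self, sentences: Sequence[Sequence[str]],
                 lam_tri: float = 0.6, lam_bi: float = 0.7) -> None:
        # Hypothetical fixed interpolation weights; normally tuned on held-out data
        self.lam_tri, self.lam_bi = lam_tri, lam_bi
        self.uni: Counter = Counter()   # c(w)
        self.bi: Counter = Counter()    # c(v, w)
        self.tri: Counter = Counter()   # c(u, v, w)
        self.h1: Counter = Counter()    # c(v) as a bigram history
        self.h2: Counter = Counter()    # c(u, v) as a trigram history
        for s in sentences:
            words = ["<s>", "<s>"] + list(s) + ["</s>"]
            for i in range(2, len(words)):
                u, v, w = words[i - 2], words[i - 1], words[i]
                self.uni[w] += 1
                self.bi[(v, w)] += 1
                self.tri[(u, v, w)] += 1
                self.h1[v] += 1
                self.h2[(u, v)] += 1
        self.total = sum(self.uni.values())
        self.vocab = sorted(self.uni)   # every word the model can predict

    def p_uni(self, w: str) -> float:
        return self.uni[w] / self.total

    def p_bi(self, v: str, w: str) -> float:
        if self.h1[v] == 0:             # unseen history: use the unigram alone
            return self.p_uni(w)
        ml = self.bi[(v, w)] / self.h1[v]
        return self.lam_bi * ml + (1 - self.lam_bi) * self.p_uni(w)

    def prob(self, u: str, v: str, w: str) -> float:
        """P(w | u, v); sums to 1 over self.vocab for any history."""
        if self.h2[(u, v)] == 0:        # unseen history: use the bigram estimate alone
            return self.p_bi(v, w)
        ml = self.tri[(u, v, w)] / self.h2[(u, v)]
        return self.lam_tri * ml + (1 - self.lam_tri) * self.p_bi(v, w)


lm = InterpolatedTrigramLM([["a", "b"], ["a", "c"]])
print(lm.prob("<s>", "a", "b"))  # ≈ 0.46
```

Interpolating recursively (trigram with the smoothed bigram, bigram with the unigram) keeps each conditional distribution normalized, even for histories never seen in training.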

LVCSR Search and Systems

Building speech recognition systems is fun, and during the 1990s we worked very hard on ABBOT, a connectionist/HMM hybrid system. I enjoyed writing decoders then (and still would, if I had the time): the NOWAY decoder was designed to decode WSJ sentences with a 20,000-word vocabulary in real time (on a 120 MHz Pentium with 64-96 MB of RAM!).
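At the heart of HMM decoding is a search for the single most probable hidden state sequence. The sketch below shows plain Viterbi decoding over one small discrete-output HMM in the log domain, with hypothetical toy parameters. It is not a description of NOWAY: a large vocabulary decoder adds lexicon and language model constraints and aggressive pruning on top of this basic recursion.

```python
import math
from typing import List, Sequence, Tuple


def _log(p: float) -> float:
    # log(0) -> -inf, so impossible transitions or outputs are never chosen
    return math.log(p) if p > 0 else float("-inf")


def viterbi(init: Sequence[float], trans: Sequence[Sequence[float]],
            emit: Sequence[Sequence[float]], obs: Sequence[int]) -> Tuple[List[int], float]:
    """Return the most probable state sequence and its log probability."""
    n = len(init)
    # delta[i] = best log probability of any path ending in state i at this frame
    delta = [_log(init[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        new_delta, ptrs = [], []
        for j in range(n):
            # best predecessor of state j (max replaces the forward algorithm's sum)
            best_i = max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
            ptrs.append(best_i)
            new_delta.append(delta[best_i] + _log(trans[best_i][j]) + _log(emit[j][o]))
        delta = new_delta
        backptrs.append(ptrs)
    # trace back from the best final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, delta[last]


path, logp = viterbi([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]],
                     [[0.9, 0.1], [0.2, 0.8]], [0, 1, 1])
print(path)  # [0, 1, 1]
```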

Speech synthesis

The biggest innovation in speech technology over the past decade has been the development of the trajectory HMM and of HTS, the HMM-based speech synthesis system, by Tokuda and co-workers at NITech.

Acoustic-articulatory models

What can we infer about the state of the articulatory system from the acoustic signal? This is an intriguing machine learning problem, and solutions are likely to benefit both recognition and synthesis.

Multimodal Interaction

Interaction and communication are multimodal. We have developed an instrumented meeting room to capture human communication in meetings across multiple modalities, and are working on automatic approaches to recognize, interpret and structure meetings. The annual MLMI (Machine Learning for Multimodal Interaction) workshops are an effort to advance progress in this area.

Meetings

In the AMI and AMIDA projects we are interested in recognizing, interpreting, summarizing and structuring multiparty meetings. Alongside meeting speech recognition, we are pursuing topics such as summarization, dialog act recognition and meeting phase segmentation.

Information Access from Speech

In addition to our work on meetings and multimodal interaction, while at Sheffield we built systems for spoken document retrieval, named entity identification, summarization and automatic segmentation of speech such as broadcast news and voicemail. In the late 1990s we put a good deal of effort into developing systems for the NIST evaluations in these areas.
