Peter Bell – research

See also my main page. Inevitably, this page sometimes gets out of date!

Research interests

My primary research focus is in automatic speech recognition (ASR). An underpinning theme of my research is training acoustic models in low-data conditions, often in a weakly supervised or unsupervised manner.

Topics I am particularly interested in:

methods for cross-domain and cross-lingual adaptation
regularisation methods, particularly multi-task learning
algorithms for efficient alignment, search and decoding on audio data
lightly supervised, semi-supervised and unsupervised training methods for ASR
end-to-end and raw-waveform methods
ASR systems for minority or under-resourced languages
speech representation learning
audio-visual ASR and speech enhancement

Current PhD student projects

End-to-end ASR

Shucong Zhang is funded by Toshiba to work on attention-based E2E systems for ASR in multi-speaker environments; Andrea Carmantini is funded by Samsung to investigate speaker adaptation in such systems; and Zeyu Zhao is researching internal LM adaptation. Jie Chi is a member of the CDT in NLP, working on multi-lingual and code-switching E2E ASR.

Adaptation/transfer learning

Joachim Fainberg, funded by Bloomberg, looked at methods for domain and speaker adaptation and lightly supervised training. Joanna Rownicka, funded by Ericsson and DataLab, studied interpretable feature representations derived from deep architectures such as VDCNNs. Ondrej Klejch developed fully automatic methods for adaptation based on meta-learning; he is now a post-doctoral researcher investigating zero-resource ASR for historically-marginalised language communities.

Spoken language understanding

Sarenne Wallbridge works on self-supervised representation learning to better understand the non-lexical aspects of spoken language. I am second supervisor for Yuanchao Li, who is studying emotion recognition.

Speaker diarization

Chau Luu works on diarization for diverse data, supported partially by the BBC.

Research projects

CoG-MHEAR

CoG-MHEAR is an EPSRC programme grant under the emerging healthcare technologies theme, aiming to develop the next generation of multi-modal hearing aids. Our partners include Edinburgh Napier University, Glasgow, Nottingham and Manchester.

Unmute

I lead this EPSRC-funded collaboration with HCI researchers at Swansea, which aims to develop novel approaches to completely unsupervised ASR to address the limitations of today's speech and voice-based interactions and open up intelligent interfaces to the currently digitally 'unheard' in language communities in South Africa and India. Working on this project at Edinburgh are Ondrej Klejch and Electra Wallington.

SpeechWave

SpeechWave aims to study methods to replace conventional signal-processing modules of ASR with convolutional and recurrent neural network architectures that operate directly on raw waveforms. This is work with Steve Renals, Erfan Loweimi and Yumnah Mohamied at Edinburgh, and Zoran Cvetkovic and colleagues at KCL.

MATERIAL

We provide low-resource language ASR for the IARPA MATERIAL programme as part of the SCRIPTS team with partners at the universities of Maryland, Columbia, Yale and Cambridge. In this programme, it is necessary to build ASR systems for multi-genre data in low-resourced languages – initially Swahili, Tagalog and Somali – with a small mismatched corpus of conversational speech as the only transcribed training data provided. We work alongside Kenneth Heafield's machine translation team here at Edinburgh.

SUMMA

Inspired by our GlobalVox prototype developed at a BBC News Labs newsHACK event, the EU-funded SUMMA project integrates stream-based media processing tools (including speech recognition and machine translation) with deep language understanding capabilities (including named entity relation extraction and semantic parsing) for automatic monitoring of multilingual news media sources.

Natural Speech Technology

I worked on speech recognition within the EPSRC-funded Natural Speech Technology project, a five-year programme grant held jointly with the University of Cambridge and University of Sheffield. I was particularly involved in the themes of structuring diverse data and generating systems with wide domain coverage, for example, through the use of adaptive neural network features. One of our primary use-cases was recognition of broadcast data from the BBC (see my publications for more details). I am an organiser of the MGB challenge, which first featured as an official challenge at ASRU 2015.

Gaelic speech recognition

I ran a project to create the first speech recogniser for Scottish Gaelic, funded by iDEA lab. The scarcity of resources for this language makes the task challenging and a good testing ground for recent advances in cross-lingual speech recognition. Later, I worked on this data in collaboration with Ramya Rasipuram and Mathew Magimai-Doss at IDIAP.

We collected a 6-hour corpus of spoken Gaelic from BBC Radio nan Gàidheal's Aithris na Maidne, fully transcribed at utterance level to modern digital standards. The corpus is available to interested researchers on request.

Pre-graduation research

Full covariance modelling

For my PhD thesis I investigated the use of full covariance Gaussian models for speech recognition. Full covariance models have hugely increased modelling power compared to the standard diagonal covariance model, but suffer from a number of deficiencies when the quantity of training data is limited. The problem is essentially one of generalisation. I investigated two solutions: imposing sparse Gaussian Graphical Model structure on the covariance matrices using l₁-norm penalised likelihood maximisation; and using a "shrinkage estimator".
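
To illustrate the generalisation problem, here is a minimal sketch of one simple form of shrinkage, in which the sample full covariance is blended with its own diagonal. It is not the estimator from the thesis: the function name, the fixed shrinkage weight and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def shrinkage_covariance(samples: np.ndarray, weight: float) -> np.ndarray:
    """Blend the sample full covariance with its diagonal.

    weight = 0 gives the full sample covariance and weight = 1 gives the
    diagonal covariance model. The fixed weight is a hypothetical
    parameter; a practical estimator would choose it from the data.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Maximum-likelihood covariance; rows are frames, columns are features.
    full = np.cov(samples, rowvar=False, bias=True)
    diagonal = np.diag(np.diag(full))
    return (1.0 - weight) * full + weight * diagonal


# Synthetic stand-in for acoustic features: 8 frames of dimension 10,
# so there are fewer frames than dimensions (assumed sizes).
rng = np.random.default_rng(0)
samples = rng.standard_normal((8, 10))

full = np.cov(samples, rowvar=False, bias=True)
shrunk = shrinkage_covariance(samples, weight=0.3)

# The sample covariance is rank-deficient, so it cannot be inverted;
# the shrunk estimate is positive definite.
print("rank of sample covariance:", np.linalg.matrix_rank(full))
print("smallest eigenvalue after shrinkage:", np.linalg.eigvalsh(shrunk).min())
```

With fewer frames than feature dimensions the sample covariance is singular, so the Gaussian likelihood is undefined. Any weight strictly above zero restores positive definiteness here, while the per-dimension variances are left unchanged.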

My PhD supervisor was Prof. Simon King.

MPhil research

For my master's thesis, I investigated methods for adapting prosodic phrasing models, supervised by Tina Burrows (then Toshiba Research Cambridge) and Paul Taylor (then Cambridge University Engineering Department).