Peter Bell – research

See also my main page.

Research interests

My primary research focus is in acoustic modelling for automatic speech recognition (ASR). An underpinning theme of my research is training acoustic models in low-data conditions.

Topics I am particularly interested in:

Current PhD student projects

End-to-end ASR

Shucong Zhang is funded by Toshiba to work on attention-based E2E systems for ASR in multi-speaker environments; Andrea Carmantini is funded by Samsung to investigate speaker adaptation in such systems; and Jie Chi is a member of the CDT in NLP, working on multilingual E2E ASR.

Adaptation/transfer learning

Joachim Fainberg, funded by Bloomberg, is looking at methods for domain and speaker adaptation and lightly supervised training. Joanna Rownicka, funded by Ericsson and DataLab, studies interpretable feature representations derived from deep architectures such as VDCNNs. Ondrej Klejch has been developing fully automatic methods for adaptation based on meta-learning.

Speaker diarization

Chau Luu works on diarization for diverse data, supported partially by the BBC.

Research projects

SpeechWave

SpeechWave aims to study methods to replace the conventional signal-processing modules of ASR with convolutional and recurrent neural network architectures that operate directly on raw waveforms. This is work with Steve Renals and Erfan Loweimi at Edinburgh, and Zoran Cvetkovic and colleagues at KCL.
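The core idea can be sketched in miniature: a bank of 1-D filters convolved with the raw waveform at a fixed stride, followed by a compressive nonlinearity, plays the role that mel filterbank analysis plays in a conventional front end. This is an illustrative NumPy sketch, not SpeechWave code; the filter count, filter length, stride, and nonlinearity are arbitrary choices for the example (in the project such filters would be learned jointly with the recogniser rather than drawn at random).

```python
import numpy as np

def conv1d_frontend(waveform, filters, stride):
    """Apply a bank of 1-D filters directly to a raw waveform,
    producing a frame-level representation analogous to filterbank features."""
    filt_len = filters.shape[1]
    n_frames = (len(waveform) - filt_len) // stride + 1
    # Slice the waveform into overlapping frames, one per output time step.
    frames = np.stack([waveform[i * stride : i * stride + filt_len]
                       for i in range(n_frames)])
    activations = frames @ filters.T           # (n_frames, n_filters)
    return np.log1p(np.abs(activations))       # compressive nonlinearity

rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)               # 1 s of 16 kHz "audio"
filters = rng.standard_normal((40, 400))       # 40 filters of 25 ms (untrained here)
feats = conv1d_frontend(wav, filters, stride=160)  # 10 ms hop
print(feats.shape)                             # (98, 40)
```

With a 400-sample filter and 160-sample hop this yields one 40-dimensional feature vector per 10 ms, the same frame rate as a standard filterbank pipeline.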

MATERIAL

We provide low-resource language ASR for the IARPA MATERIAL programme as part of the SCRIPTS team, with partners at Maryland, Columbia, Yale and Cambridge. In this programme, it is necessary to build ASR systems for multi-genre data in low-resourced languages – initially Swahili, Tagalog and Somali – with a small mismatched corpus of conversational speech as the only transcribed training data provided. We work alongside Kenneth Heafield's machine translation team here at Edinburgh.

SUMMA

Inspired by our GlobalVox prototype developed at a BBC News Labs newsHACK event, the EU-funded SUMMA project integrates stream-based media processing tools (including speech recognition and machine translation) with deep language understanding capabilities (including named entity relation extraction and semantic parsing) for automatic monitoring of multilingual news media sources.

Natural Speech Technology

I worked on speech recognition within the EPSRC-funded Natural Speech Technology project, a five-year programme grant held jointly with the University of Cambridge and the University of Sheffield. I was particularly involved in the themes of structuring diverse data and generating systems with wide domain coverage, for example through the use of adaptive neural network features. One of our primary use cases was recognition of broadcast data from the BBC (see my publications for more details). I am an organiser of the MGB challenge, which first featured as an official challenge at ASRU 2015.

Gaelic speech recognition

I ran a project to create the first speech recogniser for Scottish Gaelic, funded by iDEA lab. The scarcity of resources for this language makes the task challenging, and makes it a good testing ground for recent advances in cross-lingual speech recognition. Later, I worked on this data in collaboration with Ramya Rasipuram and Mathew Magimai-Doss at IDIAP.

We collected a 6-hour corpus of spoken Gaelic from BBC Radio nan Gàidheal's Aithris na Maidne, fully transcribed at the utterance level to modern digital standards. The corpus is available to interested researchers on request.

Pre-graduation research

Full covariance modelling

For my PhD thesis I investigated the use of full covariance Gaussian models for speech recognition. Full covariance models have greatly increased modelling power compared to the standard diagonal covariance model, but suffer from a number of deficiencies when the quantity of training data is limited; the problem is essentially one of generalisation. I investigated two solutions: imposing a sparse Gaussian graphical model structure on the covariance matrices through l1-norm penalised likelihood maximisation, and using a "shrinkage estimator".
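Both estimators can be illustrated with off-the-shelf tools. The following is a scikit-learn sketch of the two ideas, not the thesis implementation; the data, dimensionality, and penalty strength are arbitrary choices for the example.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso, LedoitWolf

rng = np.random.default_rng(0)
# Toy stand-in for per-state acoustic feature vectors: relatively few
# samples per dimension, so the raw sample covariance generalises poorly.
X = rng.standard_normal((50, 20))

# l1-penalised maximum likelihood: zeros in the estimated precision matrix
# encode conditional independences, i.e. a sparse Gaussian graphical model.
gl = GraphicalLasso(alpha=0.5).fit(X)
sparsity = float(np.mean(gl.precision_ == 0.0))

# Shrinkage estimator: interpolate the sample covariance towards a
# well-conditioned diagonal target (Ledoit-Wolf chooses the weight).
lw = LedoitWolf().fit(X)
print(sparsity, lw.shrinkage_)
```

The penalty weight (`alpha`) controls how many off-diagonal precision entries are driven exactly to zero, trading modelling power against robustness to limited training data.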

My PhD supervisor was Prof. Simon King.

MPhil research

For my master's thesis, I investigated methods for adapting prosodic phrasing models, supervised by Tina Burrows (then Toshiba Research Cambridge) and Paul Taylor (then Cambridge University Engineering Department).