My primary research focus is on acoustic modelling for automatic speech recognition (ASR). An underpinning theme of my research is training acoustic models in low-data conditions.
Topics I am particularly interested in:
Shucong Zhang is funded by Toshiba to work on attention-based E2E systems for ASR in multi-speaker environments; Andrea Carmantini is funded by Samsung to investigate speaker adaptation in such systems; and Jie Chi is a member of the CDT in NLP, working on multi-lingual E2E ASR.
SpeechWave aims to study methods to replace conventional signal-processing modules of ASR with convolutional and recurrent neural network architectures that operate directly on raw waveforms. This is work with Steve Renals and Erfan Loweimi at Edinburgh, and Zoran Cvetkovic and colleagues at KCL.
Inspired by our GlobalVox prototype developed at a BBC News Labs newsHACK event, the EU-funded SUMMA project integrates stream-based media processing tools (including speech recognition and machine translation) with deep language understanding capabilities (including named entity relation extraction and semantic parsing) for automatic monitoring of multilingual news media sources.
I worked on speech recognition within the EPSRC-funded Natural Speech Technology project, a five-year programme grant held jointly with the University of Cambridge and the University of Sheffield. I was particularly involved in the themes of structuring diverse data and generating systems with wide domain coverage, for example through the use of adaptive neural network features. One of our primary use-cases was recognition of broadcast data from the BBC. (See my publications for more details.) I am an organiser of the MGB challenge, which first featured as an official challenge at ASRU 2015.
I ran a project to create the first speech recogniser for Scottish Gaelic, funded by iDEA lab. The scarcity of resources for this language makes the task challenging and a good testing ground for recent advances in cross-lingual speech recognition. Later, I worked on this data in collaboration with Ramya Rasipuram and Mathew Magimai-Doss at IDIAP.
We collected a 6-hour corpus of spoken Gaelic from BBC Radio nan Gàidheal's Aithris na Maidne, fully transcribed at utterance level to modern digital standards. The corpus is available to interested researchers on request.
For my PhD thesis I investigated the use of full covariance Gaussian models for speech recognition. Full covariance models have hugely increased modelling power compared to the standard diagonal covariance model, but suffer from a number of deficiencies when the quantity of training data is limited. The problem is essentially one of generalisation. I investigated two solutions: imposing sparse Gaussian Graphical Model structure on the covariance matrices by using l1-norm penalised likelihood maximisation; and using a "shrinkage estimator". My PhD supervisor was Prof. Simon King.
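As a rough illustration of the shrinkage idea, the sketch below blends a full sample covariance matrix with its own diagonal. This is a generic textbook form and not necessarily the estimator used in the thesis: the function name `shrink_covariance`, the choice of the diagonal as the shrinkage target, and the hand-picked weight `alpha` are all assumptions made for illustration.

```python
import numpy as np


def shrink_covariance(samples, alpha):
    """Blend the sample covariance with its own diagonal (illustrative only).

    samples: array of shape (n_frames, dim)
    alpha:   assumed shrinkage weight in [0, 1]; 0 keeps the full sample
             covariance, 1 gives the diagonal covariance model.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Maximum-likelihood (biased) estimate of the full covariance matrix.
    full = np.cov(samples, rowvar=False, bias=True)
    # Shrinkage target: the same variances with all covariances set to zero.
    target = np.diag(np.diag(full))
    return (1.0 - alpha) * full + alpha * target


# Synthetic low-data case: fewer frames than feature dimensions.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5, 10))

full = shrink_covariance(samples, 0.0)    # plain sample covariance
shrunk = shrink_covariance(samples, 0.5)  # halfway towards diagonal
```

With fewer frames than dimensions, the plain sample covariance is singular and cannot be inverted to evaluate a Gaussian likelihood, whereas the shrunk matrix keeps the same variances and is positive definite. In practice the weight would be chosen from the data rather than fixed by hand.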
For my master's thesis, I investigated methods for adapting prosodic phrasing models, supervised by Tina Burrows (then Toshiba Research Cambridge) and Paul Taylor (then Cambridge University Engineering Department).