From Le Zhang

Main: Research

Research Interest

I am currently looking at the use of trajectory models in speech recognition. The title of my research proposal is "Modelling Speech Dynamics with a Trajectory Model".

Motivation of Research

Acoustic modelling is embedded in a context that may look back over as many as forty years of cunning experiment and elaborate theory. Yet, to date, state-of-the-art automatic speech recognition (ASR) systems are still built on Hidden Markov Models (HMMs), a relatively simple model that has been in use for nearly three decades.

A major drawback of the HMM is the so-called "conditional independence assumption": given the discrete state of a frame, the acoustic observation of that frame is modelled independently of all other frames. While this assumption yields efficient algorithms for HMMs, it is too restrictive to model the dynamic aspects of human speech. For instance, it is known that in human speech the same phoneme can be pronounced differently depending on its surrounding phoneme context, to ensure a smooth transition between syllables. This phenomenon, called coarticulation in phonology, leads to strong correlations between adjacent speech segments and is difficult to model with an HMM.

Partly motivated by these insights, this research will investigate alternative acoustic models that can better model the temporal correlations in speech. More specifically, the following research issues will be addressed in this work:

    The highly dynamic nature of speech is due to the joint effect of
    vocal tract movement, the motion of the articulators, and the underlying
    neurological faculties that are responsible for the production of speech.
    A better understanding of the sources of acoustic dynamics will help
    derive proper acoustic models for ASR.

    Current HMM-based ASR systems try to capture short-term speech
    dynamics by appending feature derivatives to the acoustic
    vectors. Although this method works well in practice, it contradicts the
    independence assumption made by HMMs. Alternative methods that properly
    handle the temporal constraints should yield better performance.

    Many promising models, such as segment models and the trajectory model,
    are known to work well on small datasets, but a clear superiority in
    performance over ordinary HMMs on large speech recognition
    tasks still remains to be shown.
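As an illustration of the second issue above, the usual way of appending feature derivatives can be sketched as follows. This is only a minimal sketch under stated assumptions: the function names `delta` and `append_dynamic_features`, the regression window of two frames, and the edge padding at utterance boundaries are choices made here for illustration and are not taken from any particular ASR toolkit.

```python
import numpy as np

def delta(features, window=2):
    """Regression-based time derivative of a (T, D) feature matrix.

    d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2),  k = 1..window.
    Boundary frames are handled by repeating the first/last frame
    (an assumption made for this sketch).
    """
    T = features.shape[0]
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros_like(features, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + T]    # c[t + k]
        behind = padded[window - k : window - k + T]   # c[t - k]
        out += k * (ahead - behind)
    return out / denom

def append_dynamic_features(static, window=2):
    """Stack static, first-order and second-order regression coefficients."""
    d1 = delta(static, window)
    d2 = delta(d1, window)
    return np.hstack([static, d1, d2])

# Toy example: a single static coefficient that rises linearly over 10 frames.
static = np.arange(10, dtype=float).reshape(-1, 1)
obs = append_dynamic_features(static)
print(obs.shape)
print(obs[4])
```

Note that every appended column is a deterministic function of the static column. An HMM nevertheless scores the stacked vector as if its components were free variables, which is the inconsistency the second issue refers to.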

The Proposal

In this research we propose to model the dynamic patterns of speech using a trajectory model (Keiichi Tokuda, 2004), which is a properly normalised version of the HMM that models feature derivatives explicitly without imposing any conditional independence assumption. The use of trajectory models in speech recognition is still at an early stage, although the model has been successfully applied to speech synthesis (Keiichi Tokuda et al., 2000).

The conditional independence assumption imposed by Hidden Markov Models (HMMs) makes it difficult to model temporal correlation patterns in human speech. Traditionally, this limitation is circumvented by appending first- and second-order regression coefficients (dynamic features) to the acoustic feature vectors. Although this workaround improves speech recognition performance, we argue that a straightforward use of dynamic features in HMMs results in an inferior model, because the HMM treats static and dynamic features as independent variables even though the dynamic features are a deterministic function of the static ones. This can be fixed by using a trajectory model that correctly handles the dynamic constraints. It can be shown that an HMM can be transformed into a trajectory model by performing a per-utterance normalisation. In contrast to the band-diagonal temporal covariance matrix of an HMM, the new model has a full covariance matrix capable of modelling the short-range temporal dynamics of speech.
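The transformation described above can be sketched numerically for a one-dimensional static feature. The sketch follows the form in which the trajectory model is commonly written: the stacked observation is o = W c, and for a fixed state sequence with stacked mean mu and diagonal covariance Sigma, the static sequence c is Gaussian with inverse covariance R = W' Sigma^-1 W, covariance P = R^-1 and mean trajectory c_bar = P W' Sigma^-1 mu. Several things here are assumptions made for illustration only: the names `build_window_matrix` and `trajectory_parameters`, the use of a single backward-difference delta stream instead of first- and second-order regression coefficients, and the toy means and variances.

```python
import numpy as np

def build_window_matrix(T):
    """Return W of shape (2T, T) so that o = W @ c stacks [static, delta] per frame.

    Assumption for this sketch: delta[t] = c[t] - c[t-1], and delta[0] = 0.
    """
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0              # static row for frame t
        if t >= 1:
            W[2 * t + 1, t] = 1.0      # delta row for frame t
            W[2 * t + 1, t - 1] = -1.0
    return W

def trajectory_parameters(mu, var):
    """Per-utterance trajectory parameters for a fixed state sequence.

    mu, var: length-2T stacked HMM means and diagonal variances,
    interleaved as [static_0, delta_0, static_1, delta_1, ...].
    """
    T = mu.shape[0] // 2
    W = build_window_matrix(T)
    Wt_Sinv = W.T / var                # W' Sigma^-1 for diagonal Sigma
    R = Wt_Sinv @ W                    # banded inverse covariance over time
    P = np.linalg.inv(R)               # dense covariance over time
    c_bar = P @ (Wt_Sinv @ mu)         # smooth mean trajectory
    return W, R, P, c_bar

# Toy state sequence: static mean steps from 0 to 1, delta means are 0.
T = 6
mu = np.zeros(2 * T)
mu[0::2] = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
var = np.ones(2 * T)

W, R, P, c_bar = trajectory_parameters(mu, var)
print(np.round(R, 2))      # non-zero only on and next to the diagonal
print(np.round(P, 3))      # every frame is correlated with every other frame
print(np.round(c_bar, 3))  # the step in the means becomes a smooth transition
```

In this toy case the piecewise-constant HMM means turn into a smooth trajectory, and the covariance P couples all frames of the utterance, which is what the per-utterance normalisation buys over scoring each frame independently.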

We hope this research will deepen our understanding of the statistical speech processing enterprise, which will inevitably bring with it some insight into the nature of the human speech production process.

Main Reference

Other Stuff

You are invited to have a look at my past research done at the Natural Language Processing Lab of Northeastern University. Here is the (slightly outdated) project description entry on CSTR's web page.

Retrieved from http://homepages.inf.ed.ac.uk/lzhang10/pmwiki/pmwiki.php/Main/Research
Page last modified on January 12, 2011, at 10:04 AM