This companion page to the paper cited below presents some randomly selected audio examples from our listening test. The stimuli illustrate the effects of conventional versus robust DNN-based duration prediction from found audiobook data (“Emma” by Jane Austen). Please read the paper for more information, including descriptions of the different systems and their properties.
Note: If you have trouble hearing the audio, please wait a while for the audio data to load (3.8 MiB). If playback still does not work, please try another web browser. (It is rumoured that Internet Explorer cannot play audio objects in WAV format.)
- G. E. Henter, S. Ronanki, O. Watts, M. Wester, Z. Wu and S. King, “Robust TTS duration modelling using DNNs,” Proc. ICASSP, 2016, pp. 5130–5134.
[ pdf | more info ]