This companion page to paper [1] presents some randomly selected audio examples from our listening test. The stimuli illustrate the effects of conventional versus robust DNN-based duration prediction from found audiobook data (“Emma” by Jane Austen). Please read the paper for more information, including descriptions of the different systems and their properties.

Note: Should you experience problems hearing the audio, please wait a while to allow the audio data to load (3.8 MiB). If playback still does not work, please try another web browser. (It is rumoured that Internet Explorer cannot play audio objects in WAV format.)


Audio examples



System: VOC, FRC, BOT, MSE, MLE1, MLE3, B75, B50
Prompt ID: 184, 192, 198, 207

References

  1. G. E. Henter, S. Ronanki, O. Watts, M. Wester, Z. Wu and S. King, “Robust TTS duration modelling using DNNs,” Proc. ICASSP, 2016, pp. 5130–5134.
