This companion page to paper [1] presents audio examples of manipulated speech stimuli. The manipulations illustrate the perceptual effects of certain modelling assumptions in statistical parametric speech synthesis, applied within an otherwise highly accurate model. Please see the paper for more information, including descriptions of the different conditions and how to interpret them.

Note: Headphones are highly recommended to properly hear the differences! If the audio does not play, please wait a while for the audio data (1.8 MiB) to load. If playback still does not work, please try another web browser. (Internet Explorer is reportedly unable to play audio in WAV format.)

References

  1. G. E. Henter, T. Merritt, M. Shannon, C. Mayo and S. King, “Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech,” Proc. Interspeech, 2014, pp. 1504–1508.

Hvd 004: “These days a chicken leg is a rare dish.”

Sampling-based generation:
Baselines:
Stream independence:
Filter coefficient independence:
Mean-based generation:
Averaging:

Hvd 039: “The meal was cooked before the bell rang.”

Sampling-based generation:
Baselines:
Stream independence:
Filter coefficient independence:
Mean-based generation:
Averaging:
