Comparison of Unsupervised Adaptation for Speaker-adaptive HMM-based Speech Synthesis (Finnish samples)


Sample 1 (Synthetic speech generated from speaker-dependent HMMs)

Speaker-adaptive HTS voices
Number of adaptation sentences: 5, 10, 20, 30, 50, 100
Supervised adaptation
Unsupervised adaptation (1st pass ASR)
Unsupervised adaptation (2nd pass ASR)



Sample 2 (Synthetic speech generated from speaker-dependent HMMs)

Speaker-adaptive HTS voices
Number of adaptation sentences: 5, 10, 20, 30, 50, 100
Supervised adaptation
Unsupervised adaptation (1st pass ASR)
Unsupervised adaptation (2nd pass ASR)


System configuration
Training data for average voice model: Finnish Speecon database
Adaptation data: varying amounts of speech (5 to 100 sentences) from the Finnish MV database
Model: state-tied context-dependent MSD-HSMMs
Adaptation: CSMAPLR+MAP
Acoustic features: STRAIGHT spectral parameters, log F0, and aperiodicity, plus their delta and delta-delta features
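
The delta and delta-delta features listed above are time derivatives appended to each static feature. The sketch below shows one common way to compute them for a single feature track. The window coefficients and the function names (`apply_window`, `add_dynamic_features`) are assumptions for illustration; this page does not state which windows were used, and in an MSD model the unvoiced log F0 frames would need separate handling that is not shown here.

```python
# Assumed 3-tap regression windows (commonly used in HMM-based synthesis;
# not confirmed by this page).
DELTA_WIN = (-0.5, 0.0, 0.5)   # first-order difference
ACCEL_WIN = (1.0, -2.0, 1.0)   # second-order difference


def apply_window(track, win):
    """Slide a 3-tap window over a 1-D feature track.

    The first and last frames are repeated so the output has the same
    number of frames as the input.
    """
    padded = [track[0]] + list(track) + [track[-1]]
    return [
        win[0] * padded[t] + win[1] * padded[t + 1] + win[2] * padded[t + 2]
        for t in range(len(track))
    ]


def add_dynamic_features(track):
    """Return one [static, delta, delta-delta] vector per frame."""
    delta = apply_window(track, DELTA_WIN)
    accel = apply_window(track, ACCEL_WIN)
    return [list(frame) for frame in zip(track, delta, accel)]


if __name__ == "__main__":
    # Hypothetical static values for four frames.
    for frame in add_dynamic_features([0.0, 1.0, 4.0, 9.0]):
        print(frame)
```

Each spectral, log F0, and aperiodicity coefficient would be processed this way, so every frame's observation vector is three times the size of its static part.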