Comparison of spectral representations for speaker-adaptive HMM-based speech synthesis


Speaker 4oa (Real speech)

Speaker-adaptive HTS voices
Order: 12+C0, 24+C0, 39+C0, 49+C0, 59+C0
Spectral representation: Mel-cepstrum, Mel-generalized cepstrum, MGC-LSP

Copy synthesis
Order: 12+C0, 24+C0, 39+C0, 49+C0, 59+C0
Spectral representation: Mel-cepstrum, Mel-generalized cepstrum, MGC-LSP


Target Speaker 4oi (Real speech)

Speaker-adaptive HTS voices
Order: 12+C0, 24+C0, 39+C0, 49+C0, 59+C0
Spectral representation: Mel-cepstrum, Mel-generalized cepstrum, MGC-LSP

Copy synthesis
Order: 12+C0, 24+C0, 39+C0, 49+C0, 59+C0
Spectral representation: Mel-cepstrum, Mel-generalized cepstrum, MGC-LSP


System configuration
Training data for average voice model: SI-84 set of the WSJ corpora
Adaptation data: 40 'block adaptation' sentences included in the November 1993 CSR H2 task
Model: state-tied context-dependent MSD-HSMMs
Adaptation: CSMAPLR+MAP
Acoustic features: the STRAIGHT spectral parameters listed above, log F0, and aperiodicity, plus their delta and delta-delta features
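
As an illustration of the last item, the sketch below shows how delta and delta-delta features might be appended to a sequence of static frames, and how the resulting spectral stream dimension follows from the order. The regression windows (-0.5, 0, 0.5) and (1, -2, 1) are an assumption, since this page does not state which windows were used, and the function names are hypothetical.

```python
# Assumed regression windows; the page does not specify them.
DELTA_WIN = (-0.5, 0.0, 0.5)
DELTA2_WIN = (1.0, -2.0, 1.0)


def apply_window(frames, window):
    """Apply a 3-tap window along time; edge frames are repeated."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        cur = frames[t]
        nxt = frames[min(t + 1, n - 1)]
        out.append([window[0] * p + window[1] * c + window[2] * x
                    for p, c, x in zip(prev, cur, nxt)])
    return out


def add_dynamic_features(frames):
    """Return one vector per frame: static + delta + delta-delta."""
    delta = apply_window(frames, DELTA_WIN)
    delta2 = apply_window(frames, DELTA2_WIN)
    return [s + d + a for s, d, a in zip(frames, delta, delta2)]


def spectral_stream_dim(order):
    """Dimension for 'order+C0' statics with delta and delta-delta."""
    return 3 * (order + 1)
```

Under these assumptions, the 12+C0 configuration gives 13 static coefficients and a 39-dimensional spectral stream, while 59+C0 gives 60 static coefficients and a 180-dimensional stream.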