Thousands of Voices and Geographical GUI for HMM-based Speech Synthesis
Our robust speaker-adaptive speech synthesis system can generate the voice of any speaker. It only requires a small amount of data from each speaker because it uses model adaptation. This means that it is now possible to create a virtually unlimited number of different voices.

[Screenshot: Google Maps view of the voice demo]
(Click here to start this demo)

In fact, we believe this is the largest known collection of synthetic voices in existence. We built so many voices (1500+ voices built on ASR corpora, plus several voices built on TTS corpora using the same techniques) that it became impossible to represent them in list or table form. Instead, we devised an interactive geographical representation, shown above. Each marker corresponds to an individual speaker. Blue markers show male speakers and red markers show female speakers. Some markers are placed at arbitrary locations (within the correct country) because precise location information is not available for all speakers. The box on the right lists the speakers that the user can choose from, together with each speaker's gender and nationality.

The demo is built on the Google Maps and AJAX Language (Translation) APIs, together with our Festival TTS system running on a University of Edinburgh server. Clicking on a marker plays synthetic speech from that speaker. Currently, the interactive mode supports all of the English voices and some of the Spanish voices. For other languages, only pre-synthesised examples are available, but we plan to add an interactive text-to-speech feature in the very near future.
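The marker rules described above can be illustrated with a small sketch. This is not the code behind the demo: the `Speaker` record, the `marker_for` helper, the country codes and the rough bounding boxes are all hypothetical, chosen only to show how gender could map to marker colour and how a speaker without a known location could be placed somewhere inside their country.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical, very rough bounding boxes: (south, west, north, east) in degrees.
# A real system would need proper country outlines; a box can include sea.
COUNTRY_BOXES = {
    "GB": (50.0, -5.0, 58.0, 1.5),
    "ES": (36.5, -9.0, 43.5, 3.0),
}

# Colour convention stated in the text: blue for male, red for female.
MARKER_COLOURS = {"male": "blue", "female": "red"}


@dataclass
class Speaker:
    speaker_id: str
    gender: str                                      # "male" or "female"
    country: str                                     # key into COUNTRY_BOXES
    location: Optional[Tuple[float, float]] = None   # (lat, lng) if known


def marker_for(speaker: Speaker, rng: random.Random) -> dict:
    """Build a map-marker description for one speaker."""
    if speaker.location is not None:
        lat, lng = speaker.location
        approximate = False
    else:
        # No precise location: pick an arbitrary point inside the country's box.
        south, west, north, east = COUNTRY_BOXES[speaker.country]
        lat = rng.uniform(south, north)
        lng = rng.uniform(west, east)
        approximate = True
    return {
        "id": speaker.speaker_id,
        "lat": lat,
        "lng": lng,
        "colour": MARKER_COLOURS[speaker.gender],
        "approximate": approximate,
    }


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the arbitrary placement is repeatable
    speakers = [
        Speaker("spk001", "male", "GB", (55.95, -3.19)),
        Speaker("spk002", "female", "ES"),
    ]
    for s in speakers:
        print(marker_for(s, rng))
```

Keeping an `approximate` flag on each marker is one possible design choice: it lets an interface distinguish speakers whose position is real from those who were only placed within the correct country.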

What's more, the method is almost completely automatic and can even work from existing recordings such as speeches, movies, TV programmes and podcasts. This will enable new applications of text-to-speech technology. Please click the [CELEB] section in the demo!

For details, please refer to the following journal paper, published by the IEEE:

J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, R. Hu, Y. Guan, K. Oura, K. Tokuda, R. Karhila, M. Kurimo, “Thousands of Voices for HMM-based Speech Synthesis -- Analysis and Application of TTS Systems Built on Various ASR Corpora”, IEEE Transactions on Audio, Speech, and Language Processing, 2010.