Postdoc position for JST CREST uDialogue project

The School of Informatics, University of Edinburgh invites applications for a Postdoctoral Research Associate in Speech Technology supported by the JST CREST grant uDialogue. uDialogue is a joint project with the Nagoya Institute of Technology in Japan, funded by the Japan Science and Technology Agency (JST). The overall goal of the uDialogue project is the development of spoken dialogue systems based on user-generated content; the project covers research on speech synthesis, speech recognition, and spoken dialogue. You will work on various research problems in dialogue content processing, such as multiparty dialogue systems and learning dialogue structure. You will also contribute to the development of spoken dialogue systems, including avatars, for UK English within the uDialogue project.

You should have, or be near completion of, a PhD in speech processing, computer science, linguistics, engineering, mathematics, or a related discipline. You must have a background in statistical modelling and machine learning; research experience in speech recognition, speech synthesis, and/or dialogue; excellent programming skills; and research publications in international journals or conferences.

Experience in acoustic modelling or language modelling for speech recognition, speech synthesis, or spoken dialogue systems is essential. A background in one or more of the following areas is also desirable: multilingual speech recognition; adaptation techniques for acoustic or language modelling; experience in the design, construction, and evaluation of speech recognition, speech synthesis, or dialogue systems; distant speech recognition; microphone arrays; dialogue management and learning; familiarity with software tools such as HTK, Julius, Kaldi, HTS, or Festival; and familiarity with computer graphics programming, such as OpenGL.

For further information, see


I have been elected as a new member of the IEEE Signal Processing Society (SPS) Speech and Language Processing Technical Committee (term 2013-2015). Thank you very much!

Articulate: The Art and Science of Speech Synthesis

There will be a series of interactive exhibits illustrating different aspects of speech synthesis technology, the "Articulate" road show, in December 2012.


Venues are

City Screen York
Monday 3 December 

Sheffield Winter Garden
Tuesday 4 December

Hull Truck Theatre
Wednesday 5 December

CSTR will also demonstrate an exhibit related to speech synthesis: "It ain't what you say, it's the way that you say it".

Synthetic speech is often characterized by a monotonic, flat presentation. Modern text-to-speech systems are capable of much more: expression can be added to the speech. One question is how to control that expression. In this display, a person will hear a monologue but will be able to control the way the speech sounds through gestures made with their body.

What is this? Please try it yourself in December!

Credit: Rob Clark, Magdalena Konkiewicz Anna, Maria Astrinaki

Whisper speech recognition in noise

A demo video for our recent work on whisper speech recognition in noise using a NAM microphone and VTS.


The details of this work will be presented at ISCSLP 2012 in Hong Kong.

Credit: Chen-Yu Yang and Georgina Brown

PhD studentship

One fully-funded PhD studentship at the Centre for Speech Technology Research, University of Edinburgh, UK.

The Centre for Speech Technology Research at the University of Edinburgh invites applications for one PhD studentship in speech and language processing. The studentship is supported by the SNSF SIWIS project, a speech-to-speech translation project funded by the Swiss National Science Foundation, involving partners from Idiap (Martigny), ETH (Zurich), the University of Geneva, and the University of Edinburgh (UK). The topics of the PhD project at the University of Edinburgh include prosody modelling for speech synthesis and speech-to-speech translation. It will involve working jointly with the other partners and working with the five Swiss languages.

Suitable candidates will have a good undergraduate or Masters degree in Computer Science, Mathematics, or a related subject, and a good understanding of probability theory, signal processing, linguistics, and machine learning. Specific expertise in the algorithms behind HMM-based speech synthesis, speech recognition, or Bayesian networks would be an advantage.

See here for further details.

A new journal paper

A new journal paper has been published.

P.L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, I. Saratxaga
"Evaluation of Speaker Verification Security and Detection of HMM-based Synthetic Speech"
IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 8, October 2012

In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model–universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.
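To make the GMM-UBM decision rule concrete, here is a minimal sketch of how such a speaker verification system scores a claim: the average per-frame log-likelihood ratio between the target speaker's GMM and the universal background model is compared against a threshold. This is an illustrative toy (diagonal-covariance GMMs, made-up parameters and function names), not the evaluated systems from the paper.

```python
import math

def log_gauss(x, mean, var):
    # log N(x; mean, diag(var)) for a single feature frame
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(frames, weights, means, vars_):
    # average per-frame log-likelihood under a diagonal-covariance GMM,
    # using log-sum-exp over mixture components for numerical stability
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss(x, m, v)
                 for w, m, v in zip(weights, means, vars_)]
        mx = max(comps)
        total += mx + math.log(sum(math.exp(c - mx) for c in comps))
    return total / len(frames)

def verify(frames, target, ubm, threshold=0.0):
    # GMM-UBM decision: accept the identity claim if the log-likelihood
    # ratio between the target model and the UBM exceeds the threshold
    llr = gmm_loglik(frames, *target) - gmm_loglik(frames, *ubm)
    return llr > threshold, llr
```

The paper's point is that synthetic speech adapted to the target speaker can push this LLR above the threshold, which is why an additional synthetic-speech detector (e.g. on RPS features) is needed in front of the verifier.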

A new journal paper

A new journal paper has been published.

K. Hashimoto, J. Yamagishi, W. Byrne, S. King, K. Tokuda
"Impact of machine translation and speech synthesis on speech-to-speech translation"
Speech Communication, vol.54, issue 7, pp. 857--866, September 2012


This paper analyzes the impacts of machine translation and speech synthesis on speech-to-speech translation systems. A typical speech-to-speech translation system consists of three components: speech recognition, machine translation and speech synthesis. Many techniques have been proposed for integration of speech recognition and machine translation. However, corresponding techniques have not yet been considered for speech synthesis. The focus of the current work is machine translation and speech synthesis, and we present a subjective evaluation designed to analyze their impact on speech-to-speech translation. The results of these analyses show that the naturalness and intelligibility of the synthesized speech are strongly affected by the fluency of the translated sentences. In addition, several features were found to correlate well with the average fluency of the translated sentences and the average naturalness of the synthesized speech.
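The correlation analysis mentioned above amounts to computing, for example, a Pearson correlation between per-sentence translation-fluency scores and synthesis-naturalness scores. A small self-contained sketch with made-up scores (not the paper's data):

```python
import math

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length score lists
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# hypothetical per-sentence averages: translation fluency vs. naturalness
fluency = [2.1, 3.4, 4.0, 1.8, 3.0]
naturalness = [2.5, 3.2, 3.9, 2.0, 3.1]
r = pearson(fluency, naturalness)
```

A value of r close to 1 would indicate the strong coupling between translation fluency and perceived naturalness that the paper reports.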

LISTA workshop and invited talk

The LISTA (listening talker) workshop was held on Wednesday 2nd and Thursday 3rd May in Edinburgh. I gave an invited talk on the intelligibility of HMM-based speech synthesis in noisy conditions and introduced a few techniques for improving the intelligibility of HMM-based speech synthesizers. (Slides)


NST annual meeting

The annual meeting of the EPSRC NST (Natural Speech Technology) project was held in Edinburgh on Tuesday 24th April.


A new journal paper

A new journal paper has been published.

K. Oura, J. Yamagishi, M. Wester, S. King, K. Tokuda
"Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping"
Speech Communication, Vol 54, Issue 6, pp.704-714, July 2012


In the EMIME project, we developed a mobile device that performs personalized speech-to-speech translation such that a user’s spoken input in one language is used to produce spoken output in another language, while continuing to sound like the user’s voice. We integrated two techniques into a single architecture: unsupervised adaptation for HMM-based TTS using word-based large-vocabulary continuous speech recognition, and cross-lingual speaker adaptation (CLSA) for HMM-based TTS. The CLSA is based on a state-level transform mapping learned using minimum Kullback–Leibler divergence between pairs of HMM states in the input and output languages. Thus, an unsupervised cross-lingual speaker adaptation system was developed. End-to-end speech-to-speech translation systems for four languages (English, Finnish, Mandarin, and Japanese) were constructed within this framework. In this paper, the English-to-Japanese adaptation is evaluated. Listening tests demonstrate that adapted voices sound more similar to a target speaker than average voices and that differences between supervised and unsupervised cross-lingual speaker adaptation are small. Calculating the KLD state-mapping on only the first 10 mel-cepstral coefficients leads to huge savings in computational costs, without any detrimental effect on the quality of the synthetic speech.
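The KLD-based state mapping described above can be sketched as follows: for every HMM state in the input language, find the output-language state whose Gaussian is closest under symmetric Kullback-Leibler divergence, computed on only the first few mel-cepstral coefficients. This is an illustrative sketch (single diagonal-covariance Gaussian per state, invented data structures), not the EMIME implementation.

```python
import math

def kld_diag_gauss(m1, v1, m2, v2):
    # KL divergence KL(N1 || N2) between diagonal-covariance Gaussians,
    # summed over dimensions
    return sum(0.5 * (math.log(b / a) + (a + (x - y) ** 2) / b - 1.0)
               for x, a, y, b in zip(m1, v1, m2, v2))

def map_states(input_states, output_states, n_dims=10):
    # For each HMM state in the input language, pick the output-language
    # state with minimum symmetric KLD, using only the first n_dims
    # coefficients (e.g. the first 10 mel-cepstral coefficients)
    mapping = {}
    for i, (mi, vi) in input_states.items():
        mi, vi = mi[:n_dims], vi[:n_dims]
        best, best_d = None, float("inf")
        for j, (mj, vj) in output_states.items():
            mj, vj = mj[:n_dims], vj[:n_dims]
            d = (kld_diag_gauss(mi, vi, mj, vj)
                 + kld_diag_gauss(mj, vj, mi, vi))
            if d < best_d:
                best, best_d = j, d
        mapping[i] = best
    return mapping
```

Truncating to the first 10 coefficients shrinks the per-pair cost of the divergence computation, which is the source of the computational savings the paper reports.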

ICASSP 2012 and CREST meeting

We participated in ICASSP 2012 and the JST CREST symposium in Kyoto, Japan.


2nd uDialogue meeting in Nagoya

The second uDialogue meeting was held on Thursday 22nd and Friday 23rd March in Nagoya, Japan.


In the second half of the NHK Close-Up Gendai programme broadcast on 28 February, "Create a voice that conveys feeling: the secret of Hatsune Miku's singing voice", our speech synthesis technologies for patients with neurodegenerative conditions such as motor neurone disease (MND/ALS), Parkinson's disease (PD), and multiple sclerosis (MS) were introduced. We are delighted that this technology has been brought to the attention of so many people.


Junichi Yamagishi, C. Veaux, S. King, S. Renals
(Review) Speech synthesis technologies for patients with vocal disabilities: Voice banking and reconstruction
Journal of the Acoustical Society of Japan, vol. 67, no. 12, pp. 587-592, 2011

Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
Junichi Yamagishi, Christophe Veaux, Simon King and Steve Renals
Acoustical Science and Technology
Vol. 33 (2012) , No. 1 pp.1-5

According to Mr. Goto, who appeared in the programme, it will be available to view on NHK On Demand for two weeks from tomorrow, and the content will also be published in text and images.

IAST workshop on innovation and applications in speech technology

There will be a workshop, "IAST: Innovation and Applications in Speech Technology", in Dublin.

Christophe Veaux will give a presentation titled "Voice reconstruction for individuals with vocal disabilities"

Two pictures of recent recordings

Two pictures from recent interesting recordings:


Articulation data recording of Lombard speech (speech in noise)

Electromagnetic Articulograph
B&K Head and Torso Simulator

Credit: Julian Villegas (University of the Basque Country, Spain)


Whisper speech recording via the NAM microphone in an anechoic chamber

NAM (non-audible murmur) microphone
DPA microphone

Credit: Georgina Brown and Chen-Yu Yang

LISTA meeting in Crete

A meeting for the FP7 EC project "LISTA" was held in Crete, Greece.


A picture taken at the FORTH (Foundation for Research and Technology - Hellas) building

Three journal papers

Three new journal papers have been published, and two of them are available online.

Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
Keiichiro Oura, Junichi Yamagishi, Mirjam Wester, Simon King, Keiichi Tokuda,
Speech Communication, Available online 5 January 2012

Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
Junichi Yamagishi, Christophe Veaux, Simon King and Steve Renals
Acoustical Science and Technology
Vol. 33 (2012) , No. 1 pp.1-5

Junichi Yamagishi, C. Veaux, S. King, S. Renals
(Review) Speech synthesis technologies for patients with vocal disabilities: Voice banking and reconstruction
Journal of the Acoustical Society of Japan, vol. 67, no. 12, pp. 587-592, 2011

NST meeting in Cambridge

The NST (Natural Speech Technology) meeting was held at the University of Cambridge.

Dinner at Peterhouse

Roberto's PhD Viva

Roberto Barra-Chicote had a very successful PhD viva on Tuesday 20th December 2011 in Madrid, Spain, and I participated as the invited external examiner. Congratulations, Roberto!