Postdoc position for JST CREST uDialogue project
You should have, or be near completion of, a PhD in speech processing, computer science, linguistics, engineering, mathematics, or a related discipline. You must have a background in statistical modelling and machine learning, research experience in speech recognition, speech synthesis and/or dialogue, excellent programming skills, and research publications in international journals or conferences.
Experience in acoustic modelling or language modelling for speech recognition, speech synthesis or spoken dialogue systems is essential. A background in one or more of the following areas is also desirable: multilingual speech recognition; adaptation techniques for acoustic or language modelling; the design, construction and evaluation of speech recognition, speech synthesis or dialogue systems; distant speech recognition; microphone arrays; dialogue management and learning; software tools such as HTK, Julius, Kaldi, HTS or Festival; and computer graphics programming, for example with OpenGL.
For further information, see https://www.vacancies.ed.ac.uk/
IEEE SPS SLTC
Articulate: The Art and Science of Speech Synthesis
http://crestnetwork.org.uk/page/articulate-the-art-and-science-of-speech-synthes

Venues are
City Screen York
Monday 3 December
Sheffield Winter Garden
Tuesday 4 December
Hull - Hull Truck Theatre
Wednesday 5 December
CSTR will also demonstrate an exhibit related to speech synthesis: "It ain't what you say, it's the way that you say it".
Synthetic speech is often characterized by a monotonous, flat delivery. Modern text-to-speech systems are capable of much more: expression can be added to the speech. One question is how to control that expression. In this display, a visitor hears a monologue and can control the way the speech sounds through gestures made with their body.
http://crestnetwork.org.uk/page/it-ain-t-what-you-say-it-s-the-way-that-you-say-
What is this? Please try it yourself in December!
Credit: Rob Clark, Magdalena Konkiewicz Anna, Maria Astrinaki
Whisper speech recognition in noise

Details of this work will be presented at ISCSLP 2012 in Hong Kong.
http://www.iscslp2012.org
Credit: Chen-Yu Yang and Georgina Brown
PhD studentship
The Centre for Speech Technology Research at the University of Edinburgh invites applications for one PhD studentship in speech and language processing. The PhD studentship is supported by the SNSF SIWIS Project. This is a speech-to-speech translation project funded by the Swiss National Science Foundation, involving partners from Idiap (Martigny), ETH (Zurich), the University of Geneva and the University of Edinburgh (UK). The topics of the PhD project at the University of Edinburgh include prosody modelling for speech synthesis and speech-to-speech translation. It will involve working jointly with the other partners and working with the five Swiss languages.
Suitable candidates will have a good undergraduate or Master's degree in Computer Science, Mathematics or a related subject and a good understanding of probability theory, signal processing, linguistics, and machine learning. Specific expertise in the algorithms behind HMM-based speech synthesis, speech recognition or Bayesian networks would be an advantage.
See here for further details
http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees/research-grant-funding/speechandlanguage
A new journal paper
P.L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, I. Saratxaga
"Evaluation of Speaker Verification Security and Detection of HMM-based Synthetic Speech"
IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 8, Oct. 2012
http://dx.doi.org/10.1109/TASL.2012.2201472
Abstract
In this paper, we evaluate the vulnerability of speaker verification (SV) systems to synthetic speech. The SV systems are based on either the Gaussian mixture model–universal background model (GMM-UBM) or support vector machine (SVM) using GMM supervectors. We use a hidden Markov model (HMM)-based text-to-speech (TTS) synthesizer, which can synthesize speech for a target speaker using small amounts of training data through model adaptation of an average voice or background model. Although the SV systems have a very low equal error rate (EER), when tested with synthetic speech generated from speaker models derived from the Wall Street Journal (WSJ) speech corpus, over 81% of the matched claims are accepted. This result suggests vulnerability in SV systems and thus a need to accurately detect synthetic speech. We propose a new feature based on relative phase shift (RPS), demonstrate reliable detection of synthetic speech, and show how this classifier can be used to improve security of SV systems.
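For readers unfamiliar with the equal error rate mentioned in the abstract above, the following is a minimal sketch of how an EER operating point can be estimated from verification scores, and how the acceptance rate of synthetic-speech claims at that threshold could then be measured. The scores are made-up toy values, not data from the paper, and the function names `far_frr` and `equal_error_rate` are hypothetical.

```python
def far_frr(genuine, impostor, threshold):
    """False acceptance and false rejection rates when accepting score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed scores as thresholds; return (EER estimate, threshold)."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # EER is approximated by the mean of FAR and FRR where they are closest.
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


# Toy scores (assumed for illustration only).
genuine = [2.1, 1.8, 2.5, 0.9, 1.7]      # true-speaker trials
impostor = [-1.0, 0.2, -0.4, 1.0, -2.2]  # human impostor trials
synthetic = [1.9, 2.0, 0.5, 1.6, 2.3]    # synthetic-speech claims

eer, threshold = equal_error_rate(genuine, impostor)
accepted = sum(s >= threshold for s in synthetic) / len(synthetic)
print(f"EER={eer:.2f} at threshold {threshold}, synthetic claims accepted: {accepted:.0%}")
```

A system can show a low EER against human impostors while still accepting most synthetic claims, which is the kind of gap the abstract reports.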
A new journal paper
K. Hashimoto, J. Yamagishi, W. Byrne, S. King, K. Tokuda
"Impact of machine translation and speech synthesis on speech-to-speech translation"
Speech Communication, vol. 54, issue 7, pp. 857-866, September 2012
http://dx.doi.org/10.1016/j.specom.2012.02.004
Abstract
This paper analyzes the impacts of machine translation and speech synthesis on
speech-to-speech translation systems. A typical speech-to-speech
translation system consists of three components: speech
recognition, machine translation and speech synthesis. Many
techniques have been proposed for integration of speech recognition
and machine translation. However, corresponding techniques have not
yet been considered for speech synthesis. The focus of the current
work is machine translation and speech synthesis, and we present a
subjective evaluation designed to analyze their impact on
speech-to-speech translation. The results of these analyses show
that the naturalness and intelligibility of the synthesized speech
are strongly affected by the fluency of the translated sentences.
In addition, several features were found to correlate well with the
average fluency of the translated sentences and the average
naturalness of the synthesized speech.
LISTA workshop and invited talk

NST annual meeting

A new journal paper
K. Oura, J. Yamagishi, M. Wester, S. King, K. Tokuda
"Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping"
Speech Communication, vol. 54, issue 6, pp. 704-714, July 2012
http://dx.doi.org/10.1016/j.specom.2011.12.004
Abstract
In the EMIME project, we developed a mobile device that performs personalized
speech-to-speech translation such that a user’s spoken input in one
language is used to produce spoken output in another language,
while continuing to sound like the user’s voice. We integrated two
techniques into a single architecture: unsupervised adaptation for
HMM-based TTS using word-based large-vocabulary continuous speech
recognition, and cross-lingual speaker adaptation (CLSA) for
HMM-based TTS. The CLSA is based on a state-level transform mapping
learned using minimum Kullback–Leibler divergence between pairs of
HMM states in the input and output languages. Thus, an unsupervised
cross-lingual speaker adaptation system was developed. End-to-end
speech-to-speech translation systems for four languages (English,
Finnish, Mandarin, and Japanese) were constructed within this
framework. In this paper, the English-to-Japanese adaptation is
evaluated. Listening tests demonstrate that adapted voices sound
more similar to a target speaker than average voices and that
differences between supervised and unsupervised cross-lingual
speaker adaptation are small. Calculating the KLD state-mapping on
only the first 10 mel-cepstral coefficients leads to huge savings
in computational costs, without any detrimental effect on the
quality of the synthetic speech.
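As a rough illustration of the KLD-based state mapping described in the abstract above, the sketch below maps each state of one language to the state of the other language with the smallest Kullback-Leibler divergence. It is not the authors' implementation: it assumes each HMM state is a single diagonal-covariance Gaussian, uses a symmetric form of the divergence, and the names `kld_diag_gauss`, `symmetric_kld` and `map_states`, as well as the toy state values, are hypothetical. The optional `dims` argument mimics restricting the calculation to the first few mel-cepstral coefficients.

```python
import math


def kld_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kld(p, q, dims=None):
    """Symmetric KLD between states given as (mean, variance) pairs of lists."""
    (mp, vp), (mq, vq) = p, q
    if dims is not None:
        # Use only the first `dims` coefficients to reduce computation.
        mp, vp, mq, vq = mp[:dims], vp[:dims], mq[:dims], vq[:dims]
    return kld_diag_gauss(mp, vp, mq, vq) + kld_diag_gauss(mq, vq, mp, vp)


def map_states(output_states, input_states, dims=None):
    """For each output-language state, return the index of the closest input-language state."""
    return [
        min(range(len(input_states)),
            key=lambda j: symmetric_kld(state, input_states[j], dims))
        for state in output_states
    ]


# Toy two-dimensional states (assumed values, for illustration only).
input_states = [([0.0, 0.0], [1.0, 1.0]), ([5.0, 5.0], [1.0, 1.0])]
output_states = [([4.5, 5.2], [1.2, 0.9]), ([0.3, -0.1], [0.8, 1.1])]

mapping = map_states(output_states, input_states)
mapping_first_dim = map_states(output_states, input_states, dims=1)
print(mapping, mapping_first_dim)
```

Once such a mapping exists, adaptation transforms estimated for states in one language can be applied to the mapped states in the other, which is the idea behind cross-lingual speaker adaptation as summarized in the abstract.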
ICASSP 2012 and CREST meeting
2nd uDialogue meeting in Nagoya
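Close-up Gendai (クローズアップ現代)
Details of this technology can be found in the Japanese tutorial article and the English review paper listed below. The English review paper is freely available at the URL below.
Junichi Yamagishi, C. Veaux, S. King, S. Renals,
(Tutorial, in Japanese) Speech synthesis technologies for individuals with vocal disabilities – Voice banking and reconstruction
Journal of the Acoustical Society of Japan, vol. 67, no. 12, pp. 587-592, 2011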
Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
Junichi Yamagishi, Christophe Veaux, Simon King and Steve Renals
Acoustical Science and Technology
Vol. 33 (2012) , No. 1 pp.1-5
http://www.jstage.jst.go.jp/browse/ast/33/1/_contents
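According to Goto-san, who appeared on the programme, it "can be viewed for two weeks from tomorrow on NHK On Demand https://www.nhk-ondemand.jp/program/P200800010100000/#/1/0/ and its content will be published as text and images at http://cgi4.nhk.or.jp/gendai/kiroku/detail.cgi?content_id=3166 ".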
IAST workshop on innovation and applications in speech technology
Christophe Veaux will give a presentation titled "Voice reconstruction for individuals with vocal disabilities"
Two pictures of recent recordings

Articulation data recording of Lombard speech (speech in noise)
Electromagnetic Articulograph
B&K Head and Torso Simulator
Credit: Julian Villega (University of Basque Country, Spain)

Whisper speech recording via the NAM microphone in an anechoic chamber
NAM (non-audible murmur) microphone
DPA microphone
Credit: Georgina Brown and Chen-Yu Yang
LISTA meeting in Crete

A picture taken at the FORTH (Foundation for Research & Technology) building
Three journal papers
Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping
Keiichiro Oura, Junichi Yamagishi, Mirjam Wester, Simon King, Keiichi Tokuda,
Speech Communication, Available online 5 January 2012
http://dx.doi.org/10.1016/j.specom.2011.12.004
Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
Junichi Yamagishi, Christophe Veaux, Simon King and Steve Renals
Acoustical Science and Technology
Vol. 33 (2012) , No. 1 pp.1-5
http://www.jstage.jst.go.jp/browse/ast/33/1/_contents
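Junichi Yamagishi, C. Veaux, S. King, S. Renals,
(Tutorial, in Japanese) Speech synthesis technologies for individuals with vocal disabilities – Voice banking and reconstruction
Journal of the Acoustical Society of Japan, vol. 67, no. 12, pp. 587-592, 2011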
NST meeting in Cambridge

Dinner at Peterhouse
Roberto's PhD Viva

