PhD studentship

One fully-funded PhD studentship at the Centre for Speech Technology Research, University of Edinburgh, UK.

The Centre for Speech Technology Research at the University of Edinburgh invites applications for one PhD studentship in speech and language processing. The studentship is supported by the SNSF SIWIS project, a speech-to-speech translation project funded by the Swiss National Science Foundation, involving partners from Idiap (Martigny), ETH (Zurich), the University of Geneva and the University of Edinburgh (UK). The topics of the PhD project at the University of Edinburgh include prosody modelling for speech synthesis and speech-to-speech translation. It will involve working jointly with the other partners and working with the five Swiss languages.

Suitable candidates will have a good undergraduate or Masters degree in Computer Science, Mathematics or a related subject, and a good understanding of probability theory, signal processing, linguistics, and machine learning. Specific expertise in the algorithms behind HMM-based speech synthesis, speech recognition or Bayesian networks would be an advantage.

See here for further details.

NST annual meeting

The annual meeting of the EPSRC NST (Natural Speech Technology) project was held in Edinburgh on Tuesday 24th April.


ICASSP 2012 and CREST meeting

We participated in ICASSP 2012 and the JST CREST symposium in Kyoto, Japan.


2nd uDialogue meeting in Nagoya

The second uDialogue meeting was held on Thursday 22nd and Friday 23rd March in Nagoya, Japan.


In the second half of the NHK "Close-up Gendai" programme broadcast on 28th February, "Create a voice that conveys your feelings: the secret of Hatsune Miku's singing voice", our speech synthesis technology for patients with neurodegenerative diseases such as motor neurone disease (MND/ALS), Parkinson's disease (PD) and multiple sclerosis (MS) was featured. We are delighted that so many people have had the chance to learn about this technology.


Junichi Yamagishi, C. Veaux, S. King, S. Renals,
(Review) Speech synthesis technologies for individuals with vocal disabilities – Voice banking and reconstruction (in Japanese),
Journal of the Acoustical Society of Japan, Vol. 67, No. 12, pp. 587-592, 2011

Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
Junichi Yamagishi, Christophe Veaux, Simon King and Steve Renals
Acoustical Science and Technology
Vol. 33 (2012) , No. 1 pp.1-5

According to Mr. Goto, who appeared in the programme, "from tomorrow the programme can be viewed for two weeks on NHK On Demand (https://www.nhk-ondemand.jp/program/P200800010100000/#/1/0/), and the content will be posted as text and images at http://cgi4.nhk.or.jp/gendai/kiroku/detail.cgi?content_id=3166".

IAST workshop on innovation and applications in speech technology

There is a workshop called "IAST: Innovation and Applications in Speech Technology" in Dublin (http://muster.ucd.ie/workshops/iast/).

Christophe Veaux will give a presentation titled "Voice reconstruction for individuals with vocal disabilities".

Two pictures of recent recordings



Articulation data recording of Lombard speech (speech in noise)

Electromagnetic Articulograph
B&K Head and Torso Simulator

Credit: Julian Villega (University of the Basque Country, Spain)


Whispered speech recording via the NAM microphone in an anechoic chamber

NAM (non-audible murmur) microphone
DPA microphone

Credit: Georgina Brown and Chenyu

NST meeting in Cambridge

The NST (Natural Speech Technology) meeting was held at the University of Cambridge.

Dinner at Peterhouse

Roberto's PhD Viva

Roberto Barra Chicote had a very successful PhD viva on Tuesday 20th December 2011 in Madrid, Spain, and I took part as an invited external examiner. Congratulations, Roberto!!


Five PhD studentships

The Centre for Speech Technology Research at the University of Edinburgh invites applications for five fully-funded PhD studentships in speech and language processing.

For details, see http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees/research-grant-funding/speechtechnologyphd

Staff news


English PodCastle

PodCastle is a service that enables users to find speech data that include a search term, read full texts of their recognition results, and easily correct recognition errors by simply selecting from a list of candidates.


An English version of PodCastle is now running. It utilises CSTR's speech recogniser, which was developed under the EU projects "FP6 AMI" and "FP6 AMIDA".


Three new grants awarded

We have been awarded three new grants:
- Deep architectures for statistical speech synthesis (EPSRC Career Acceleration Fellowship): £914k
- Silent speech interface for MND patients (EMC Seedcorn funding): £5k
- uDialogue (JST CREST project): £700k

Lecture in Granada

I attended a summer school on "Application of Speech technology" in Granada, Spain, and gave a lecture about text-to-speech synthesis and the Festival and HTS toolkits. I had a lot of nice seafood dishes there. Information about the course can be found here.


ICASSP 2011 Award Ceremony



Korin and I visited USTC and iFlytek under the RSE-NSFC joint project to discuss further collaboration.


Tutorial slides

I gave a tutorial at ISCSLP 2010 in Taiwan last week. Our slides can be seen here.

Congratulations to Dr. Joao

Joao Cabral passed his PhD viva with minor corrections.

New database: RSS

We have released a new free Romanian speech database for speech synthesis named "RSS".

The Romanian speech synthesis (RSS) corpus is a free large-scale Romanian speech corpus that includes about 3000 sentences uttered by a native female speaker. The RSS corpus was designed mainly for text-to-speech synthesis and was recorded in a hemi-anechoic chamber (anechoic walls and ceiling; floor partially anechoic) at the University of Edinburgh. We used three high-quality studio microphones: a Neumann u89i (large-diaphragm condenser), a Sennheiser MKH 800 (small-diaphragm condenser with very wide bandwidth) and a DPA 4035 (headset-mounted condenser). Although the current release includes only the speech data recorded via the Sennheiser MKH 800, we may release the speech data recorded via the other microphones in the future. All recordings were made at a 96 kHz sampling frequency and 24 bits per sample, then downsampled to a 48 kHz sampling frequency. For recording, downsampling and bit-rate conversion, we used ProTools HD hardware and software. We conducted 8 sessions over the course of a month, recording about 500 sentences in each session. At the start of each session, the speaker listened to a previously recorded sample, in order to attain a similar voice quality and intonation.
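The 96 kHz to 48 kHz conversion mentioned above was done with ProTools HD, but the underlying operation (low-pass filtering followed by decimation by two) can be sketched in a few lines. The following is a minimal, self-contained illustration using a windowed-sinc half-band filter, not the corpus's actual processing chain:

```python
import math

def halfband_decimate(samples, num_taps=63):
    """Downsample a signal by a factor of 2.

    A minimal sketch: apply a Hamming-windowed-sinc low-pass FIR
    filter with cutoff at a quarter of the input sample rate (the
    new Nyquist frequency), then keep every second sample.
    """
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal low-pass impulse response, cutoff = 0.25 * fs_in
        h = 0.5 if k == 0 else math.sin(math.pi * 0.5 * k) / (math.pi * k)
        # Hamming window to reduce ripple from truncating the sinc
        w = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(h * w)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # normalise to unity gain at DC

    # Convolve and decimate in one pass (zero-padded at the edges)
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for n, t in enumerate(taps):
            j = i - mid + n
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out
```

In practice a polyphase resampler (e.g. `scipy.signal.resample_poly`) would be used instead of this direct convolution, but the structure of the computation is the same.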


Presentations and Talks

I have a series of talks, presentations, and meetings this summer:

- 1st Sept, Talk at Aholab, University of Basque Country, Bilbao, Spain
- 2nd and 3rd Sept, LISTA project meeting, Vitoria, Spain
- 17th, 19th (National working day!), and 20th Sept, Talk at Nokia research center, Beijing, China
- 22nd, 23rd, and 24th Sept, Speech synthesis workshop 7, Presentation for EMIME work
- 24th Sept, Open Source Initiatives for Speech Synthesis, Presentation for 'open-source/creative common' speech database
- 25th Sept, The 2010 Blizzard Challenge, Presentation for 'CSTR/EMIME entry for the 2010 Blizzard Challenge'
- 27th to 30th Sept, Interspeech 2010, Presentation for 'Roles of the average voice in speaker-adaptive HMM-based speech synthesis'


We would like to announce an open postdoc position in speech synthesis, voice reconstruction and personalised voice communication aids at the University of Edinburgh. (This position is now closed!)


An Open Position for Postdoctoral Research Associate

The Centre for Speech Technology Research (CSTR)
University of Edinburgh

Job Description
The School of Informatics at the University of Edinburgh invites applications for the post of Postdoctoral Research Associate on a project concerning voice reconstruction and personalised voice communication aids. The project will develop clinical applications of speaker-adaptive statistical text-to-speech synthesis in collaboration with the Euan MacDonald Centre, who are funding this project. Applications include the reconstruction of voices of patients who have disordered speech as a consequence of Motor Neurone Disease, by using statistical parametric model adaptation. The project will also investigate better voice reconstruction methods.

You will be part of a dynamic and creative research team within the Centre for Speech Technology Research, at the forefront of developments in statistical speech synthesis. The application of statistical parametric speech synthesis to clinical applications such as voice banking, voice reconstruction and assistive devices, is an exciting new development and an area in which we expect to have increased research activity in the coming years. We are seeking additional long-term funding for this work and there may be the possibility of extending this Research Associate position.

Person Specification
You have (or will be near completion of) a PhD in speech processing, computer science, cognitive science, linguistics, engineering, mathematics, or a related discipline.

You will have the necessary programming ability to conduct research in this area, a background in statistical modelling using Hidden Markov Models and strong experimental planning and execution skills.

A background in one or more of the following areas is also desirable: statistical parametric text-to-speech synthesis using HMMs and HSMMs; speaker adaptation using the MLLR or MAP families of techniques; familiarity with software tools including HTK, HTS and Festival; the ability to implement web applications. Familiarity with the issues surrounding degenerative diseases which affect speech, such as Motor Neurone Disease, Parkinson's disease, Cerebral Palsy or Multiple Sclerosis, is also desirable.

For further information, see http://www.jobs.ed.ac.uk/vacancies/index.cfm?fuseaction=vacancies.detail&vacancy_ref=3013390


Tutorial at ISCSLP 2010

Simon and I will give a tutorial at ISCSLP 2010 held in Taiwan on 29th November.


New and emerging applications of speech synthesis

Until recently, text-to-speech was often just an 'optional extra' which allowed text to be read out loud. But now, thanks to statistical and machine learning approaches, speech synthesis can mean more than just the reading out of text in a predefined voice. New research areas and more interesting applications are emerging.

In this tutorial, after a quick overview of the basic approaches to statistical speech synthesis including speaker adaptation, we consider some of these new applications of speech synthesis. We look behind each application at the underlying techniques used and describe the scientific advances that have made them possible. The applications we will examine include personalised speech-to-speech translation, 'robust speech synthesis' (making thousands of different voices automatically from imperfect data), clinical applications such as voice reconstruction for patients who have disordered speech, and articulatory-controllable statistical speech synthesis.

The really interesting problems still to be solved in speech synthesis go beyond simply improving 'quality' or 'naturalness' (typically measured using Mean Opinion Scores). The key problem of personalised speech-to-speech translation is to reproduce or transfer speaker characteristics across languages. The aim of robust speech synthesis is to create good quality synthetic speech from noisy and imperfect data. The core problems in voice reconstruction centre around retaining or reconstructing the original characteristics of patients, given only samples of their disordered speech.

We illustrate our multidisciplinary approach to speech synthesis, bringing in techniques and knowledge from ASR, speech enhancement and speech production in order to develop the techniques required for these new applications. We will conclude by attempting to predict some future directions of speech synthesis.

See you in Taiwan!

Homepage updated

I have renewed my website. The demonstration pages are still under construction.

Itakura Prize!

The Acoustical Society of Japan has awarded me the 2010 Itakura Prize for Innovative Young Researchers for "Speaker adaptation techniques for speech synthesis". This is great news. Thank you very much!