2011

Five PhD studentships

The Centre for Speech Technology Research at the University of Edinburgh invites applications for five fully-funded PhD studentships in speech and language processing.

For details, see http://www.ed.ac.uk/schools-departments/informatics/postgraduate/fees/research-grant-funding/speechtechnologyphd

Simple4All kick-off

The EC FP7 Simple4All project has started.
This is a 3-year EC collaborative project.
The partners are Aalto University, the University of Helsinki, the Universidad Politécnica de Madrid, and the Technical University of Cluj-Napoca.

http://simple4all.org/



Staff news

http://www.ed.ac.uk/news/staff/sept-awards-251011

uDialogue kick-off meeting

The JST CREST "uDialogue" kick-off meeting was held in Nagoya, Japan.
This is a 5-year project supported by JST CREST.


English PodCastle

PodCastle is a service that enables users to find speech data that include a search term, read full texts of their recognition results, and easily correct recognition errors by simply selecting from a list of candidates.

http://en.podcastle.jp/

An English version of PodCastle is now running. It uses CSTR's speech recogniser, which was developed under the EU projects "FP6 AMI" and "FP6 AMIDA".

http://www.aist.go.jp/aist_j/press_release/pr2011/pr20111012/pr20111012.html
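As a rough illustration of the correction-by-candidates interaction described above, here is a minimal sketch in Python. The names (Slot, correct) are hypothetical and not taken from PodCastle's actual implementation; the idea is simply that each time slot of a transcript stores the recogniser's competing hypotheses, so that a user correction amounts to selecting (or adding) a candidate.

    from dataclasses import dataclass

    @dataclass
    class Slot:
        """One time slot of a transcript, holding competing hypotheses."""
        candidates: list   # recogniser hypotheses, best first
        chosen: int = 0    # index of the candidate currently displayed

        def correct(self, text):
            """Apply a user correction by selecting (or adding) a candidate."""
            if text not in self.candidates:
                self.candidates.append(text)
            self.chosen = self.candidates.index(text)

    # Hypothetical example: the recogniser preferred the wrong second word.
    transcript = [Slot(["speech", "beach"]), Slot(["synthesise", "synthesis"])]
    transcript[1].correct("synthesis")
    print(" ".join(s.candidates[s.chosen] for s in transcript))  # speech synthesis

Storing a candidate list per slot keeps each correction to a single selection, which matches the interaction described above.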

Three PhD positions available

Centre for Speech Technology Research
University of Edinburgh

uDialogue is a new JST CREST project involving Nagoya Institute of Technology and the University of Edinburgh. Its objective is to develop a new spoken dialogue system framework based on user-generated content, and to advance speech recognition and synthesis technologies that attract crowdsourced creators of dialogue content. uDialogue starts in October 2011 and has a duration of 5 years.

Three PhD positions in speech recognition and speech synthesis are available at CSTR, Edinburgh:

1. PhD position in speech recognition
2. PhD position in speech synthesis
3. PhD position in dialogue-content processing (content linking etc)

The PhD positions will start in September 2012 and have a duration of 4 years, including a 6-month internship at Nagoya Institute of Technology. For details, please contact us by e-mail.



Video Recordings

Here are two video recordings that may be of interest: Simon's lecture at IRCAM and my presentation at Odyssey 2010.

http://www.dailymotion.com/video/xjswae_advances-in-speech-technologies-ircam-simon-king_tech
http://www.superlectures.com/odyssey/lecture.php?lang=en&id=29&query=Junichi%20Yamagishi

New Journal Paper

Sebastian’s paper was published in Speech Communication. Congratulations!
http://dx.doi.org/10.1016/j.specom.2011.08.001

Abstract:
Spontaneous conversational speech has many characteristics that are currently not modelled well by HMM-based speech synthesis, and in order to build synthetic voices that can give an impression of someone partaking in a conversation, we need to utilise data that exhibit more of the speech phenomena associated with conversations than the carefully read-aloud sentences that are more generally used. In this paper we show that synthetic voices built with HMM-based speech synthesis techniques from conversational speech data preserved segmental and prosodic characteristics of frequent conversational speech phenomena. An analysis of an evaluation investigating the perception of quality and speaking style of HMM-based voices confirms that speech with conversational characteristics is instrumental for listeners to perceive successful integration of conversational speech phenomena in synthetic speech. The achieved synthetic speech quality provides an encouraging start for the continued use of conversational speech in HMM-based speech synthesis.

Three new grants awarded

Three new grants awarded!
- Deep architectures for statistical speech synthesis (EPSRC Career Acceleration Fellowship): £914k
- Silent speech interface for MND patients (EMC seedcorn funding): £5k
- uDialogue (JST CREST project): £700k

Lecture in Granada

I attended a summer school on "Applications of Speech Technology" in Granada, Spain, and gave a lecture about text-to-speech synthesis and the Festival/HTS toolkits. I had a lot of nice seafood dishes there. Information about the course is available here:
http://ceres.ugr.es/THA/eng/


ICASSP 2011 Award Ceremony


USTC

Korin and I visited USTC and iFlytek under the RSE-NSFC joint project to discuss further collaboration.


Speech synthesis seminar series at the University of Cambridge

I attended the speech synthesis seminar series at the University of Cambridge and gave a talk about adaptive speech synthesis and our current projects. My slides are available at:
http://mi.eng.cam.ac.uk/mi/Main/SeminarsSpeech

A new journal paper

A new journal paper by Adriana (Technical University of Cluj-Napoca) has been published in Speech Communication!
doi:10.1016/j.specom.2010.12.002

This paper introduces a new speech corpus named "RSS" and HMM-based speech synthesis systems using higher sampling rates such as 48 kHz. The abstract follows.

This paper first introduces a newly-recorded high quality Romanian speech corpus designed for speech synthesis, called “RSS”, along with Romanian front-end text processing modules and HMM-based synthetic voices built from the corpus. All of these are now freely available for academic use in order to promote Romanian speech technology research. The RSS corpus comprises 3500 training sentences and 500 test sentences uttered by a female speaker and was recorded using multiple microphones at 96 kHz sampling frequency in a hemianechoic chamber. The details of the new Romanian text processor we have developed are also given.
Using the database, we then revisit some basic configuration choices of speech synthesis, such as waveform sampling frequency and auditory frequency warping scale, with the aim of improving speaker similarity, which is an acknowledged weakness of current HMM-based speech synthesisers. As we demonstrate using perceptual tests, these configuration choices can make substantial differences to the quality of the synthetic speech. Contrary to common practice in automatic speech recognition, higher waveform sampling frequencies can offer enhanced feature extraction and improved speaker similarity for HMM-based speech synthesis.
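To make the sampling-frequency point concrete: HMM-based synthesisers typically represent the spectrum with mel-cepstra obtained through a first-order all-pass frequency warping, and the warping coefficient has to be re-tuned whenever the waveform sampling frequency changes. The following is a minimal sketch of that tuning (my own illustration in Python, not code from the paper): it grid-searches the all-pass coefficient alpha whose warping best fits the mel scale, in the least-squares sense, for a given sampling rate.

    import numpy as np

    def mel(f):
        """O'Shaughnessy mel scale in mels, for frequency f in Hz."""
        return 1127.0 * np.log(1.0 + f / 700.0)

    def allpass_warp(omega, alpha):
        """Warped frequency: phase response of a first-order all-pass filter."""
        return omega + 2.0 * np.arctan(alpha * np.sin(omega)
                                       / (1.0 - alpha * np.cos(omega)))

    def best_alpha(fs, n_points=512):
        """Warping coefficient best matching the mel scale up to fs/2."""
        omega = np.linspace(1e-4, np.pi, n_points)   # digital frequency
        f = omega / np.pi * (fs / 2.0)               # corresponding frequency in Hz
        target = mel(f) / mel(fs / 2.0)              # normalised mel curve
        alphas = np.linspace(0.0, 0.99, 991)
        errors = [np.sum((allpass_warp(omega, a) / np.pi - target) ** 2)
                  for a in alphas]
        return alphas[int(np.argmin(errors))]

    for fs in (16000, 22050, 48000):
        print(fs, round(best_alpha(fs), 2))
    # Gives roughly 0.42 at 16 kHz and 0.55 at 48 kHz, the values
    # conventionally used in HTS/SPTK recipes.

The same fitting procedure can be applied to other auditory scales such as Bark or ERB; only the target curve changes.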

IEEE SPS 2010 Young Author Best Paper Award

Dr. Zhen-Hua Ling's journal paper, "Integrating Articulatory Features into HMM-based Parametric Speech Synthesis", published by the IEEE in 2009, has won the IEEE SPS 2010 Young Author Best Paper Award. Congratulations!

http://www.signalprocessingsociety.org/awards-fellows/award-recipient/

Christophe and the Voice reconstruction project

Welcome to Christophe Veaux, who joins us from IRCAM. He'll be working on a new voice reconstruction project. Some preliminary experimental results can be seen in my tutorial slides:
http://homepages.inf.ed.ac.uk/jyamagis/ISCSLP2010/ISCLSP-Tutorial.pdf

RSE-NSFC

A Royal Society of Edinburgh - National Natural Science Foundation of China (RSE-NSFC) travel grant has been awarded to CSTR and USTC for further joint research on our novel framework for speech synthesis. Korin Richmond and I will visit USTC in China.