Simon King
Centre for Speech Technology Research
University of Edinburgh

Room 3.11, Informatics Forum
10 Crichton Street
Edinburgh EH8 9AB
United Kingdom

Tel: +44 131 651 1725
Fax: +44 131 650 4587
email:

Research
A fundamental question is: What are the basic
building blocks of speech? To answer this question, I am working in a
number of areas.
In speech recognition, I am looking at new acoustic models,
such as Linear Dynamical Models, factorial-HMMs and other graphical
models that can represent speech not as 'beads on a string' but as
streams of interacting factors. I have investigated ways to automatically find an
inventory of suitable units to model, and have worked on other
alternatives to phonetic units, such as graphemes. One long-standing interest is
the use of phonological/acoustic/articulatory features and
articulatory measurement data as a tool to develop models of
speech.
In speech synthesis, I work on both unit selection methods
and HMM-based speech synthesis. In both of these areas, the
definition of the unit of speech is crucial. Both typically use
context-dependent phonemes or diphones so, in this context, we can
gain some insight into the basic building blocks of speech by asking
"What contextual features must we model?" In unit selection, this
means learning the target cost and in HMM-based speech
synthesis, it relates to the clustering of acoustically similar
units. Neither of these processes is entirely satisfactory, but to
improve them requires a better understanding of how we can construct
speech from basic units.
I am increasingly interested in perceptual measures in speech synthesis,
not just for evaluation of the final output, but within the synthesis
process itself. In unit selection, perceptual measures should be used to determine
equivalent units or contexts, because acoustic similarity and
perceptual interchangeability are not the same thing. In HMM-based
speech synthesis, the training criterion should be perceptual: perhaps
minimum generation error gives us a way to use such a
criterion? How can the requirements of acoustic modelling fit with
this idea of perceptual equivalence?
In both recognition and synthesis, I have recently started work on
multilingual systems as an additional way to look at the basic
units of speech. Is there a universal set of building blocks for
speech, and can we build systems that use common models or unit
inventories for multiple languages?
- Current research funding
- Study of Source Features for Speech Synthesis and Speaker Recognition (UKIERI April 2007 - March 2011)
- Automatic target cost and database design for unit-selection speech synthesis (EPSRC April 2007 - March 2010)
- Effective Multilingual Interaction in Mobile Environments - EMIME (EC FP7 March 2008 - Feb 2011)
- LISTA - The Listening Talker (EC FP7 2010 - 2013)
- Recently completed research grants
- Testing Evaluation of Speech Synthesis, covering a) investigation of the psycho-acoustic processes underlying human auditory evaluation of synthetic speech and b) design of new evaluation methodologies for specific, individual aspects of speech synthesis (EPSRC Jan 2005-Aug 2008)
- Automatically-determined Unit Inventories for Unit Selection Text-to-Speech Synthesis (June 2006 - May 2009)
- EPSRC Advanced Research Fellowship, on 'streamed' (factored) models, with both hidden and observed factors (EPSRC Jan 2005-Dec 2009)
Publications
See my publications page
- Research fellows I currently work with
- PhD students (in chronological order)
- As principal supervisor
- As second supervisor or advisor
- Former students
If you are interested in studying for a PhD at CSTR, you can find more
information here or
here
Recent presentations
- Universiti Teknologi Malaysia: Speech Technology
Travel plans
- 10-12 February 2010: personal
- 3-5 March 2010: EPFL & Idiap, Switzerland
- 13-20 March 2010: ICASSP, Dallas, Texas, USA
- 21-28 September 2010 (TBC): SSW7, Blizzard Workshop and Interspeech, Japan
Teaching
Office hours
By appointment, preferably during 10.00-13.00.
Courses
Personal
In my so-called "spare time" I'm either going to the opera, indulging in marathon
sessions of Blake's Seven or Dr. Who, or fiddling with mod_rewrite and pretending I have a clue what PHP is on spanish-bookworld.com.