The Centre for Speech Technology Research
University of Edinburgh
Informatics Forum
10 Crichton Street
Edinburgh EH8 9AB
United Kingdom
phone: +44 131 651 1768 (office)
email: user@sms.ed.ac.uk (user = "m.a.berger")
I am a PhD Student in the School of Informatics at the University of Edinburgh, where I am a member of the Institute for Language, Cognition and Computation and the Centre for Speech Technology Research. I have a research background in theoretical linguistics, articulatory and acoustic phonetics and computer animation. My current PhD work incorporates parallel projects in facial modeling and articulation modeling, and development of a platform for automated facial animation called Carnival. My supervisors for the PhD are Hiroshi Shimodaira and Simon King. I am also CTO and Founder of Speech Graphics Ltd, a provider of automated lip-sync solutions for the video game industry.
I have worked in visual speech synthesis both in a research lab and as an independent business. It is now the subject of my PhD research. The goal of this research is to automate facial animation synchronized with speech. Facial animation may be automated from audio, text or motion capture input; I am focusing on audio- and text-driven approaches, which require rich modeling of speech. The aim is to obtain high-fidelity 3D output that can be verified by objective measures.
After performing audio analysis on speech input or preprocessing text input, we obtain a representation of speech as a sequence of phonemic segments. Converting this discrete representation into continuous facial motion requires a model of the speech process, in particular coarticulation. Previous approaches follow the coproduction paradigm, in which coarticulation results from the overlap of abstract influence functions. I have developed an alternative behavior model in which phoneme targets are constrained by a general system of muscular dynamics, obviating the need for influence functions. This model also attempts to account for several important natural properties of speech, including tradeoffs between accuracy and energy efficiency, effects of syllable emphasis and syllable structure on phoneme production, aerodynamic objectives, antagonistic muscle relations, and differences in time scales between articulators.
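As a rough illustration only, and not the model described above, the following Python sketch shows how target undershoot, one simple coarticulation-like effect, can emerge from dynamics alone without any influence functions. It assumes a single articulator treated as a unit mass on a critically damped spring that is pulled toward each phoneme target in turn. The function name `simulate`, the `stiffness` value, and the time step are all hypothetical choices made for the example.

```python
import math


def simulate(targets, stiffness=400.0, dt=0.001):
    """Drive one articulator through a sequence of phoneme targets.

    targets: list of (target_position, duration_in_seconds) pairs.
    Returns the articulator position sampled every dt seconds.
    Illustrative assumption: unit mass, critically damped spring.
    """
    damping = 2.0 * math.sqrt(stiffness)  # critical damping for unit mass
    position, velocity = 0.0, 0.0
    trajectory = []
    for target, duration in targets:
        for _ in range(round(duration / dt)):
            # Spring force pulls toward the current target; damping resists motion.
            acceleration = stiffness * (target - position) - damping * velocity
            velocity += acceleration * dt  # semi-implicit Euler step
            position += velocity * dt
            trajectory.append(position)
    return trajectory


# The same target held for a short and a long segment, starting from rest.
short_segment = simulate([(1.0, 0.1)])
long_segment = simulate([(1.0, 0.4)])
print(f"short segment ends at {short_segment[-1]:.3f}")
print(f"long segment ends at {long_segment[-1]:.3f}")
```

In this toy setting the articulator comes close to the target only when the segment is long enough. A short segment ends well short of it, so the realized shape depends on timing and on the neighboring targets rather than on the target alone.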
Accurate facial dynamics alone are insufficient to produce high-quality speech-synchronized animation. Also needed is a good 3D face model on which to execute the motions. A common shortcoming of facial models is that they have good static shape but deformers that are unrealistic or fail to reflect the full array of muscles active in a real person's face during speech. Unrealistic deformation in facial models imposes an upper limit on animation quality, because even the most accurate facial dynamics cannot be visualized realistically on such a model. In the current state of the art, facial deformers are still designed manually by artists. I have developed an approach to constructing high-fidelity facial models with the precise shape and deformation patterns of a real speaker's face, utilizing high-resolution 3D capture. I am also interested in reconstructing the deformations caused by multiple muscles activating simultaneously, which involves nonlinear interactions.
Automated facial animation has a large number of applications, including video game development, film animation, and embodied conversational agents. I also hope that this work will contribute to the scientific understanding of speech production.
Writing, languages, philosophy, yoga, meditation, foot reflexology, running.
