The Centre for Speech Technology Research
University of Edinburgh
Informatics Forum
10 Crichton Street
Edinburgh EH8 9AB
United Kingdom
phone: +44 131 651 1768 (office)
email: user@sms.ed.ac.uk (user = "m.a.berger")
I am a PhD student in the School of Informatics at the University of Edinburgh, where I am a member of the Institute for Communicating and Collaborative Systems and the Centre for Speech Technology Research. I have a background in theoretical linguistics, computational linguistics, articulatory and acoustic phonetics, and computer graphics. For my PhD work I am developing a visual speech synthesizer, which incorporates two parallel projects in facial modeling and articulation modeling. My PhD supervisors are Hiroshi Shimodaira and Simon King. Since 2006 I have also been working as a consultant for the research company CReSS LLC (Consulting and Research Services in Acoustics and Speech), and I am co-founder and president of Speech Graphics Inc.
I have worked in visual speech synthesis both in a research lab and as an independent business. It is now the subject of my PhD research. The goal of this research is to automatically synthesize highly realistic 3-D facial animation synchronized with auditory speech. The animation is driven either by an audio recording of natural speech, or by text (with synthetic audio output generated by an external TTS system). To the right is a video demo of my former prototype model. [Demo will be online soon.]
This project consists of two sub-projects: facial modeling and articulation modeling.
This work has a large number of potential applications, such as virtual humans on websites, talking characters in 3D games, interactive dialog agents, automated characters in film animation, and so on. Moreover, the direction of human-machine interaction is towards convergence with human-to-human interaction. A natural result is that computers will communicate with us through face-to-face speech.
I hope that this work contributes not only to realistic speech synthesis applications but also to a better theoretical understanding of speech production.
With Richard McGowan of CReSS LLC I am working on a large-scale empirical study of articulatory-acoustic relations, including forward and inverse mappings. We developed a technique for estimating these mappings using loess (locally weighted linear regression), an approach that is locally linear but globally non-linear. Unlike neural networks and other approaches that obscure their functioning, regression provides an explicit account of the mappings, which may shed light on theoretical issues. I developed a set of software tools for this project, including scripts in MATLAB and Praat, as well as a Java application for efficient manual editing of formant tracks generated by Praat. These tools will be made publicly available.
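The general idea of locally weighted linear regression can be sketched in one dimension as follows. This is only a minimal illustration, not the code used in the study: the function names (`tricube`, `local_linear_fit`), the tricube kernel, and the `span` parameter are assumptions chosen for the example, and the real mappings are multi-dimensional. For each query point, a straight line is fitted by weighted least squares to the nearby data, so every local fit is linear while the overall curve need not be.

```python
import math


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the unit interval, 0 outside."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(xs, ys, x0, span=0.5):
    """Estimate y at x0 from a weighted least-squares line through nearby points.

    span is the fraction of the data used for each local fit (an
    illustrative parameter; at least three points are always used).
    """
    n = len(xs)
    k = min(n, max(3, int(math.ceil(span * n))))

    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1]
    if h == 0.0:
        h = 1e-12  # all neighbours coincide with x0

    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no points received a positive weight")

    # Weighted means of x and y.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw

    # Weighted variance and covariance give the local slope.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    if sxx < 1e-15:
        return my  # degenerate neighbourhood: fall back to the weighted mean

    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept + slope * x0


if __name__ == "__main__":
    # Toy data: a smooth non-linear curve sampled on a regular grid.
    xs = [i / 10 for i in range(21)]
    ys = [math.sin(x) for x in xs]
    print(local_linear_fit(xs, ys, 1.0, span=0.3))
```

Because each local fit yields an explicit slope and intercept, the fitted coefficients themselves can be inspected at any point, which is the kind of transparency the paragraph above contrasts with neural networks.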
During my Master's research I worked on a method for measuring nasality over time in vowels from the acoustic signal. This is tantamount to an inverse problem of inferring velar position from acoustics. The task of isolating an acoustic dimension corresponding to velar position is difficult because the effects of nasalization vary depending on other articulatory factors, including vowel shape and speaker anatomy. My approach involved a procedure to normalize a nasality measure over these other factors.
Yoga, meditation, foot reflexology; writing and drawing; philosophy and mysticism; running; hiking, being in the wilderness; travel; looking at architecture; learning languages; cooking; raising plants; the weather; the elements.
