Korin Richmond

Centre for Speech Technology Research

All publications sorted by date (pdf)


[1] K. Richmond and S. King. Smooth talking: articulatory join costs for unit selection. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 5150–5154. March 2016. [pdf]

[2] Q. Hu, J. Yamagishi, K. Richmond, K. Subramanian, and Y. Stylianou. Initial investigation of speech synthesis based on complex-valued neural networks. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 5630–5634. March 2016. [pdf]

[3] R. Dall, S. Brognaux, K. Richmond, C. Valentini-Botinhao, G. E. Henter, J. Hirschberg, and J. Yamagishi. Testing the consistency assumption: pronunciation variant forced alignment in read and spontaneous speech synthesis. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 5155–5159. March 2016. [pdf]

[4] K. Richmond, Z. Ling, and J. Yamagishi. The use of articulatory movement data in speech synthesis applications: an overview - application of articulatory movements using machine learning algorithms [invited review]. Acoustical Science and Technology, 36(6):467–477, 2015. doi:10.1250/ast.36.467. [doi]

[5] K. Richmond, J. Yamagishi, and Z.-H. Ling. Applications of articulatory movements based on machine learning. Journal of the Acoustical Society of Japan, 70(10):539–545, 2015.

[6] A. Hewer, I. Steiner, T. Bolkart, S. Wuhrer, and K. Richmond. A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract. In The Scottish Consortium for ICPhS 2015, editor, Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow, United Kingdom, August 2015. [pdf]

[7] Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou, and R. Maia. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning. In Proc. Interspeech. Dresden, Germany, September 2015. [pdf]

[8] Q. Hu, Y. Stylianou, R. Maia, K. Richmond, and J. Yamagishi. Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. ICASSP. Brisbane, Australia, April 2015. [pdf]

[9] A. Hewer, S. Wuhrer, I. Steiner, and K. Richmond. Tongue mesh extraction from 3D MRI data of the human vocal tract. Mathematics and Visualization. Springer, 2015 (in press).

[10] Q. Hu, Y. Stylianou, K. Richmond, R. Maia, J. Yamagishi, and J. Latorre. A fixed dimension and perceptually based dynamic sinusoidal model of speech. In Proc. ICASSP, 6311–6315. Florence, Italy, May 2014. [pdf]

[11] J. Cabral, K. Richmond, J. Yamagishi, and S. Renals. Glottal spectral separation for speech synthesis. IEEE Journal of Selected Topics in Signal Processing, 8(2):195–208, April 2014. doi:10.1109/JSTSP.2014.2307274. [doi]

[12] Q. Hu, Y. Stylianou, R. Maia, K. Richmond, J. Yamagishi, and J. Latorre. An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. Interspeech, 780–784. Singapore, September 2014. [pdf]

[13] I. Steiner, K. Richmond, and S. Ouni. Speech animation using electromagnetic articulography as motion capture data. In Proc. 12th International Conference on Auditory-Visual Speech Processing, 55–60. 2013. [pdf]

[14] K. Richmond, Z. Ling, J. Yamagishi, and B. Uría. On the evaluation of inversion mapping performance in the acoustic domain. In Proc. Interspeech. Lyon, France, August 2013. [pdf]

[15] J. Scobbie, A. Turk, C. Geng, S. King, R. Lickley, and K. Richmond. The Edinburgh speech production facility DoubleTalk corpus. In Proc. Interspeech. Lyon, France, August 2013. [pdf]

[16] M. Astrinaki, A. Moinet, J. Yamagishi, K. Richmond, Z.-H. Ling, S. King, and T. Dutoit. Mage - HMM-based speech synthesis reactively controlled by the articulators. In 8th ISCA Workshop on Speech Synthesis, 243. Barcelona, Spain, August 2013. [pdf]

[17] Q. Hu, K. Richmond, J. Yamagishi, and J. Latorre. An experimental comparison of multiple vocoder types. In 8th ISCA Workshop on Speech Synthesis, 155–160. Barcelona, Spain, August 2013. [pdf]

[18] C. Geng, A. Turk, J. M. Scobbie, C. Macmartin, P. Hoole, K. Richmond, A. Wrench, M. Pouplier, E. G. Bard, Z. Campbell, C. Dickie, E. Dubourg, W. Hardcastle, E. Kainada, S. King, R. Lickley, S. Nakai, S. Renals, K. White, and R. Wiegand. Recording speech articulation in dialogue: evaluating a synchronized double electromagnetic articulography setup. Journal of Phonetics, 41(6):421–431, 2013. doi:10.1016/j.wocn.2013.07.002. [doi]

[19] M. Astrinaki, A. Moinet, J. Yamagishi, K. Richmond, Z.-H. Ling, S. King, and T. Dutoit. Mage - reactive articulatory feature control of HMM-based parametric speech synthesis. In 8th ISCA Workshop on Speech Synthesis, 227–231. Barcelona, Spain, August 2013. [pdf]

[20] Z. Ling, K. Richmond, and J. Yamagishi. Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression. IEEE Transactions on Audio, Speech, and Language Processing, 21(1):207–219, January 2013. doi:10.1109/TASL.2012.2215600. [doi]

[21] B. Uria, I. Murray, S. Renals, and K. Richmond. Deep architectures for articulatory inversion. In Proc. Interspeech. Portland, Oregon, USA, September 2012. [pdf]

[22] K. Richmond and S. Renals. Ultrax: an animated midsagittal vocal tract display for speech therapy. In Proc. Interspeech. Portland, Oregon, USA, September 2012. [pdf]

[23] I. Steiner, K. Richmond, and S. Ouni. Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis. In 3rd International Symposium on Facial Analysis and Animation. Vienna, Austria, 2012. [pdf]

[24] I. Steiner, K. Richmond, I. Marshall, and C. D. Gray. The magnetic resonance imaging subset of the mngu0 articulatory corpus. The Journal of the Acoustical Society of America, 131(2):EL106–EL111, January 2012. doi:10.1121/1.3675459. [pdf | doi]

[25] Z. Ling, K. Richmond, and J. Yamagishi. Vowel creation by articulatory control in HMM-based parametric speech synthesis. In Proc. The Listening Talker Workshop, 72. Edinburgh, UK, May 2012. [pdf]

[26] Z.-H. Ling, K. Richmond, and J. Yamagishi. Vowel creation by articulatory control in HMM-based parametric speech synthesis. In Proc. Interspeech. Portland, Oregon, USA, September 2012. [pdf]

[27] B. Uria, S. Renals, and K. Richmond. A deep neural network for acoustic-articulatory speech inversion. In Proc. NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning. Sierra Nevada, Spain, December 2011. [pdf]

[28] K. Richmond, P. Hoole, and S. King. Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus. In Proc. Interspeech, 1505–1508. Florence, Italy, August 2011. [pdf]

[29] Z.-H. Ling, K. Richmond, and J. Yamagishi. Feature-space transform tying in unified acoustic-articulatory modelling of articulatory control of HMM-based speech synthesis. In Proc. Interspeech, 117–120. Florence, Italy, August 2011. [pdf]

[30] L. Ming, J. Yamagishi, K. Richmond, Z.-H. Ling, S. King, and L.-R. Dai. Formant-controlled HMM-based speech synthesis. In Proc. Interspeech, 2777–2780. Florence, Italy, August 2011. [pdf]

[31] J. P. Cabral, S. Renals, J. Yamagishi, and K. Richmond. HMM-based speech synthesiser using the LF-model of the glottal source. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4704–4707. May 2011. doi:10.1109/ICASSP.2011.5947405. [pdf | doi]

[32] A. Turk, J. Scobbie, C. Geng, C. Macmartin, E. Bard, B. Campbell, C. Dickie, E. Dubourg, B. Hardcastle, P. Hoole, E. Kainada, R. Lickley, S. Nakai, M. Pouplier, S. King, S. Renals, K. Richmond, S. Schaeffler, R. Wiegand, K. White, and A. Wrench. The Edinburgh Speech Production Facility's articulatory corpus of spontaneous dialogue. The Journal of the Acoustical Society of America, 128(4):2429–2429, 2010. doi:10.1121/1.3508679. [doi]

[33] A. Turk, J. Scobbie, C. Geng, B. Campbell, C. Dickie, E. Dubourg, E. G. Bard, W. Hardcastle, M. Hartinger, S. King, R. Lickley, C. Macmartin, S. Nakai, S. Renals, K. Richmond, S. Schaeffler, K. White, R. Wiegand, and A. Wrench. An Edinburgh speech production facility. Poster presented at the 12th Conference on Laboratory Phonology, Albuquerque, New Mexico, July 2010. [pdf]

[34] K. Richmond, R. Clark, and S. Fitt. On generating Combilex pronunciations via morphological analysis. In Proc. Interspeech, 1974–1977. Makuhari, Japan, September 2010. [pdf]

[35] Z.-H. Ling, K. Richmond, and J. Yamagishi. HMM-based text-to-articulatory-movement prediction and analysis of critical articulators. In Proc. Interspeech, 2194–2197. Makuhari, Japan, September 2010. [pdf]

[36] G. Hofer and K. Richmond. Comparison of HMM and TMDN methods for lip synchronisation. In Proc. Interspeech, 454–457. Makuhari, Japan, September 2010. [pdf]

[37] D. Felps, C. Geng, M. Berger, K. Richmond, and R. Gutierrez-Osuna. Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. In Proc. Interspeech, 1990–1993. September 2010. [pdf]

[38] J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Transforming voice source parameters in a HMM-based speech synthesiser with glottal post-filtering. In Proc. 7th ISCA Speech Synthesis Workshop (SSW7), 365–370. NICT/ATR, Kyoto, Japan, September 2010. [pdf]

[39] Z.-H. Ling, K. Richmond, and J. Yamagishi. An analysis of HMM-based prediction of articulatory movements. Speech Communication, 52(10):834–846, October 2010. doi:10.1016/j.specom.2010.06.006. [doi]

[40] G. Hofer, K. Richmond, and M. Berger. Lip synchronization by acoustic inversion. Poster at Siggraph 2010, 2010. [pdf]

[41] I. Steiner and K. Richmond. Towards unsupervised articulatory resynthesis of German utterances using EMA data. In Proc. Interspeech, 2055–2058. Brighton, UK, September 2009. [pdf]

[42] K. Richmond. Preliminary inversion mapping results with a new EMA corpus. In Proc. Interspeech, 2835–2838. Brighton, UK, September 2009. [pdf]

[43] K. Richmond, R. Clark, and S. Fitt. Robust LTS rules with the Combilex speech technology lexicon. In Proc. Interspeech, 1295–1298. Brighton, UK, September 2009. [pdf]

[44] Z. Ling, K. Richmond, J. Yamagishi, and R. Wang. Integrating articulatory features into HMM-based parametric speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, 17(6):1171–1185, August 2009. IEEE SPS 2010 Young Author Best Paper Award. doi:10.1109/TASL.2009.2014796. [doi]

[45] J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. HMM-based speech synthesis with an acoustic glottal source model. In Proc. The First Young Researchers Workshop in Speech Technology. April 2009. [pdf]

[46] I. Steiner and K. Richmond. Generating gestural timing from EMA data using articulatory resynthesis. In Proc. 8th International Seminar on Speech Production. Strasbourg, France, December 2008.

[47] C. Qin, M. Carreira-Perpinan, K. Richmond, A. Wrench, and S. Renals. Predicting tongue shapes from a few landmark locations. In Proc. Interspeech, 2306–2309. Brisbane, Australia, September 2008. [pdf]

[48] Z.-H. Ling, K. Richmond, J. Yamagishi, and R.-H. Wang. Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge. In Proc. Interspeech, 573–576. Brisbane, Australia, September 2008. [pdf]

[49] J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Glottal spectral separation for parametric speech synthesis. In Proc. Interspeech, 1829–1832. Brisbane, Australia, September 2008. [pdf]

[50] K. Richmond, V. Strom, R. Clark, J. Yamagishi, and S. Fitt. Festival multisyn voices for the 2007 Blizzard Challenge. In Proc. Blizzard Challenge Workshop (in Proc. SSW6). Bonn, Germany, August 2007. [pdf]

[51] K. Richmond. A multitask learning perspective on acoustic-articulatory inversion. In Proc. Interspeech. Antwerp, Belgium, August 2007. [pdf]

[52] K. Richmond. Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion. In M. Chetouani, A. Hussain, B. Gas, M. Milgram, and J.-L. Zarader, editors, Advances in Nonlinear Speech Processing, International Conference on Non-Linear Speech Processing, NOLISP 2007, volume 4885 of Lecture Notes in Computer Science, 263–272. Springer-Verlag Berlin Heidelberg, December 2007. doi:10.1007/978-3-540-77347-4_23. [pdf | doi]

[53] S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester. Speech production knowledge in automatic speech recognition. Journal of the Acoustical Society of America, 121(2):723–742, February 2007. [pdf]

[54] R. A. J. Clark, K. Richmond, and S. King. Multisyn: open-domain unit selection for the Festival speech synthesis system. Speech Communication, 49(4):317–330, 2007. doi:10.1016/j.specom.2007.01.014. [pdf | doi]

[55] J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Towards an improved modeling of the glottal source in statistical parametric speech synthesis. In Proc. of the 6th ISCA Workshop on Speech Synthesis. Bonn, Germany, 2007. [pdf]

[56] K. Richmond. A trajectory mixture density network for the acoustic-articulatory inversion mapping. In Proc. Interspeech. Pittsburgh, USA, September 2006. [pdf]

[57] S. Fitt and K. Richmond. Redundancy and productivity in the speech technology lexicon - can we do better? In Proc. Interspeech 2006. September 2006. [pdf]

[58] R. Clark, K. Richmond, V. Strom, and S. King. Multisyn voices for the Blizzard Challenge 2006. In Proc. Blizzard Challenge Workshop (Interspeech Satellite). Pittsburgh, USA, September 2006. (http://festvox.org/blizzard/blizzard2006.html). [pdf]

[59] L. Onnis, P. Monaghan, K. Richmond, and N. Chater. Phonology impacts segmentation in speech processing. Journal of Memory and Language, 53(2):225–237, 2005. [pdf]

[60] G. Hofer, K. Richmond, and R. Clark. Informed blending of databases for emotional speech synthesis. In Proc. Interspeech. September 2005. [pdf | ps]

[61] R. A. J. Clark, K. Richmond, and S. King. Multisyn voices from ARCTIC data for the Blizzard Challenge. In Proc. Interspeech 2005. September 2005. [pdf]

[62] R. A. J. Clark, K. Richmond, and S. King. Festival 2 – build your own general purpose unit selection speech synthesiser. In Proc. 5th ISCA workshop on speech synthesis. 2004. [pdf | ps]

[63] D. Toney, D. Feinberg, and K. Richmond. Acoustic features for profiling mobile users of conversational interfaces. In S. Brewster and M. Dunlop, editors, 6th International Symposium on Mobile Human-Computer Interaction - MobileHCI 2004, 394–398. Glasgow, Scotland, September 2004. Springer.

[64] K. Richmond, S. King, and P. Taylor. Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17:153–172, 2003. [pdf]

[65] K. Richmond. Estimating Articulatory Parameters from the Acoustic Speech Signal. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, 2002. [ps]

[66] K. Richmond. Mixture density networks, human articulatory data and acoustic-to-articulatory inversion of continuous speech. In Proc. Workshop on Innovation in Speech Processing, 259–276. Institute of Acoustics, April 2001. [ps]

[67] A. Wrench and K. Richmond. Continuous speech recognition using articulatory data. In Proc. ICSLP 2000. Beijing, China, 2000. [pdf | ps]

[68] S. King, P. Taylor, J. Frankel, and K. Richmond. Speech recognition via phonetically-featured syllables. In PHONUS, volume 5, 15–34. Institute of Phonetics, University of the Saarland, 2000. [pdf | ps]

[69] J. Frankel, K. Richmond, S. King, and P. Taylor. An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces. In Proc. ICSLP. 2000. [pdf | ps]

[70] K. Richmond. Estimating velum height from acoustics during continuous speech. In Proc. Eurospeech, volume 1, 149–152. Budapest, Hungary, 1999. [pdf | ps]

[71] K. Richmond. A proposal for the compartmental modelling of stellate cells in the anteroventral cochlear nucleus, using realistic auditory nerve inputs. Master's thesis, Centre for Cognitive Science, University of Edinburgh, September 1997.

[72] K. Richmond, A. Smith, and E. Amitay. Detecting subject boundaries within text: a language-independent statistical approach. In Proc. The Second Conference on Empirical Methods in Natural Language Processing, 47–54. Brown University, Providence, USA, August 1997. [pdf | ps]