Speech Technology and Human Computer Interaction Workshop

9.00am, March 27th, Informatics Forum, Edinburgh University, UK


Groupwork at the workshop will be organised across a number of themes:

  • In Car: A great example of an eyes-free/hands-free environment.
  • In the Home: Use of pervasive technology integrated into our living space.
  • Emotional Systems I: Giving the machine emotion and personality.
  • Emotional Systems II: Interacting with a machine with personality.
  • Robots: Interacting with a physically present machine.
  • Meetings: A great example of the pervasive use of speech technology to help record and access meeting data.
  • Infrastructure: An example of ASR for mobile apps.

The themes are by no means exhaustive, but are intended to offer a starting point for discussion. For each theme we have a short video and have selected two loosely connected academic papers. Below are links to the videos and the papers we selected (hard copies will be available on the day).

In Car

Healey, Jennifer, and Dalila Szostak. "Relating to speech evoked car personalities." CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM, 2013.

Pon-Barry, Heather, Fuliang Weng, and Sebastian Varges. "Evaluation of content presentation strategies for an in-car spoken dialogue system." INTERSPEECH. 2006.

In the Home

Mahkonen, Katariina, et al. "Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition." INTERSPEECH. 2011.

Stifelman, Lisa, Adam Elman, and Anne Sullivan. "Designing natural speech interactions for the living room." CHI'13 Extended Abstracts on Human Factors in Computing Systems. ACM, 2013.

Emotional Systems I

Aylett, M. P., and C. J. Pidcock. "The CereVoice Characterful Speech Synthesiser SDK." AISB 2007, Newcastle, pp. 174-178.

Bickmore, Timothy, and Daniel Schulman. "The comforting presence of relational agents." CHI'06 Extended Abstracts on Human Factors in Computing Systems. ACM, 2006.

Emotional Systems II

Petridis, Stavros, Maja Pantic, and Jeffrey F. Cohn. "Prediction-based classification for audiovisual discrimination between laughter and speech." Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on. IEEE, 2011.

Petridis, Stavros, and Maja Pantic. "Audiovisual laughter detection based on temporal features." Proceedings of the 10th international conference on Multimodal interfaces. ACM, 2008.

Brennan, Susan E. "The grounding problem in conversations with and through computers." Social and cognitive psychological approaches to interpersonal communication (1998): 201-225.

Robots

Briggs, Gordon, and Matthias Scheutz. "Facilitating mental modeling in collaborative human-robot interaction through adverbial cues." Proceedings of the SIGDIAL 2011 Conference. Association for Computational Linguistics, 2011.

Briggs, Gordon, and Matthias Scheutz. "Multi-modal belief updates in multi-robot human-robot dialogue interaction." Proc. of 2012 Symposium on Linguistic and Cognitive Approaches to Dialogue Agents. 2012.

Meetings

Whittaker, Steve, et al. "Design and evaluation of systems to support interaction capture and retrieval." Personal and Ubiquitous Computing 12.3 (2008): 197-221.

Maganti, Hari Krishna, Petr Motlicek, and Daniel Gatica-Perez. "Unsupervised speech/non-speech detection for automatic speech recognition in meeting rooms." Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on. Vol. 4. IEEE, 2007.

Infrastructure

Zhang, Rui, Stephen North, and Eleftherios Koutsofios. "A comparison of speech and GUI input for navigation in complex visualizations on mobile devices." Proceedings of the 12th international conference on Human computer interaction with mobile devices and services. ACM, 2010.

Bocchieri, Enrico, and Diamantino Caseiro. "Use of geographical meta-data in ASR language and acoustic models." Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE, 2010.