CHI 2021 Course

Conversational Voice User Interfaces

Outline and learning objectives

  • How Automatic Speech Recognition (ASR) and Speech Synthesis (or Text-To-Speech – TTS) work and why these are such computationally difficult problems

  • Where are ASR and TTS used in current commercial interactive applications

  • What are the usability issues surrounding speech-based interaction systems, particularly in mobile and pervasive computing

  • What are the challenges in enabling speech as a modality for mobile interaction

  • What is the current state-of-the-art in ASR and TTS research

  • What are the differences between the commercial ASR systems' accuracy claims and the needs of mobile interactive applications

  • What are the difficulties in evaluating the quality of TTS systems, particularly from a usability and user perspective

  • What opportunities exist for HCI researchers in terms of enhancing systems' interactivity by enabling speech

  • How do current heuristic guidelines apply to voice interfaces, and how are these influenced by engineering limitations

Recent updates for 2021

New to 2021 is a theoretical and practical review of the most recent research on developing design guidelines for conversational user interfaces, which we contextualize in terms of the engineering capabilities of the underlying speech processing systems. Hands-on activities specific to this topic will also be carried out, with participants invited to conduct usability walk-throughs of a readily available system (e.g. Alexa, Google Assistant, Siri) as guided by heuristic guidelines.

Hands-on activities

The course includes three interactive, hands-on activities. The first activity will engage participants in proposing design alternatives for the error-handling interaction of a smartphone's voice-based search assistant, based on an empirical assessment of the types of ASR errors exhibited (e.g. acoustic, language, semantic). For the second activity, participants will conduct an evaluation of the quality of the synthetic speech output typically employed in mobile-based speech interfaces, and propose alternative evaluation methods that better reflect the mobile user experience. NEW ACTIVITY: The third activity will center on uncovering speech processing errors of a home-based personal assistant and designing interactions that maintain a positive user experience in the face of unexpected variations in speech processing accuracy.