Mobile Gesture-Controlled Speech Synthesis for Performance
[Full article published in eContact! 11.2 — Figures canadiennes II / Canadian Figures II]
Toronto Electroacoustic Symposium, Session 3: The Electroacoustic Voice
Friday 8 August, 14:15–15:15. Faculty of Music, University of Toronto
The electroacoustic community has invested a great deal of time and effort in refining hardware, synthesis methods, protocols, diffusion systems, and sample playback. However, while a variety of sound generation, control, and synthesis methods exist for use in performance, very few systems are available for creating and improvising speech in real time, and fewer still allow the performer to move about in ways that enhance the performance. We have addressed this situation by reconstructing and reconfiguring the original Glove-TalkII and GRASSP speech synthesis systems so that they are portable and wearable for use in performance. The original stationary systems used a parallel formant speech synthesizer controlled by hand and finger gestures. Our new system, a Digital Ventriloquized Actor (DiVA), continues to use the formant synthesis method but differs significantly from its predecessors. We have reconfigured the hardware interfaces, solved power supply problems, redesigned and reconfigured the software, added a small sound system, and created a special harness and garment to hold all of the equipment. As part of the garment design we solved problems of comfort, usability, wiring, and fabric contacts. We were also able to remain true to a chosen æsthetic that emphasizes the human over the technology, supporting the artistic goals of the performance.
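To illustrate what "parallel formant synthesis" means, here is a minimal sketch, not the Glove-TalkII, GRASSP, or DiVA code. It assumes a Klatt-style second-order digital resonator for each formant, a bare impulse train as the voiced source, and rough, illustrative formant values; every function name is hypothetical. Each resonator filters the same source, and the weighted outputs are summed, which is what distinguishes a parallel bank from a cascade.

```python
import math

def resonator_coeffs(freq_hz, bw_hz, fs):
    """Klatt-style resonator y[n] = A*x[n] + B*y[n-1] + C*y[n-2] (unity gain at DC)."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)          # pole radius squared, negated
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c                                     # normalizes DC gain to 1
    return a, b, c

def impulse_train(f0_hz, fs, n_samples):
    """Crude voiced source (an assumption): one unit impulse per pitch period."""
    period = int(round(fs / f0_hz))
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def parallel_formant(source, formants, fs):
    """Run the source through each resonator in parallel and sum weighted outputs.

    formants: list of (freq_hz, bandwidth_hz, amplitude) tuples.
    """
    out = [0.0] * len(source)
    for freq, bw, amp in formants:
        a, b, c = resonator_coeffs(freq, bw, fs)
        y1 = y2 = 0.0  # previous two outputs of this resonator
        for n, x in enumerate(source):
            y = a * x + b * y1 + c * y2
            out[n] += amp * y
            y2, y1 = y1, y
    return out

if __name__ == "__main__":
    fs = 16000
    src = impulse_train(120.0, fs, fs // 10)  # 100 ms of a 120 Hz source
    # Illustrative /a/-like formants: (frequency Hz, bandwidth Hz, amplitude)
    vowel = [(730, 90, 1.0), (1090, 110, 0.5), (2440, 170, 0.25)]
    samples = parallel_formant(src, vowel, fs)
    print(len(samples), max(abs(s) for s in samples))
```

In a gesture-controlled system, the formant frequencies and amplitudes would be updated continuously from the performer's hand data rather than held fixed as they are here.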
We have developed simpler, easier approaches to training users for performance, which have increased the responsiveness of the system and reduced training time. Specifically, we shifted from gradient descent over a large training set to a regression-based approach that learns the parameters of the normalized radial basis functions mapping hand gestures to speech from a small number of samples. We also unified the overall scaling and personalization of the user interface, so that each user has a unique, personal set of data associated with hand locations and finger angles. Each user can now re-audition and edit their own data, which improves the quality of their synthetic speech, reduces their training time, and increases the system's sensitivity to their individual physiological attributes. In the paper we also discuss how sound diffusion, control of virtual faces, and control of robotic assemblies are possible within the performance environment. Finally, we discuss significant issues that have arisen during our work, possible solutions to these problems, and promising future developments in acoustic tube modeling and articulatory synthesis.
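The shift from gradient descent to regression can be made concrete with a small sketch of a normalized radial basis function (RBF) mapping whose output weights are fit in closed form. This is an illustration under stated assumptions, not the authors' method: it assumes Gaussian kernels with one fixed width `sigma`, one center per training sample, and ridge-regularized least squares for the weights. The 2-D "hand features" and (F1, F2) formant targets below are hypothetical stand-ins for a user's personal calibration data, and all function names are illustrative.

```python
import math

def normalized_rbf_features(x, centers, sigma):
    """Gaussian activations of x against each center, normalized to sum to 1.

    Note: if x is far from every center relative to sigma, all activations can
    underflow to 0 and the normalization divides by zero.
    """
    acts = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * sigma ** 2))
            for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def solve(a, b):
    """Solve a X = b by Gauss-Jordan elimination with partial pivoting.

    a: n x n matrix; b: n x m matrix (several right-hand sides). Returns X (n x m).
    """
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_normalized_rbf(inputs, targets, sigma, ridge=1e-6):
    """Fit output weights by ridge least squares, one center per training sample."""
    centers = [list(x) for x in inputs]
    phi = [normalized_rbf_features(x, centers, sigma) for x in inputs]
    k, dims, n = len(centers), len(targets[0]), len(phi)
    # Normal equations: (Phi^T Phi + ridge * I) W = Phi^T Y
    gram = [[sum(phi[s][i] * phi[s][j] for s in range(n)) + (ridge if i == j else 0.0)
             for j in range(k)] for i in range(k)]
    rhs = [[sum(phi[s][i] * targets[s][d] for s in range(n)) for d in range(dims)]
           for i in range(k)]
    return centers, solve(gram, rhs)

def predict(x, centers, weights, sigma):
    """Map one input vector to an output vector (e.g. formant frequencies)."""
    phi = normalized_rbf_features(x, centers, sigma)
    return [sum(p * w[d] for p, w in zip(phi, weights)) for d in range(len(weights[0]))]

if __name__ == "__main__":
    # Hypothetical per-user calibration: 2-D hand features -> illustrative (F1, F2) in Hz.
    hand = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    formants = [(730.0, 1090.0), (270.0, 2290.0), (300.0, 870.0)]
    centers, weights = fit_normalized_rbf(hand, formants, sigma=0.5)
    print(predict((0.5, 0.0), centers, weights, sigma=0.5))
```

Because the weights come from a single linear solve, a user who re-records or edits a few calibration samples can refit immediately instead of rerunning an iterative optimization; keeping one such sample set per user is one way to realize the personal data described above.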
Bob Pritchard’s creative work includes concert music, interactive music and video pieces, chaos-algorithm synthesis, video and film, and software development. He teaches music theory, special topics, and interdisciplinary courses at the University of British Columbia School of Music, is co-director of UBC’s MUsic, Sound and Electroacoustic Technology group (MUSET), and is a researcher with the Institute for Computing, Information, and Cognitive Science (ICICS) and the Media And Graphics Interdisciplinary Centre (MAGIC). In 2004 he received a 3-year Research/Creation grant from SSHRC for refining cyberglove-controlled speech synthesis, and in 2007 Pritchard, Fels, and Vatikiotis-Bateson received a Canada Council/NSERC grant for the development of Digital Ventriloquized Actors (DiVAs), which combine gestural control of speech synthesis with virtual faces. Also in 2007, his interactive piece Strength for saxophone and video received a Unique Award of Merit from the Canadian Society of Cinematographers. He is Vice-chair of the Canadian Music Centre’s British Columbia region.
Sidney Fels, Ph.D. (Toronto, 1994), P.Eng., BASc (Waterloo, 1988), has been an Associate Professor in the Department of Electrical & Computer Engineering at the University of British Columbia since 1998 and a Distinguished University Scholar at UBC since 2004. He was a visiting researcher at ATR Media Integration & Communications Research Laboratories in Kyoto, Japan, from 1996 to 1997. He is internationally known for his work in human-computer interaction, biomechanical modeling, neural networks, intelligent agents, new interfaces for musical expression, and interactive arts, and has produced over 100 scholarly publications as well as numerous exhibitions and performance works. He has been the Director of MAGIC since 2001.
Paper originally presented at the Toronto Electroacoustic Symposium 2008, August 2008.