Segmental Intelligibility of Three Text-To-Speech Synthesis Methods in Reverberant Environments
By Venkatagiri, Horabail S.; Augmentative and Alternative Communication (AAC), Vol. 20, No. 3, pp. 150-163. Publication Date: September 2004
This study investigated the intelligibility of three text-to-speech engines: (1) ViaVoice (IBM), (2) Festival Version 1.4.2, and (3) Natural Voices (AT&T). Each engine generates speech by a different method. ViaVoice uses allophone-based formant coding. Festival Version 1.4.2 is a freely available, open-source system that uses diphone-based linear predictive coding (LPC) synthesis. Natural Voices uses a harmonic-plus-noise method, which is described in a separate article. Thirty-two listeners were recruited for the study. Each participant listened to eight tapes of 25 stimuli each: four tapes with room reverberation and four with hall reverberation. Participants were instructed to write down the last word they heard in each phrase. For each text-to-speech product, and for recordings of a human voice, the number of words identified without error and the number of errors in the initial, medial, and final word positions were tallied under each condition. Human speech yielded the highest overall intelligibility, followed by Natural Voices, Festival 1.4.2, and ViaVoice. Quantitative results are presented in tables. The author concludes that the results point to a need to improve the intelligibility of text-to-speech output in communication devices.
Assistive Products Discussed:
VIAVOICE STANDARD
VIAVOICE PRO USB EDITION
VIAVOICE FOR MAC OS X
AT&T NATURAL VOICES TEXT-TO-SPEECH DESKTOP SDK
Published by: International Society for Augmentative and Alternative Communication (ISAAC) (Website: http://www.isaac-online.org)
This publication is included in the library of the National Rehabilitation Information Center (NARIC), accession number J46868

