TESS: Testing Evaluation of Speech Synthesis

Project Summary

TESS is a project designed to investigate the psychoacoustic processes underlying human auditory evaluation of synthetic speech, with the goal of developing more perceptually principled evaluation methods.

Project Details

The main aim of this project is to take the first steps toward developing a perceptually rigorous method, or set of methods, for testing synthetic speech quality at the sub-segmental and supra-segmental levels.

The project is motivated by a general lack of understanding of how listeners perform the complex task of evaluating synthetic speech by ear. The field has a clear understanding of which paradigms are available for testing perceived quality, and recognises that principled synthesis evaluation is necessary. However, very little research has examined which stimulus-specific and paradigm-specific factors influence listeners' behaviour when they rate synthetic speech. As a result, most commercial and academic evaluation studies continue to use methods chosen on an ad hoc basis, and human evaluation of speech synthesis quality tends to lack strong inter- and intra-rater consistency and reliability.

To achieve the overall goal of the project, we are systematically investigating:

what acoustic information listeners pay attention to, and are otherwise influenced by, when rating synthetic speech
what training or presentation methods are most appropriate for shifting listeners' default patterns of perceptual attention, so that they focus on the dimension under evaluation
what types of evaluation task are most appropriate for ensuring consistent and reliable rating of the various dimensions of synthetic speech
whether the use and choice of reference stimuli affect listeners' behaviour when evaluating synthetic speech

The results of this study will allow us to determine whether more appropriate perceptual evaluation methods do improve inter- and intra-rater consistency and reliability, and whether such improvements lead to stronger correlation between objective and subjective measures of synthetic speech quality.

Personnel

Funding Source

The Engineering and Physical Sciences Research Council (EPSRC grant EP/C53042X/1)