TESS: Testing Evaluation of Speech Synthesis
TESS is a project designed to investigate the psychoacoustic processes underlying human auditory evaluation of synthetic speech, with the goal of developing more perceptually principled evaluation methods.
The main aim of this project is to take the first steps toward developing a perceptually rigorous method, or set of methods, for testing synthetic speech quality at the sub- and supra-segmental levels.
The project is motivated by a general lack of understanding of how listeners perform the complex task of evaluating synthetic speech by ear. The field has a clear understanding of which paradigms are available for testing perceived quality, and is aware of the need for principled synthesis evaluation. Even so, very little research has been dedicated to determining which stimulus-specific and paradigm-specific factors influence listeners' behaviour when rating synthetic speech. Most commercial and academic evaluation studies therefore continue to use methods chosen on an ad hoc basis, and as a result, human evaluation of the quality of speech synthesis systems tends to lack strong inter- and intra-rater consistency and reliability.
To achieve the overall goal of the project, we are systematically investigating:
- what acoustic information listeners pay attention to, and are otherwise influenced by, when rating synthetic speech
- what training or presentation methods are most appropriate for changing listeners' default patterns of perceptual attention in order to encourage them to focus on the dimension under evaluation
- what types of evaluation task are most appropriate for ensuring consistent and reliable rating of various dimensions of synthetic speech
- whether the use of, or choice of, reference stimuli affects listeners' synthetic speech evaluation behaviour
The results of this study will allow us to determine whether more appropriate perceptual evaluation methods do indeed improve inter- and intra-rater consistency and reliability, and whether such improvements in turn allow for higher correlations between objective and subjective measures of synthetic speech quality.
Funding: The Engineering and Physical Sciences Research Council (EPSRC grant EP/C53042X/1)