The Centre for Speech Technology Research, The University of Edinburgh

Voice transformation

Project Summary

Transforming the quality and intonation of the speech of one speaker so that it sounds like another speaker

Project Details

Transforming Voice Quality and Intonation


Voice transformation is the process of modifying the characteristics of speech uttered by a source speaker so that a listener would believe the speech was uttered by a target speaker. In this thesis, two aspects of the transformation problem are addressed: voice quality and intonation.

The voice quality transformation component of our system has two main parts, corresponding to the two components of the source-filter model. The first part transforms the spectral envelope, as represented by a linear prediction (LPC) model. This transformation is achieved using a Gaussian mixture model (GMM) trained on aligned speech from the source and target speakers. The second part predicts the spectral detail from the transformed LPC parameters, using a novel approach based on a classifier and residual codebooks. The system has some similarities with earlier work by Kain; however, the work reported here is not restricted to speech spoken in a monotone with mimicked prosody. In addition, it outperforms existing systems on a number of performance metrics.
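To illustrate how a GMM trained on aligned source and target speech can map the spectral envelope, here is a minimal sketch of the conversion function commonly used with joint source/target GMMs: F(x) = Σ_i p(i|x) [μ_y,i + Σ_yx,i Σ_xx,i⁻¹ (x − μ_x,i)]. It assumes the GMM has already been trained (for example by EM on concatenated, time-aligned source/target feature vectors, which is not shown) and, for brevity, that the covariance blocks are diagonal. The feature vector x stands for one frame of envelope features derived from the LPC model; the class and function names (`JointComponent`, `posteriors`, `convert`) are hypothetical and not taken from the system described here.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JointComponent:
    """One component of a joint source/target GMM (diagonal blocks assumed)."""
    weight: float        # mixture weight alpha_i
    mu_x: List[float]    # source-side mean
    mu_y: List[float]    # target-side mean
    var_xx: List[float]  # diagonal of the source covariance Sigma_xx
    cov_yx: List[float]  # diagonal of the target/source cross-covariance Sigma_yx


def _log_gauss_diag(x: Sequence[float], mu: Sequence[float], var: Sequence[float]) -> float:
    # Log density of a diagonal-covariance Gaussian evaluated at x.
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def posteriors(x: Sequence[float], gmm: Sequence[JointComponent]) -> List[float]:
    # p(i | x) under the source marginal of the joint GMM,
    # normalised with log-sum-exp for numerical stability.
    logs = [math.log(c.weight) + _log_gauss_diag(x, c.mu_x, c.var_xx) for c in gmm]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def convert(x: Sequence[float], gmm: Sequence[JointComponent]) -> List[float]:
    # Posterior-weighted sum of per-component linear regressions from x to y.
    post = posteriors(x, gmm)
    y = [0.0] * len(x)
    for p, c in zip(post, gmm):
        for d in range(len(x)):
            y[d] += p * (c.mu_y[d] + c.cov_yx[d] / c.var_xx[d] * (x[d] - c.mu_x[d]))
    return y


if __name__ == "__main__":
    # Toy two-dimensional, single-component model (made-up parameters).
    model = [JointComponent(1.0, [0.0, 0.0], [1.0, 2.0], [1.0, 1.0], [0.5, 0.5])]
    print(convert([2.0, 0.0], model))
```

Because each component contributes a linear regression weighted by its posterior, the overall mapping is smooth and locally linear. In practice, full covariance blocks would usually be estimated with a library GMM implementation rather than written by hand.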

We also present a new method for transforming pitch contours from one speaker to another, based on a small, linguistically motivated parameter set. The system performs a piecewise-linear mapping using these parameters. A perceptual experiment clearly demonstrates that the proposed system performs at least as well as the existing technique for all speaker pairs, and that in many cases it is much better and almost as good as using the target pitch contour itself.
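The general idea of a piecewise-linear pitch mapping can be sketched as follows. The linguistically motivated parameters used in this work are not listed here, so this example assumes a hypothetical parameter set: a few paired F0 anchor points per speaker (for example floor, median and ceiling, in Hz), with the source contour mapped linearly between corresponding anchors and extrapolated from the end segments outside the source range. The function names and the convention that unvoiced frames carry F0 = 0 are also assumptions.

```python
from bisect import bisect_right
from typing import List, Sequence


def piecewise_linear_map(f0: float, src_knots: Sequence[float],
                         tgt_knots: Sequence[float]) -> float:
    """Map one voiced F0 value (Hz) through a piecewise-linear function.

    src_knots and tgt_knots are hypothetical per-speaker anchor points,
    paired by index and strictly increasing.
    """
    n = len(src_knots)
    if n < 2 or n != len(tgt_knots):
        raise ValueError("need at least two paired anchor points")
    if any(a >= b for a, b in zip(src_knots, src_knots[1:])):
        raise ValueError("source anchors must be strictly increasing")
    # Index of the segment containing f0, clamped so that values outside
    # the source range are extrapolated from the first or last segment.
    k = min(max(bisect_right(src_knots, f0) - 1, 0), n - 2)
    slope = (tgt_knots[k + 1] - tgt_knots[k]) / (src_knots[k + 1] - src_knots[k])
    return tgt_knots[k] + slope * (f0 - src_knots[k])


def transform_contour(contour: Sequence[float], src_knots: Sequence[float],
                      tgt_knots: Sequence[float]) -> List[float]:
    # Unvoiced frames (F0 <= 0) are passed through unchanged as 0.0.
    return [0.0 if f <= 0 else piecewise_linear_map(f, src_knots, tgt_knots)
            for f in contour]


if __name__ == "__main__":
    # Made-up anchors: source floor/median/ceiling -> target floor/median/ceiling.
    src = [80.0, 120.0, 200.0]
    tgt = [150.0, 220.0, 350.0]
    print(transform_contour([0.0, 100.0, 120.0, 240.0], src, tgt))
```

Mapping each segment separately lets different parts of a speaker's range be stretched by different amounts, which a single global scale-and-shift of F0 cannot do.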

