The Cougar project investigates the use of an articulatory(-like) domain both for calculating join costs and for smoothing unit transitions in unit selection speech synthesis.
Cougar stands for "Concatenation Of Units Guided by ARticulation."
Cougar is implemented within the Festival speech synthesis system.
In unit selection speech synthesis, computing join costs and performing spectral smoothing across joins are closely interrelated tasks. For instance, a join between two segments that can be smoothed well should ideally receive a lower join cost, so it would be attractive to include in the join cost an estimate of how well the join can be smoothed. Despite this, the two tasks are typically implemented independently of each other.
The Cougar project aims to integrate these two aspects of unit concatenation by using an articulatory(-like) domain for representation of the speech signal. In short, we are attempting to improve state-of-the-art speech synthesis performance by employing an articulatory(-like) representation and a priori knowledge about that representation in the unit selection synthesis process.
The term "articulation" can apply to data recorded using articulatory measurement systems, such as electromagnetic articulography. We also use it to refer to an empirically learned representation of the speech signal (e.g. a hidden variable in a probabilistic model) with articulatory-like properties, such that trajectories through the pseudo-articulatory space are continuous and vary relatively smoothly and slowly.
Note that we are not proposing an articulatory model for waveform generation. Rather, we propose that there are benefits to calculating join costs and performing join smoothing in a unified manner using an underlying articulatory-like representation of the waveform fragments to be selected and joined.
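To make the link between join cost and smoothing concrete, here is a toy sketch in Python. It is not Cougar's method: the frame representation, the linear offset-spreading smoother and the roughness-based cost (the hypothetical functions `boundary_distance`, `smooth_join` and `smoothed_join_cost`) are all assumptions chosen for illustration. The idea it shows is that, if trajectories in an articulatory-like space are expected to be smooth and slowly varying, the join can first be smoothed and then scored by how much roughness remains, so that the cost reflects how well the join can be smoothed rather than just the mismatch at the boundary.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: each frame is a vector of articulatory(-like)
# parameters, and a unit is a list of such frames.
Frame = List[float]


def boundary_distance(left: Sequence[Frame], right: Sequence[Frame]) -> float:
    """Conventional join cost: Euclidean distance between the boundary frames."""
    return sum((a - b) ** 2 for a, b in zip(left[-1], right[0])) ** 0.5


def smooth_join(left: Sequence[Frame], right: Sequence[Frame],
                width: int) -> Tuple[List[Frame], List[Frame]]:
    """Spread the boundary offset linearly over `width` frames on each side.

    Assumes left[-1] and right[0] describe the same instant, so after
    smoothing both boundary frames sit at their midpoint. Inputs are copied.
    """
    offset = [b - a for a, b in zip(left[-1], right[0])]
    new_left = [list(f) for f in left]
    new_right = [list(f) for f in right]
    for k in range(width):
        # Weight falls from 0.5 at the boundary to 0.5 / width at the window edge.
        w = 0.5 * (width - k) / width
        for d, o in enumerate(offset):
            new_left[-1 - k][d] += w * o
            new_right[k][d] -= w * o
    return new_left, new_right


def smoothed_join_cost(left: Sequence[Frame], right: Sequence[Frame],
                       width: int) -> float:
    """Hypothetical unified cost: residual roughness after smoothing.

    Sums squared second differences of the smoothed trajectory over the
    smoothing window plus one unmodified frame on each side.
    """
    if width < 1 or min(len(left), len(right)) < width + 2:
        raise ValueError("each unit needs at least width + 2 frames")
    sl, sr = smooth_join(left, right, width)
    # Drop sr[0]: after smoothing it coincides with sl[-1].
    traj = sl[-(width + 2):] + sr[1:width + 2]
    cost = 0.0
    for prev, cur, nxt in zip(traj, traj[1:], traj[2:]):
        cost += sum((n - 2 * c + p) ** 2 for p, c, n in zip(prev, cur, nxt))
    return cost


# One-dimensional toy trajectories.
ramp = [[0.0], [1.0], [2.0], [3.0]]
offset_join = [[5.0], [6.0], [7.0], [8.0]]  # same slope, but 2.0 higher
flat_join = [[3.0], [3.0], [3.0], [3.0]]    # no offset, but the slope changes

print(boundary_distance(ramp, offset_join), smoothed_join_cost(ramp, offset_join, 2))  # 2.0 0.5
print(boundary_distance(ramp, flat_join), smoothed_join_cost(ramp, flat_join, 2))      # 0.0 1.0
```

In this toy example the boundary distance prefers the flat continuation (distance 0.0 versus 2.0), whereas the smoothing-aware cost prefers the offset continuation (0.5 versus 1.0), because a pure offset between parallel trajectories can be spread out by the smoother while an abrupt change of slope cannot.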
Cougar is supported by the Engineering and Physical Sciences Research Council (EPSRC grant GR/R94688/01).