The Centre for Speech Technology Research, The University of Edinburgh

Altering Pause Durations


This discussion relates mainly to diphone voices.

For some applications the default pause lengths applied after commas and full stops are too short. Outlined below is one way of changing them.

By default Festival employs two levels of phrase break: breaks (B) and big breaks (BB). Breaks generally occur after commas and semicolons, big breaks at the end of sentences. Read the Festival and Festvox documentation on phrasing for more specifics.

The pause inserted at a break has a default length of 0.2s and a pause inserted at a big break has a default length of 0.4s. These pauses are inserted as silence segments, and their durations are controlled by the default duration model in much the same way as the durations for other segments are controlled.

Duration is controlled in two ways. Each segment has a fixed average duration, which is then manipulated slightly depending on the segment's context. For most segments this manipulation is performed by a statistically trained model. For silences, however, it is done by hand.

The settings for durations are contained in a file that comes with the voice you are using. For example, the settings for the kal voice are in the file:

festival/lib/voices/english/kal_diphone/festvox/kaldurtreeZ.scm
The default average durations are set by the variable kal_durs and the modifications are defined by the CART tree kal_duration_cart_tree. It is suggested that you do not change the default average durations unless you need particularly long pauses (see below), but rather just change the manipulation specified by the tree.

The modifications for silences have been added at the top of the tree by hand. First, make your own copy of the tree (all of it) in a file and call it something like: my_kal_duration_cart_tree. It should look like this:

(set! my_kal_duration_cart_tree
'
((name is pau)
 ((emph_sil is +)
  ((0.0 -0.5))
  ((p.R:SylStructure.parent.parent.pbreak is BB)
   ((0.0 2.0))
   ((0.0 0.0))))
... rest of tree ...

The numbers that need to be changed are the second number in each of the last two leaves shown: the 2.0 in ((0.0 2.0)) for big breaks, and the second 0.0 in ((0.0 0.0)) for breaks. The formula for calculating total duration is:

d = m + s*v

d - total duration
m - default mean duration (from kal_durs)
s - default standard deviation (from kal_durs)
v - value from tree

For the pau segment this gives us: d = 0.2 + 0.1*v. So v=0.0 gives us d=0.2, and v=2.0 gives us d=0.4 for the default total durations.
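
To pick a tree value for a target pause length, the formula can be rearranged to v = (d - m) / s. The following is a minimal sketch that can be typed at the Festival prompt; the function name pause_tree_value is made up for this illustration and is not part of Festival.

```scheme
;; Hypothetical helper (not a Festival built-in): returns the tree
;; value v that gives total duration d, for mean m and standard
;; deviation s, by rearranging d = m + s*v.
(define (pause_tree_value d m s)
  (/ (- d m) s))

;; With the default pau settings (m = 0.2, s = 0.1):
(pause_tree_value 0.3 0.2 0.1)   ; a 0.3s pause needs v of about 1.0
(pause_tree_value 0.5 0.2 0.1)   ; a 0.5s pause needs v of about 3.0
```

The result is then written in place of the second number in the relevant leaf of your copy of the tree.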

The tree values for breaks and big breaks can be changed as required. One limitation is that Festival will not allow v > 3 (values that large are really inappropriate for non-silence segments). So if you require a silence longer than 0.5s, you will have to change the default mean and standard deviation by introducing a my_kal_durs variable: change the values for the pau segment and use the above formula to calculate your durations.

To use your own versions of the tree and default durations for synthesis, set the following after the voice you are using has been selected and set up:
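
One way to build my_kal_durs without retyping the whole table is to copy kal_durs and swap only the pau entry. This is a sketch, not the only approach. It assumes each entry in kal_durs has the form (phone mean stddev), so check kaldurtreeZ.scm for your voice first. The values 0.5 and 0.25 are illustrative only.

```scheme
;; Run after the kal voice has been loaded, so that kal_durs is defined.
;; Assumption: each kal_durs entry looks like (phone mean stddev).
(set! my_kal_durs
      (mapcar
       (lambda (entry)
         (if (string-equal (car entry) "pau")
             (list 'pau 0.5 0.25)   ; illustrative mean and stddev for silence
             entry))                ; all other phones are left unchanged
       kal_durs))
```

With these example values the formula becomes d = 0.5 + 0.25*v, so a tree value of 0.0 gives a 0.5s pause and 2.0 gives a 1.0s pause. Note that changing the pau mean and standard deviation changes the duration of every silence the tree handles, so both tree values should be recalculated.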

(set! duration_cart_tree my_kal_duration_cart_tree)
(set! duration_ph_info my_kal_durs)
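
Put together, a session might look like the sketch below. The file name my_kal_durations.scm is hypothetical and stands for wherever you saved my_kal_duration_cart_tree and my_kal_durs. If you only changed the tree, leave out the duration_ph_info line.

```scheme
;; 1. Select the voice first; this sets the voice's own duration defaults.
(voice_kal_diphone)

;; 2. Load your own definitions (hypothetical file name).
(load "my_kal_durations.scm")

;; 3. Override the duration tree and the mean/stddev table.
(set! duration_cart_tree my_kal_duration_cart_tree)
(set! duration_ph_info my_kal_durs)

;; 4. Listen to a sentence containing both a break and a big break.
(SayText "This is the first clause, and this is the second. Here is a new sentence.")
```

The order matters: selecting the voice again is expected to restore its default tree and durations, so the two set! lines would need to be repeated afterwards.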


If you can't find what you need to know here, try the festival-talk mailing list.