The Centre for Speech Technology Research, The University of Edinburgh

Festival FAQ: Running Festival

4

Altering Pause durations

This discussion relates mainly to diphone voices.

For some applications the default pause lengths applied after commas and full stops are too short. Outlined below is one way of changing them.

By default Festival employs two levels of phrase break: breaks (B) and big breaks (BB). Breaks generally occur after commas and semicolons, big breaks at the end of sentences. Read the Festival and Festvox documentation on phrasing for more specifics.

The pause inserted at a break has a default length of 0.2s and a pause inserted at a big break has a default length of 0.4s. These pauses are inserted as silence segments, and their durations are controlled by the default duration model in much the same way as the durations for other segments are controlled.

Duration is controlled in two ways. Each segment has a fixed average duration, which is then manipulated slightly depending on the segment's context. This manipulation is performed by a statistically trained model for most segments. For silences, however, it is done by hand.

The settings for durations are contained within a file which comes with the voice that you are using. For example, the settings for the kal voice are in the file:

festival/lib/voices/english/kal_diphone/festvox/kaldurtreeZ.scm
The default average durations are set by the variable kal_durs and the modifications are defined by the CART tree kal_duration_cart_tree. It is suggested that you do not change the default average durations unless you need particularly long pauses (see below), but rather just change the manipulation specified by the tree.

The modifications for silences have been added to the top of the tree by hand. First, make your own copy of the tree (all of it) in a file and call it something like my_kal_duration_cart_tree. It should look like this:

(set! my_kal_duration_cart_tree
'
((name is pau)
 ((emph_sil is +)
  ((0.0 -0.5))
  ((p.R:SylStructure.parent.parent.pbreak is BB)
   ((0.0 2.0))
   ((0.0 0.0))))
... rest of tree ...
The numbers that need to be changed are the second value in each of the last two leaves shown: 2.0 (for big breaks) and 0.0 (for breaks). The formula for calculating total duration is:

d = m + s*v

d - total duration
m - default mean duration (from kal_durs)
s - default standard deviation (from kal_durs)
v - value from tree

For pau this gives us: d = 0.2 + 0.1*v. So v=0.0 gives us d=0.2, and v=2.0 gives us d=0.4 for the default total durations.
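
To pick a tree value for a particular pause length, the formula above can be rearranged to v = (d - m) / s. The helper below is a small sketch that can be typed at the Festival prompt. The name pause_tree_value is hypothetical (it is not part of Festival), and the arguments 0.2 and 0.1 are the pau mean and standard deviation quoted above.

```scheme
;; Hypothetical helper, not part of Festival: solve d = m + s*v for v.
;;   d - desired total pause duration in seconds
;;   m - mean duration for pau (0.2 by default)
;;   s - standard deviation for pau (0.1 by default)
(define (pause_tree_value d m s)
  (/ (- d m) s))

(pause_tree_value 0.3 0.2 0.1)  ; value for a 0.3s pause, roughly 1.0
(pause_tree_value 0.5 0.2 0.1)  ; value for a 0.5s pause, roughly 3.0
```

The result is the number to put in the tree in place of 2.0 (big breaks) or 0.0 (breaks).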

The two tree values can be changed as required. There is a problem, however: Festival will not allow v > 3 (as this would be inappropriate for non-silence segments). So if you require a silence longer than 0.5s, you will have to change the default mean and standard deviation by introducing a my_kal_durs variable. Change the values for the pau segment and use the above formula to calculate your durations.

To use your own versions of the tree and default durations for synthesis, set the following after the voice you are using has been selected:
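
As a sketch of that last step, the following builds my_kal_durs from the voice's own kal_durs, replacing only the pau entry. It assumes that each entry of kal_durs has the form (phone mean stddev) and that kal_durs is already defined, i.e. the kal voice has been loaded. The values 0.4 and 0.2 are purely illustrative.

```scheme
;; Assumption: each kal_durs entry looks like (phone mean stddev).
;; Copy every entry unchanged except pau, which gets a new mean and
;; standard deviation (illustrative values only).
(set! my_kal_durs
  (mapcar
    (lambda (entry)
      (if (string-equal (car entry) "pau")
          (list 'pau 0.4 0.2)
          entry))
    kal_durs))
```

With these values the formula becomes d = 0.4 + 0.2*v, so v=0.0 gives a 0.4s pause and v=3.0 gives a 1.0s pause.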

(set! duration_cart_tree my_kal_duration_cart_tree)
(set! duration_ph_info my_kal_durs)


3

Festival is very quiet. How do I make it louder?

The default loudness of an utterance is determined by how loud the original database used to synthesise it is. If you feel that Festival is generating utterances that are too quiet, you can globally rescale the volume of all waveforms that are generated. Add the following:

 
      (set!  default_after_synth_hooks 
        (list 
          (lambda (utt)
            (utt.wave.rescale utt 1.0 t))))

to your siteinit.scm file. This file lives in festival/lib. Create it if it does not already exist.

It has been noted that the above code gets overridden when using the voice kal_diphone and fails to work. The solution here is to find the file festival/lib/voices/english/kal_diphone/festvox/kal_diphone.scm and change the line:

          (utt.wave.rescale utt 2.6)))
to
          (utt.wave.rescale utt 1 t)))



2

Festival speaks at double speed!

This seems to happen with Linux and sound hardware which cannot natively support mono audio. In the past, if mono audio was sent to a stereo-only sound card, the driver would send a copy to both channels. Now it splits it between the channels, effectively doubling the speaking rate and raising the pitch by an octave.

It is now left to the audio application to check the hardware and send something that it can handle. Until Festival's audio code is updated, this will cause problems with some hardware.

There are two quick fixes: an expensive one and a cheap one.

  1. The expensive one is to get a decent sound card. This problem seems to occur mostly with cheap on-motherboard sound devices. Buy a reasonably priced SB16 or something.

  2. The cheap fix is to use an external play program (like play from the sox package).

    Create the file festival/lib/siteinit.scm (if you don't already have it) and add the following:

    (Parameter.set 'Audio_Method 'Audio_Command)
    (Parameter.set 'Audio_Command "sox -t raw -sw -r $SR $FILE -c2 -t ossdsp /dev/dsp")
    


1

How do I run festival in server mode?

Festival 1.4.1 and later includes a script called festival_server which will start a festival server for you. For more information run:

festival_server --help

The simplest use of the provided Festival client connecting to this server would be:

echo "hello world" | festival_client --ttw | na_play



If you can't find what you need to know here, try the festival-talk mailing list.