

18 Intonation

A number of different intonation modules are available with varying levels of control. In general intonation is generated in two steps.

  1. Prediction of accents (and/or end tones) on a per syllable basis.
  2. Prediction of F0 target values. This must be done after durations are predicted.

Reflecting this split, there are two main intonation modules that call sub-modules depending on the desired intonation method. The Intonation and Int_Targets modules are defined in Lisp (`lib/intonation.scm') and call sub-modules which are (so far) in C++.

18.1 Default intonation

This is the simplest form of intonation and offers the modules Intonation_Default and Int_Targets_Default. The first of these actually does nothing at all. Int_Targets_Default simply creates one target at the start of the utterance and one at the end. By default their values are 130 Hz and 110 Hz. These values may be set through the parameter duffint_params. For example, the following will generate a monotone at 150 Hz.

(set! duffint_params '((start 150) (end 150)))
(Parameter.set 'Int_Method 'DuffInt)
(Parameter.set 'Int_Target_Method Int_Targets_Default)

18.2 Simple intonation

This module uses the CART tree in int_accent_cart_tree to predict if each syllable is accented or not. A predicted value of NONE means no accent is generated by the corresponding Int_Targets_Simple function. Any other predicted value will cause a `hat' accent to be put on that syllable.

A default int_accent_cart_tree is available in the value simple_accent_cart_tree in `lib/intonation.scm'. It simply predicts accents on the stressed syllables of polysyllabic content words, and on the only syllable of single-syllable content words. Its form is

(set! simple_accent_cart_tree
 '
  ((R:SylStructure.parent.gpos is content)
   ((stress is 1)
    ((Accented))
    ((position_type is single)
     ((Accented))
     ((NONE))))
   ((NONE))))

The function Int_Targets_Simple uses parameters in the a-list in the variable int_simple_params. There are two interesting parameters: f0_mean, which gives the mean F0 for this speaker (default 110 Hz), and f0_std, which gives the standard deviation of F0 for this speaker (default 25 Hz). The second value is used to determine the amount of variation to be put in the generated targets.
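As a minimal sketch, a voice definition might select this method as follows. The use of the symbol Simple as the value of both method parameters is an assumption about how `lib/intonation.scm' dispatches on them, so check it against your installation. The parameter values shown are simply the defaults quoted above.

```scheme
;; Sketch only: select the Simple intonation method for the current voice.
;; Assumption: 'Simple is accepted as the value of both Int_Method and
;; Int_Target_Method (see lib/intonation.scm in your installation).
(set! int_accent_cart_tree simple_accent_cart_tree)
(set! int_simple_params '((f0_mean 110) (f0_std 25)))
(Parameter.set 'Int_Method 'Simple)
(Parameter.set 'Int_Target_Method 'Simple)
```

For a different speaker you would normally change only f0_mean and f0_std.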

For each Phrase in the given utterance an F0 is generated starting at f0_mean+(f0_std*0.6) and declining f0_std Hz over the length of the phrase until the last syllable, whose end is set to f0_mean-f0_std. An imaginary line called the baseline is drawn from the start to the end (minus the final extra fall). For each syllable that is accented (i.e. has an IntEvent related to it) three targets are added: one at the start, one in mid vowel, and one at the end. The start and end targets are at baseline Hz (as declined for that syllable) and the mid vowel target is set to baseline+f0_std.
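The arithmetic can be made concrete with a small sketch. It takes the starting and ending values to be offsets from the f0_mean parameter and assumes that the decline is linear over the phrase, which the description suggests but does not state. The helper names simple_baseline and simple_accent_f0s are hypothetical and are not part of Festival.

```scheme
;; Sketch only: the arithmetic of the Simple targets as described above.
;; simple_baseline and simple_accent_f0s are hypothetical helper names.
;; Assumption: the baseline declines linearly over the phrase.

(define (simple_baseline f0_mean f0_std pos)
  "Baseline in Hz at POS, the fraction (0 to 1) of the phrase elapsed."
  (- (+ f0_mean (* 0.6 f0_std)) (* f0_std pos)))

(define (simple_accent_f0s f0_mean f0_std pos)
  "F0 values (start, mid vowel, end) for an accented syllable at POS."
  (let ((base (simple_baseline f0_mean f0_std pos)))
    (list base (+ base f0_std) base)))

;; With the defaults f0_mean = 110 and f0_std = 25:
;;   phrase start:    110 + 0.6*25 = 125 Hz
;;   end of baseline: 125 - 25     = 100 Hz
;;   final fall:      110 - 25     =  85 Hz
(simple_baseline 110 25 0)       ; baseline at the phrase start
(simple_baseline 110 25 1)       ; baseline at the phrase end
(simple_accent_f0s 110 25 0.5)   ; accent half way through the phrase
```

The sketch uses one baseline value per syllable for brevity; the real module works with the time of each target.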

Note that this model is not supposed to be complex or comprehensive, but it offers a very quick and easy way to generate something other than a fixed line F0. Something similar to this has been used for Spanish and Welsh without (too many) people complaining. However, it is not designed as a serious intonation module.

18.3 Tree intonation

This module is more flexible. Two different CART trees can be used to predict `accents' and `endtones'. Although at present this module is used for an implementation of the ToBI intonation labelling system, it could be used for many different types of intonation system.

The target module for this method uses a Linear Regression model to predict start, mid-vowel and end targets for each syllable using arbitrarily specified features. This follows the work described in black96. The LR models are held in the format described below (see section 25.5 Linear regression). Three models are used, held in the variables f0_lr_start, f0_lr_mid and f0_lr_end.
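A voice using this method might be set up roughly as follows. This is a sketch under several assumptions: that your distribution provides the f2b trees and LR models under the names shown, that the end tone tree is held in int_tone_cart_tree, that the modules are called Intonation_Tree and Int_Targets_LR, and that the speaker mapping is given in int_lr_params. None of these names appear in the text above, so verify them in your `lib' directory before relying on them.

```scheme
;; Sketch only. All names below other than int_accent_cart_tree,
;; f0_lr_start, f0_lr_mid and f0_lr_end are assumptions about the
;; files distributed in Festival's lib directory.
(require 'tobi)      ; assumed to define the f2b accent and tone trees
(require 'f2bf0lr)   ; assumed to define the f2b LR models

(set! int_accent_cart_tree f2b_int_accent_cart_tree)
(set! int_tone_cart_tree f2b_int_tone_cart_tree)

(set! f0_lr_start f2b_f0_lr_start)
(set! f0_lr_mid f2b_f0_lr_mid)
(set! f0_lr_end f2b_f0_lr_end)

;; Assumed mapping from the model's pitch range to the target speaker's.
(set! int_lr_params
      '((target_f0_mean 105) (target_f0_std 14)
        (model_f0_mean 170) (model_f0_std 34)))

(Parameter.set 'Int_Method Intonation_Tree)
(Parameter.set 'Int_Target_Method Int_Targets_LR)
```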

18.4 Tilt intonation

Tilt description to be inserted.

18.5 General intonation

As there seem to be a number of intonation theories that predict F0 contours by rule (possibly using trained parameters), this module aids the external specification of such rules for a wide class of intonation theories (though primarily those that might be referred to as the ToBI group). It is designed to be multi-lingual and to offer a quick way to port often pre-existing rules into Festival without writing new C++ code.

The accent prediction part uses the same mechanism as the Simple intonation method described above: a decision tree for accent prediction. Thus the tree in the variable int_accent_cart_tree is used on each syllable to predict an IntEvent.

The target part calls a specified Scheme function which returns a list of target points for a syllable. In this way any arbitrary tests may be done to produce the target points. For example here is a function which returns three target points for each syllable with an IntEvent related to it (i.e. accented syllables).

(define (targ_func1 utt syl)
  "(targ_func1 UTT STREAMITEM)
Returns a list of targets for the given syllable."
  (let ((start (item.feat syl 'syllable_start))
        (end (item.feat syl 'syllable_end)))
    (if (equal? (item.feat syl "R:Intonation.daughter1.name") "Accented")
        (list
         (list start 110)
         (list (/ (+ start end) 2.0) 140)
         (list end 100)))))

This function may be identified as the function to call by the following setup parameters.

(Parameter.set 'Int_Method 'General)
(Parameter.set 'Int_Target_Method Int_Targets_General)

(set! int_general_params
      (list 
       (list 'targ_func targ_func1)))
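Because the target function is ordinary Scheme, it can make any test it likes. As a further sketch, the variant below also places a single baseline target in the middle of every unaccented syllable. The names targ_func2 and my_f0_base, and the 30 Hz rise, are hypothetical choices made for illustration only.

```scheme
;; Sketch only: targ_func2 and my_f0_base are hypothetical names.
(set! my_f0_base 105)   ; illustrative baseline in Hz

(define (targ_func2 utt syl)
  "(targ_func2 UTT STREAMITEM)
Returns a rise-fall on accented syllables and a single baseline
target in the middle of every other syllable."
  (let ((start (item.feat syl 'syllable_start))
        (end (item.feat syl 'syllable_end)))
    (let ((mid (/ (+ start end) 2.0)))
      (if (equal? (item.feat syl "R:Intonation.daughter1.name") "Accented")
          (list
           (list start my_f0_base)
           (list mid (+ my_f0_base 30))
           (list end my_f0_base))
          (list
           (list mid my_f0_base))))))

;; Register it in the same way as targ_func1.
(set! int_general_params
      (list
       (list 'targ_func targ_func2)))
```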

18.6 Using ToBI

An example implementation of a ToBI to F0 target module is included in `lib/tobi_rules.scm' based on the rules described in jilka96. This uses the general intonation method discussed in the previous section. This is designed to be useful to people who are experimenting with ToBI (silverman92), rather than general text to speech.

To use this method you need to load `lib/tobi_rules.scm' and call setup_tobi_f0_method. The default is in a male's pitch range, i.e. for voice_rab_diphone. You can change it for other pitch ranges by changing the following variables.

(Parameter.set 'Default_Topline 110)
(Parameter.set 'Default_Start_Baseline 87)
(Parameter.set 'Default_End_Baseline 83)
(Parameter.set 'Current_Topline (Parameter.get 'Default_Topline))
(Parameter.set 'Valley_Dip 75)

An example using this from STML is given in `examples/tobi.stml', but it can also be used from Scheme. For example, before defining an utterance you should execute the following, either from the command line or in some setup file.

(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)

In order to allow specification of accents, tones, and break levels you must use an utterance type that allows such specification. For example

(Utterance 
 Words
 (boy
  (saw ((accent H*)))
   the
   (girl ((accent H*)))
   in the 
   (park ((accent H*) (tone H-)))
   with the 
   (telescope ((accent H*) (tone H-H%)))))

(Utterance Words 
 (The
  (boy ((accent L*)))
  saw
  the
  (girl ((accent H*) (tone L-)))
  with 
  the
  (telescope ((accent H*) (tone H-H%)))))

You can display the synthesized form of these utterances in Xwaves. Start an Xwaves and an Xlabeller and call the function display on the synthesized utterance.
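Putting the pieces together, a complete session might look like the following sketch. The variable name utt1 is hypothetical, and the sketch assumes that voice_rab_diphone is installed and that the standard functions utt.synth and utt.play are available.

```scheme
;; Sketch only: utt1 is a hypothetical variable name.
(voice_rab_diphone)
(require 'tobi_rules)
(setup_tobi_f0_method)

(set! utt1
      (Utterance Words
       (The
        (boy ((accent L*)))
        saw
        the
        (girl ((accent H*) (tone L-)))
        with
        the
        (telescope ((accent H*) (tone H-H%))))))

(utt.synth utt1)   ; run the synthesis modules, including intonation
(utt.play utt1)    ; listen to the result
(display utt1)     ; show it in Xwaves, as described above
```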

