This tutorial introduces the Festival Speech Synthesis System and shows how to build a voice for the new Multisyn unit selection engine.
The commands in this tutorial come in two types: those you type into a Unix shell, and those you type into Festival. Shell commands should be typed at a prompt which looks something like this:
It won't actually say 'machine' and 'user'; it will show the name of the machine you are sitting at and the username you are logged in as.
Once Festival is running, the prompt changes, to remind you that you are inside Festival, and looks like this:
Make sure you type each command at the right prompt, otherwise you will get errors. Also remember that every Festival command MUST be enclosed in round brackets, otherwise you will get errors.
All of the commands you need to type in this tutorial are shown in boxes with a blue background (grey if printed in black and white), and each command is preceded by its prompt so that it is clear which are shell commands and which are Festival commands.
The very first time you log in, you need to run the following command to create some files and directories for you:
Then, each time you log in (including the first time) you need to set a number of variables and paths for the system to run correctly:
First you need to familiarise yourself with Festival. Festival is started by running the 'festival' command at a shell prompt.
Once Festival is running you can issue it commands to make it speak, change voice or many other things.
To make Festival speak you can use the SayText command:
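For example, typing the following at the Festival prompt should speak the sentence with the current (default) voice. The sentence itself is only an illustration; any quoted text will do.

```scheme
;; Speak a sentence with the currently selected voice
(SayText "Hello, this is the Festival speech synthesis system.")
```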
If you want to keep the data structure (the utterance) that Festival generates during synthesis, set a variable to the result of the SayText command, like this:
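A minimal sketch: SayText returns the utterance it has just synthesised, so set! can store it. The variable name utt1 is a hypothetical choice for illustration; any name will do.

```scheme
;; Synthesise a sentence and keep the resulting utterance structure in utt1
(set! utt1 (SayText "The quick brown fox jumps over the lazy dog."))
```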
Once you have synthesised an utterance you can do lots of things with it. Here are a few examples.
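A few standard Festival utterance functions are sketched below, assuming an utterance has already been stored in a variable as described above, here called utt1 (a hypothetical name). The output filename hello.wav is likewise only illustrative.

```scheme
;; Play the stored utterance again without resynthesising it
(utt.play utt1)
;; List the relations (Word, Segment, ...) that synthesis built
(utt.relationnames utt1)
;; Print the items in the Word and Segment relations
(utt.relation.print utt1 'Word)
(utt.relation.print utt1 'Segment)
;; Save the synthesised waveform to a RIFF (.wav) file
(utt.save.wave utt1 "hello.wav" 'riff)
```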
To change voice in Festival, run the command for the voice you want. All voice commands start with the prefix voice_. Try the following voice:
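As a sketch, assuming the voices named later in this section (voice_cmu_us_awb_arctic_hts and the default voice_kal_diphone) are installed, you can switch voices and speak the same sentence with each one to compare them:

```scheme
;; Switch to the HTS voice and speak a test sentence
(voice_cmu_us_awb_arctic_hts)
(SayText "This is the same sentence spoken by a different voice.")
;; Switch back to the default diphone voice and speak the same sentence
(voice_kal_diphone)
(SayText "This is the same sentence spoken by a different voice.")
```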
Compare the quality of this voice to the original voice.
If you press the TAB key part way through typing a command, Festival lists the completions it recognises. So if you type (voice_ and press TAB, it lists the available voices. You should see three: the default diphone voice, voice_kal_diphone; the HTS voice you just loaded, voice_cmu_us_awb_arctic_hts; and a multisyn voice called voice_em_nina_multisyn, which won't run yet because it is the voice you are going to build! (You will also see a voice_reset function, which just resets the current voice.)
Finally, to exit Festival and return to the shell, use the exit command.
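Like every Festival command, exit must be typed in round brackets at the Festival prompt:

```scheme
;; Leave Festival and return to the shell prompt
(exit)
```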
First you must decide which voice you want to build. Your options are summarised below:
Voice Name | Voice Description |
---|---|
2000 | A large database general purpose unit selection synthesiser |
2000f | A unit selection synthesiser for the communicator flight information domain |
500 | A small database unit selection synthesiser |
Each voice uses a different subset of the available data, and which subset you choose will determine the characteristics of your voice.
Once you have chosen which voice you want to build, run the following command (replacing VOICE with either 2000, 2000f or 500):
The actual voice is built from speech data in a number of different formats. Some of these are prepared in advance for you; others you will have to generate yourself. The different data directories are listed below.
Directory | Description | Type |
---|---|---|
wav | Wave files for the utterances | provided |
pm | Pitch mark files | provided |
mfcc | MFCCs for alignment | provided |
lab | Label files from automatic alignment | to be made |
utt | Festival utterance structures | to be made |
f0 | Pitch contours | provided |
coef | MFCCs + f0, for join cost | provided |
coef2 | MFCCs + f0, stripped for join cost | to be made |
lpc | LPC and residuals, used for synthesis | provided |
Under normal circumstances you would have to generate all of these files yourself, but the ones marked "provided" have been prepared in advance to save time.
First you need to generate an initial label file containing the phone sequences for each utterance.
[unilex-rpx is the name of the pronunciation lexicon being used.]
Next you need to run the alignment script to generate an aligned label file:
This takes about 20 minutes for the 500-sentence voice, or about 1 hour 20 minutes for the 2000 and 2000f voices. This may be a good time to go for lunch or to take a look at the Festival manual.
Then you need to split the label file into individual files for each utterance.
This creates a number of files in the lab directory.
Start Wavesurfer to inspect the automatic alignment:
This should load the waveform. Now, in the waveform window, right-click in the dark blue box where the filename is displayed, select "apply configuration", and choose the Euromasters configuration.
The next stage is to build utterance files for the database. These files describe the linguistic structure of each utterance in terms of phrases, words, syllables, etc. The timings from the aligned label files are incorporated into this structure.
Festival is used to build the utterance files:
This should create a number of files in the utt directory.
If you have changed a label file in such a way that the voice fails to build, you can fix it with the following procedure, which replaces the broken label file with the original.
** You only need to follow this procedure if your voice failed to build. **

Create a new label directory:
Break the mlf file into this new directory:
Copy the original version of the file in question over the broken one (the error generated by the voice-building procedure tells you which file it is):
where XXXX completes the filename in question. Now rebuild the voice.
The final step is to generate the join cost coefficients. This step extracts appropriate frames which relate to the join points used by your labelling.
Run the following script:
Your voice should now be built and is ready for testing.
To use your voice you need to run festival and then select the voice.
Start Festival and load the voice:
To make the voice speak, use the SayText command:
Try different types of sentences and see how the voice behaves. In particular, try generating example sentences that give flight information. Find someone next to you who has built a different voice and compare the same text spoken by each of your voices.
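A sketch, assuming your finished voice is the voice_em_nina_multisyn voice mentioned earlier. The flight-information sentence is only an example suited to the 2000f domain, and utt1 is a hypothetical variable name used to keep the utterance for later inspection.

```scheme
;; Select the multisyn voice you have just built
(voice_em_nina_multisyn)
;; Synthesise a sentence and keep the utterance for later inspection
(set! utt1 (SayText "Your flight to Amsterdam departs from gate twelve at nine fifteen."))
```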
To set the beam width for each list of candidate diphones:
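A sketch, assuming the multisyn Scheme helper du_voice.set_ob_pruning_beam and the variable currentMultiSynVoice (which holds the currently selected multisyn voice) are available in your installation; check both names before relying on them. The value 0.25 is an arbitrary example; try several values.

```scheme
;; Observation beam: prune the list of candidate units for each target diphone
;; (assumed helper name; 0.25 is an illustrative value only)
(du_voice.set_ob_pruning_beam currentMultiSynVoice 0.25)
```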
To set the beam width to control the paths which are kept at each stage:
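Similarly, assuming a companion helper named du_voice.set_pruning_beam and the same currentMultiSynVoice variable (both assumptions to verify against your installation), with 0.25 again only an example value:

```scheme
;; Path beam: prune the partial paths kept at each stage of the search
;; (assumed helper name; 0.25 is an illustrative value only)
(du_voice.set_pruning_beam currentMultiSynVoice 0.25)
```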
Synthesise utterances with different levels of pruning and compare the output.
You can see more information about which units were chosen by printing the Unit relation.
For each diphone, a list of features describes which unit from the database was chosen and how good a match it was judged to be.
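For example, assuming an utterance synthesised with your multisyn voice is stored in a variable called utt1 (a hypothetical name):

```scheme
;; Print every item in the Unit relation, one chosen diphone per line with its features
(utt.relation.print utt1 'Unit)
```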
You now have sufficient knowledge to identify problems with your voice. Find an example of some badly synthesised speech, and try to track down what the problem is.
These exercises are designed to be more challenging than the main voice-building exercise, and you are not necessarily expected to get this far or to complete them. Feel free to pick and choose from them if you have time.
The Festival Manual can be found at http://www.cstr.ed.ac.uk/projects/festival/manual
The Festvox documentation for building voices can be found at http://www.festvox.org/festvox/festvox_toc.html (this includes a Scheme overview and tutorial).
The online version of this document and accompanying slides used to introduce each session can be found at http://data.cstr.ed.ac.uk/euromasters
The full set of tools for building multisyn voices, including further documentation on their use can be found at http://www.cstr.ed.ac.uk/downloads/festival/multisyn_build
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
The translation was initiated by Rob Clark on 2005-07-05