This tutorial is designed to be an introduction to the Festival Speech Synthesis System, along with an introduction to building voices for the new Multisyn engine.
The commands that you have to type in this tutorial come in two types: those that you type into a Unix shell, and those that you type into Festival. Shell commands should be typed at a prompt which looks something like this:
It won't actually say "machine" and "user"; it will say the name of the machine you are sitting at and the username you are logged in as.
Once Festival is running, the prompt will change (to remind you that Festival is running) and look like this:
Make sure you type each command at the right prompt, or you will get errors. Additionally, remember that all Festival commands MUST be enclosed in round brackets (parentheses).
All of the commands that you need to type in this tutorial are presented in the boxes with blue (or grey if it is printed in black and white) backgrounds, and the prompt is included before each command to make clear which are shell commands and which are Festival commands.
First you need to familiarise yourself with Festival. Festival is started by running the festival command at a shell prompt.
Once Festival is running you can issue commands to make it speak, change voice, or do many other things.
To make Festival speak you can use the SayText command:
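As a minimal sketch, typing (SayText "Hello world") at the Festival prompt speaks the sentence. The same command can also be run from the shell in batch mode, as below. The script name saytext_demo.scm and the example sentence are assumptions made for illustration, and playback needs a working audio device.

```shell
# Write a one-line Festival (Scheme) script; the file name is illustrative.
cat > saytext_demo.scm <<'EOF'
(SayText "Hello world")
EOF

# Run it in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b saytext_demo.scm || echo "festival reported an error" >&2
else
  echo "festival not found; the command is in saytext_demo.scm" >&2
fi
```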
If you want to keep the data structure (the utterance) that Festival generates during synthesis, assign the result of the SayText command to a variable:
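A short sketch of the assignment, using Festival's set! form. The variable name utt1, the script name keep_utt.scm, and the sentence are assumptions chosen for illustration. At the Festival prompt you would type only the line inside the script.

```shell
# Store the utterance returned by SayText in a variable called utt1
# (the variable and file names are illustrative).
cat > keep_utt.scm <<'EOF'
(set! utt1 (SayText "Hello world"))
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b keep_utt.scm || echo "festival reported an error" >&2
fi
```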
Once you have synthesised an utterance you can do lots of things with it. Here are a few examples.
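The sketch below shows a few things that can be done with a stored utterance: replaying it, printing some of its relations, and saving the waveform. It assumes the utterance is held in a variable named utt1. The file names inspect_utt.scm and hello.wav are illustrative, and the relations that exist depend on the voice in use.

```shell
# Festival commands that work on a stored utterance.
# Lines starting with ';' are Scheme comments.
cat > inspect_utt.scm <<'EOF'
(set! utt1 (SayText "Hello world"))
; Play the synthesised utterance again.
(utt.play utt1)
; Print the items in the Word and Segment relations.
(utt.relation.print utt1 'Word)
(utt.relation.print utt1 'Segment)
; Save the waveform to a file.
(utt.save.wave utt1 "hello.wav")
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b inspect_utt.scm || echo "festival reported an error" >&2
fi
```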
To change voice in Festival you need to run a voice command to select a new voice. All voice commands start with the prefix voice_. Try the following voice:
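As a sketch, the script below selects the HTS voice named in this tutorial, speaks a sentence, then switches back to the default diphone voice and speaks the same sentence again. It assumes both voices are installed; the script name switch_voice.scm and the sentence are illustrative.

```shell
# Speak the same text with two voices so they can be compared by ear.
cat > switch_voice.scm <<'EOF'
(voice_cmu_us_awb_arctic_hts)
(SayText "Hello world")
(voice_kal_diphone)
(SayText "Hello world")
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b switch_voice.scm || echo "festival reported an error" >&2
fi
```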
Compare the quality of this voice to the original voice.
Once you have started typing a command in Festival, pressing the TAB key lists any completions that Festival recognises. So if you type (voice_ and press TAB, it will tell you which voices are available. You should see three voices listed: the default diphone voice, voice_kal_diphone; the HTS voice you just loaded, voice_cmu_us_awb_arctic_hts; and a multisyn voice called voice_em_nina_multisyn, which won't run yet as this is the voice you are going to build! (You will also see a voice_reset function, which just resets the current voice.)
Finally, to exit Festival and return to the shell, use the exit command.
First you must decide which voice you want to build. Your options are summarised below:
|Voice Name||Voice Description|
|2000||A large database general purpose unit selection synthesiser|
|2000f||A unit selection synthesiser for the communicator flight information domain|
|500||A small database unit selection synthesiser|
Each voice uses a different subset of the available data, and which subset you choose will determine the characteristics of your voice.
The actual voice is built from speech data in a number of different formats. Some of the formats are prepared in advance for you; others you will have to generate yourself. The different directories of data are listed below.
|Directory||Contents||Status|
|wav||Wave files for the utterance||provided|
|pm||Pitch mark files||provided|
|mfcc||MFCCs for alignment||provided|
|lab||Label files from automatic alignment||to be made|
|utt||Festival utterance structures||to be made|
|coef||MFCCs + f0, for join cost||provided|
|coef2||MFCCs + f0, stripped for join cost||to be made|
|lpc||LPC and residuals, used for synthesis||provided|
Under normal circumstances you would have to generate all of the provided files yourself, but some are provided here to save time.
The next stage is to build utterance files for the database. These files describe the linguistic structure of each utterance in terms of phrases, words, syllables and so on. The timings from the aligned label files are incorporated into this structure.
If you have changed a label file in such a way that the voice fails to build, you can use the following procedure to replace the broken label file with the original.
The final step is to generate the join cost coefficients. This step extracts appropriate frames which relate to the join points used by your labelling.
Your voice should now be built and is ready for testing.
To use your voice you need to run festival and then select the voice.
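A minimal sketch of selecting the newly built voice and speaking with it. It assumes the build produced the voice_em_nina_multisyn voice named earlier in this tutorial; the script name test_voice.scm and the sentence are illustrative.

```shell
# Select the multisyn voice and keep the utterance for later inspection.
cat > test_voice.scm <<'EOF'
(voice_em_nina_multisyn)
(set! utt1 (SayText "This is a test of my new voice."))
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b test_voice.scm || echo "festival reported an error" >&2
fi
```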
Try different types of sentences and see how the voice behaves. In particular, try to generate example sentences giving flight information. Find someone next to you who has built a different voice and compare the same text with each of these voices.
Synthesise utterances with different levels of pruning and compare the output.
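The sketch below shows one way the pruning level might be changed before synthesis. It assumes that the multisyn setters du_voice.set_pruning_beam and du_voice.set_ob_pruning_beam, and the currentMultiSynVoice variable, are available in your Festival build; check them against your installation. The beam value 0.25, the script name pruning_demo.scm, and the sentence are illustrative assumptions, not recommended settings.

```shell
# Set the search pruning beams, then synthesise.
# Function names and beam values are assumptions; verify them locally.
cat > pruning_demo.scm <<'EOF'
(voice_em_nina_multisyn)
(du_voice.set_pruning_beam currentMultiSynVoice 0.25)
(du_voice.set_ob_pruning_beam currentMultiSynVoice 0.25)
(SayText "This is a test of my new voice.")
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b pruning_demo.scm || echo "festival reported an error" >&2
fi
```

Repeating the run with a different beam value and the same sentence makes the outputs directly comparable.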
You can see more information about which units were chosen by printing the Unit relation.
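As a sketch, the Unit relation can be printed with utt.relation.print once an utterance has been synthesised with the multisyn voice. The variable name utt1, the script name print_units.scm, and the sentence are assumptions for illustration.

```shell
# Synthesise with the multisyn voice, then print the Unit relation.
cat > print_units.scm <<'EOF'
(voice_em_nina_multisyn)
(set! utt1 (SayText "This is a test of my new voice."))
(utt.relation.print utt1 'Unit)
EOF

# Run in batch mode only if Festival is installed.
if command -v festival >/dev/null 2>&1; then
  festival -b print_units.scm || echo "festival reported an error" >&2
fi
```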
For each diphone a list of features describes which diphone was chosen and how good it was thought to be.
You now have sufficient knowledge to identify problems with your voice. Find an example of some badly synthesised speech, and try to track down what the problem is.
These exercises are designed to be more challenging than the main voice building exercise, and you are not necessarily expected to get this far or to complete them. Feel free to pick and choose from them if you have time.
The Festival Manual can be found at http://www.cstr.ed.ac.uk/projects/festival/manual
The Festvox documentation for building voices can be found at http://www.festvox.org/festvox/festvox_toc.html (this includes a Scheme overview and tutorial)
The online version of this document and accompanying slides used to introduce each session can be found at http://data.cstr.ed.ac.uk/euromasters
The full set of tools for building multisyn voices, including further documentation on their use, can be found at http://www.cstr.ed.ac.uk/downloads/festival/multisyn_build
The translation was initiated by Rob Clark on 2005-07-05