Festival Text-to-Speech Online Demo - Technical
This is an interactive demo of CSTR's "Festival" speech synthesiser, software that produces artificial speech in place of a human speaker. Festival is the most complete freeware multilingual, general-purpose synthesis system available. It is used by numerous research sites and other projects around the world. Further information is available on the Festival project page.
Unlike the simpler demo here, this page gives access to many more of the voices that have been developed for Festival. It is intended to allow closer scrutiny of the results of different synthesis methods and of different subsystems at various stages of development. The following voices are currently included, each with an indication of the speech data used to build it:
- Scottish male - Alan (ARCTIC), Jon (2hr)
- English RP male - Nick (8hr), Roger (13hr), Korin (TIMIT, ~20mins)
- English RP female - Nina (3hr)
- American male - KAL (Communicator), RMS (ARCTIC), BDL (ARCTIC), JMK (ARCTIC)
- American female - SLT, CLB (both ARCTIC)
The following synthesis methods are represented:
- HTS - a statistical parametric approach (both the 2005 and 2007 systems)
- Multisyn - standard unit selection concatenative approach
- Diphone - single-instance diphone concatenation (the previous generation of TTS technology, from the mid-1980s to the mid-1990s)
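As a rough illustration of the diphone approach listed above: a diphone runs from the middle of one phone to the middle of the next, so an utterance of n phones maps to n+1 diphone units once silence is padded on both ends. The sketch below is a toy in Python, not Festival code. The function name `to_diphones`, the `pau` silence label and the phone symbols are all assumptions made for illustration.

```python
def to_diphones(phones, silence="pau"):
    """Turn a phone sequence into the diphone units that span it.

    Hypothetical helper for illustration only; it is not part of Festival.
    """
    # Pad with silence so the first and last phones also get a transition unit.
    padded = [silence, *phones, silence]
    # Pair each phone with its successor: one diphone per adjacent pair.
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]


# Made-up phone sequence, loosely resembling "hello".
print(to_diphones(["h", "@", "l", "ou"]))
# ['pau-h', 'h-@', '@-l', 'l-ou', 'ou-pau']
```

In a single-instance diphone voice, each unit name corresponds to exactly one recording, so synthesis amounts to looking up and joining those recordings. A unit selection voice instead holds many candidates per unit and must choose among them.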
If you have any questions, comments or suggestions, or experience any difficulties using this demo, please consult the FAQ first. If that doesn't address your query, please mail