Festival Text-to-Speech Online Demo - Technical

Select a voice
Type the text to synthesise (max 70 chars)

This is an interactive demo of CSTR's "Festival" speech synthesiser, software that produces artificial speech in place of a real human voice. Festival is the most complete freeware multilingual, general-purpose synthesis system available. It is used by numerous research sites and other projects around the world. Further information is available on the Festival project page.

Unlike the simpler demo here, the demo on this page gives access to many more of the voices that have been developed for Festival. It is intended to allow closer scrutiny of the results of different synthesis methods and of different subsystems at various stages of development. The following voices are currently included, each with an indication of the amount of speech data used to build it:

Scottish male - Alan (ARCTIC), Jon (2hr)
English RP male - Nick (8hr), Roger (13hr), Korin (TIMIT, ~20min)
English RP female - Nina (3hr)
American male - KAL (Communicator), RMS (ARCTIC), BDL (ARCTIC), JMK (ARCTIC)
American female - SLT, CLB (both ARCTIC)

Broadly, three synthesis methods are available in this demo:

HTS - a statistical parametric approach (both the 2005 and 2007 systems)
Multisyn - a standard unit-selection concatenative approach
Diphone - single-instance diphone concatenation
(the previous generation of TTS technology, from the mid-1980s to the mid-1990s).
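
To make the diphone method more concrete, the following is a minimal Python sketch of the general idea and not Festival's actual implementation. A diphone spans the transition from one phone to the next, and a single-instance inventory stores exactly one recorded unit per diphone. The phone labels, the function names and the placeholder sample values are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Diphone = Tuple[str, str]


def to_diphones(phones: List[str]) -> List[Diphone]:
    """Pair each phone with its successor: n phones give n - 1 diphones."""
    return list(zip(phones, phones[1:]))


def concatenate(phones: List[str],
                inventory: Dict[Diphone, List[float]]) -> List[float]:
    """Look up the single stored unit for each diphone and join them in order."""
    samples: List[float] = []
    for diphone in to_diphones(phones):
        # Single-instance: there is exactly one candidate, so nothing to select.
        samples.extend(inventory[diphone])
    return samples


# Hypothetical phone string padded with silence ("pau") at both ends.
phones = ["pau", "h", "e", "l", "ou", "pau"]

# Placeholder inventory: two dummy sample values per diphone instead of audio.
inventory = {pair: [float(i)] * 2 for i, pair in enumerate(to_diphones(phones))}

diphones = to_diphones(phones)
waveform = concatenate(phones, inventory)
print(diphones)
print(len(waveform))
```

A unit-selection system differs in that the inventory holds many candidate units for each context, and a search picks the sequence that joins most smoothly; the sketch above has no such search because each diphone has only one stored instance.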

Further questions...

If you have any questions, comments or suggestions, or experience any difficulties using this demo, please consult the FAQ first. If that doesn't address your query, please mail

(NOTE: This page is provided for demonstration purposes only. Direct use of the CGI synthesis interface is not permitted, because the computer resources reserved for this demo are limited. The speech audio produced by this demo is for private, non-commercial use only. For further details on our usage policy, see the usage section of the FAQ.)