The Centre for Speech Technology Research, The University of Edinburgh

Phase Two data release for Blizzard 2012

These data are in fact the same as the Phase One data.

Speech data

Data for Phase Two of the Blizzard Challenge 2012 were generously provided by Toshiba Research Europe Ltd, Cambridge Research Laboratory.

These data are released under a slightly modified Creative Commons Attribution Share-Alike license. To download, you MUST REGISTER AS A PARTICIPANT IN THE CHALLENGE. Then, read and accept the license. Once we have received and manually verified your accepted license, we will email you a password.


Please note that the voice building data for Phase Two are exactly the same as for Phase One; the development data described below are new for Phase Two. The following files (sizes and md5 checksums are given in parentheses) are available to download from here. Each file contains data from one audiobook, along with transcriptions, labels and other information. Refer to the README file in each distribution for more information.

Updated labels for the above four audiobooks have been released; see the README in the distribution for an explanation.

Development data, tools, benchmark voices

For Phase Two, there is a development set comprising a variety of text types, along with synthetic speech from several example systems (and natural speech for a subset of the text types). This is primarily intended for use in task EH2.2, but may be used by participants in task EH2.1, subject to the rules (available on the main Blizzard website). The development data are available to download from here.

Test sentences

The test data are available to download from here.

Contact Simon King for more details.