Submissions and listening test results from previous Blizzard Challenges

These distributions include the synthetic speech submitted by participants in the challenge, along with listeners' scores for the subset of that synthetic speech used in the listening test. Possible uses of these materials include:

Re-running the challenge, such as with a subset of the systems plus a new system
Development of instrumental (objective signal-based) measures of synthetic speech naturalness, speaker similarity or intelligibility
Development of synthetic speech detectors for use as an anti-spoofing countermeasure against attacks on speaker identification systems

Terms of use

Generally, these data (including the wav files submitted by participants and the corresponding listening test scores) may be used for:

Research by academic institutions
Non-commercial research, including research conducted within commercial organisations

and some parts of the data also allow

Commercial research and development

The following restrictions apply to all data:

NOT for marketing or sales purposes
NOT for re-distribution

However, varying restrictions are associated with certain parts of the data (e.g., wav files from certain systems may be used for commercial research and development, whilst others may only be used for academic research). Where applicable, this is specified within the distributions. It is your responsibility to check the permissions described within each distribution and to ensure that you only use the data in a way that is consistent with those permissions.

File sizes and checksums are given in parentheses after each distribution.

2008 Version 1 (259 MB; 5360686aac07ffe22420ffd00c90ea74)
2009 Version 1 (11 GB; 1ffdf2c0ddb5f2e0c97908a70d0302b2)
2010 Version 1 (6.3 GB; e60d504c4e3a95d7792e6fd056e74aec)
2011 Version 1 (3.4 GB; 8e59a48f88568f86d1962644fdd568c5)
2012 Version 1 (4.3 GB; 5cc91609f1cd206e5753d73df403c277)
2013 Version 2 (4.3 GB; 837e20399409393332322fdd59d114de) - EH tasks only
2014 Version 1 (2.4 GB; d749deb2d37d8065fe5e06170b3949ce)
2015 Version 1 (1.5 GB; d13450210e627630e7a216771ffd7172)
2016 Version 1 (20 GB; fff9d42a97161835f2545e02e5392e06)
2019 Version 1 (2.5 GB; d99e6b7a8f6ec9219eec0e75d209de61)
2020 Version 1 (9.3 GB; 65ed0a651db277ad8ee57c8c5dc45b90)
2021 Version 2 (2.4 GB; 2052d9206151a1d3af4c754335864f4c)
2013 extension conducted in 2023 Version 1 (270 MB; 0cac8fc8de1d56ed57e4186966e70180) - EH2 task only (paper under review)
2023 Version 1 (7.9 GB; d7f749a211a0ae577d2517d9a12c80b6)
2025 Version 1 (15 GB; 73fba001b8af043ea3ace16ea4e6293f)
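
Several of these distributions are multi-gigabyte downloads, so it may be worth checking a downloaded file against its listed checksum before unpacking it. The following is a minimal sketch in Python. It assumes the checksums are MD5 digests: they are 32 hexadecimal characters long, which is consistent with MD5, but the algorithm is not named here. The function names and the file name in the usage comment are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks
    so that multi-gigabyte archives do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed_checksum):
    """Compare a file's MD5 digest with the checksum listed for it.
    Assumption: the listed checksum is an MD5 digest in hexadecimal."""
    return md5_of_file(path) == listed_checksum.strip().lower()


# Example usage (the archive name is hypothetical; substitute the file
# you actually downloaded and the checksum listed for it):
# ok = matches_listed_checksum("blizzard_2008.tar.gz",
#                              "5360686aac07ffe22420ffd00c90ea74")
# print("checksum OK" if ok else "checksum mismatch, download again")
```

A mismatch usually indicates an incomplete or corrupted download rather than a problem with the distribution itself.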

If you obtain any results based on this data, please:

Let us know (email Simon King)
Put an acknowledgement in all publications to "The organisers of the Blizzard Challenge"
Cite an appropriate reference, such as:
- "Measuring a decade of progress in Text-to-Speech", Simon King. Loquens, Vol 1, No 1 (2014). doi:10.3989/loquens.2014.006
- "The Blizzard Challenge 2013", Simon King and Vasilis Karaiskos, in Proc. Blizzard Challenge workshop 2013.
- "The Blizzard Challenge 2012", Simon King and Vasilis Karaiskos, in Proc. Blizzard Challenge workshop 2012.
- "The Blizzard Challenge 2011", Simon King and Vasilis Karaiskos, in Proc. Blizzard Challenge workshop 2011.
- "The Blizzard Challenge 2010", Simon King and Vasilis Karaiskos, in Proc. Blizzard Challenge workshop 2010.
- "The Blizzard Challenge 2009", Simon King and Vasilis Karaiskos, in Proc. Blizzard Challenge workshop 2009.
- "The Blizzard Challenge 2008", Vasilis Karaiskos, Simon King, Robert A. J. Clark, Catherine Mayo, in Proc. Blizzard Challenge workshop 2008.
- "The Blizzard Challenge -- 2005: Evaluating Corpus-Based Speech Synthesis on Common Datasets", Alan W. Black, Keiichi Tokuda, in Proc. Interspeech 2005, Lisbon, Portugal.
which can be found via the Blizzard Challenge website.

Additional data

Samsung have released a data set comprising newly created synthetic speech for the test materials from the above 2007-2016 Challenges and from other sources, along with a very substantial number of listener ratings, as SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis.