Srikanth Ronanki, Oliver Watts, Simon King, and Gustav Eje Henter. Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach. In Proc. IEEE Workshop on Spoken Language Technology (SLT), December 2016. [ bib | .pdf | Abstract ]

Ondrej Klejch, Peter Bell, and Steve Renals. Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. In Proc. IEEE Workshop on Spoken Language Technology, San Diego, USA, December 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. Differentiable Pooling for Unsupervised Acoustic Model Adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(10):1773-1784, October 2016. [ bib | DOI | .pdf | Abstract ]

Joachim Fainberg, Peter Bell, Mike Lincoln, and Steve Renals. Improving children's speech recognition through out-of-domain data augmentation. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, and Simon King. DNN-based Speech Synthesis for Indian Languages from ASCII text. In Proc. 9th ISCA Speech Synthesis Workshop (SSW9), Sunnyvale, CA, USA, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, and Simon King. A template-based approach for speech synthesis intonation generation using LSTMs. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, and Steve Renals. Unsupervised adaptation of Recurrent Neural Network Language Models. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Jean-Philippe Goldman, Pierre-Edouard Honnet, Rob Clark, Philip N Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, et al. The SIWIS database: a multilingual speech database with acted emphasis. In Proceedings of Interspeech, San Francisco, United States, September 2016. [ bib | .PDF | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, and Junichi Yamagishi. Syllable-level representations of suprasegmental features for DNN-based text-to-speech synthesis. In Proceedings of Interspeech, San Francisco, United States, September 2016. [ bib | .PDF | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, and Junichi Yamagishi. Parallel and cascaded deep neural networks for text-to-speech synthesis. In 9th ISCA Workshop on Speech Synthesis (SSW9), Sunnyvale, United States, September 2016. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, and Junichi Yamagishi. Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks. In Interspeech, pages 352-356. ISCA, September 2016. [ bib | DOI | .pdf | Abstract ]

Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, and Junichi Yamagishi. Investigating RNN-based speech enhancement methods for noise-robust text-to-speech. In Proceedings of 9th ISCA Speech Synthesis Workshop, pages 159-165, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Zhizheng Wu, Oliver Watts, and Simon King. A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System. In Proc. Speech Synthesis Workshop (SSW9), September 2016. [ bib | .pdf | Abstract ]

Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, and Simon King. Waveform generation based on signal reshaping for statistical parametric speech synthesis. In Proc. Interspeech, pages 2263-2267, San Francisco, CA, USA, September 2016. [ bib | .PDF | Abstract ]

Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, and Felipe Espic. Applying spectral normalisation and efficient envelope estimation and statistical transformation for the voice conversion challenge 2016. In Interspeech, pages 1657-1661, San Francisco, USA, September 2016. [ bib | DOI | http | .PDF | Abstract ]

Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, and Keiichi Tokuda. A hierarchical predictor of synthetic speech naturalness using neural networks. In Interspeech 2016, pages 342-346. International Speech Communication Association, September 2016. [ bib | DOI | .PDF | Abstract ]

Zhizheng Wu, Oliver Watts, and Simon King. Merlin: An open source neural network speech synthesis system. In 9th ISCA Speech Synthesis Workshop (2016), pages 218-223, September 2016. [ bib | .pdf | Abstract ]

P. Swietojanski, J. Li, and S. Renals. Learning hidden unit contributions for unsupervised acoustic model adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(8):1450-1463, August 2016. [ bib | DOI | .pdf | Abstract ]

Mirjam Wester, Oliver Watts, and Gustav Eje Henter. Evaluating comprehension of natural and synthetic conversational speech. In Speech Prosody, volume 8, pages 736-740, Boston, MA, June 2016. [ bib | .pdf | Abstract ]

Rasmus Dall, Sandrine Brognaux, Korin Richmond, Cassia Valentini-Botinhao, Gustav Eje Henter, Julia Hirschberg, and Junichi Yamagishi. Testing the consistency assumption: pronunciation variant forced alignment in read and spontaneous speech synthesis. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5155-5159, March 2016. [ bib | .pdf | Abstract ]

Qiong Hu, Junichi Yamagishi, Korin Richmond, Kartick Subramanian, and Yannis Stylianou. Initial investigation of speech synthesis based on complex-valued neural networks. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5630-5634, March 2016. [ bib | .pdf | Abstract ]

Korin Richmond and Simon King. Smooth talking: Articulatory join costs for unit selection. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5150-5154, March 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. SAT-LHUC: Speaker adaptive training for learning hidden unit contributions. In Proc. IEEE ICASSP, Shanghai, China, March 2016. [ bib | .pdf | Abstract ]

Arne Leijon, Gustav Eje Henter, and Martin Dahlquist. Bayesian analysis of phoneme confusion matrices. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(3):469-482, March 2016. [ bib | http | .pdf | Abstract ]

Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King. Robust TTS duration modelling using DNNs. In Proc. ICASSP, volume 41, pages 5130-5134, Shanghai, China, March 2016. [ bib | http | .pdf | Abstract ]

Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, and Simon King. From HMMs to DNNs: where do the improvements come from? In Proc. ICASSP, volume 41, pages 5505-5509, Shanghai, China, March 2016. [ bib | http | .pdf | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi, and Robert A. J. Clark. Wavelet-based decomposition of F0 as a secondary task for DNN-based speech synthesis with multi-task learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, March 2016. [ bib | .pdf | Abstract ]

Yan Tang, Martin Cooke, and Cassia Valentini-Botinhao. Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech. Computer Speech and Language, 35:73-92, 2016. [ bib | DOI | Abstract ]

Adriana Stan, Yoshitaka Mamiya, Junichi Yamagishi, Peter Bell, Oliver Watts, Rob Clark, and Simon King. ALISA: An automatic lightly supervised speech segmentation and alignment tool. Computer Speech and Language, 35:116-133, 2016. [ bib | DOI | http | .pdf | Abstract ]

Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. The voice conversion challenge 2016. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Analysis of the voice conversion challenge 2016 evaluation results. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Multidimensional scaling of systems in the voice conversion challenge 2016. In Proc. Speech Synthesis Workshop 9, Sunnyvale, CA, 2016. [ bib | .pdf | Abstract ]

Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, and Steve Renals. Automatic dialect detection in Arabic broadcast speech. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda. Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing. In Proc. Interspeech, San Francisco, CA, USA, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall, Marcus Tomalin, and Mirjam Wester. Synthesising Filled Pauses: Representation and Datamixing. In Proc. SSW9, Sunnyvale, CA, USA, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall and Xavi Gonzalvo. JNDSLAM: A SLAM extension for Speech Synthesis. In Proc. Speech Prosody, Boston, USA, 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. SAT-LHUC: Speaker adaptive training for learning hidden unit contributions. In Proc. IEEE Int. Conf. Acoustic, Speech Signal Processing (ICASSP), pages 5010-5014, 2016. [ bib | .pdf | Abstract ]

P. Swietojanski. Learning Representations for Speech Recognition using Artificial Neural Networks. PhD thesis, University of Edinburgh, 2016. [ bib | .pdf | Abstract ]

Thomas Merritt, Robert A J Clark, Zhizheng Wu, Junichi Yamagishi, and Simon King. Deep neural network-guided unit selection synthesis. In Proc. ICASSP, 2016. [ bib | .pdf | Abstract ]

Thomas Merritt, Srikanth Ronanki, Zhizheng Wu, and Oliver Watts. The CSTR entry to the Blizzard Challenge 2016. In Proc. Blizzard Challenge, 2016. [ bib | .pdf | Abstract ]

Mireia Farrus, Catherine Lai, and Johanna D. Moore. Paragraph-based prosodic cues for speech synthesis applications. In Proceedings of Speech Prosody 2016, pages 1143-1147, Boston, MA, USA, 2016. [ bib | DOI | .pdf | Abstract ]

Catherine Lai, Mireia Farrus, and Johanna Moore. Automatic Paragraph Segmentation with Lexical and Prosodic Features. In Proceedings of Interspeech 2016, San Francisco, CA, USA, 2016. [ bib | .pdf | Abstract ]

Qiong Hu. Statistical parametric speech synthesis based on sinusoidal models. PhD thesis, University of Edinburgh, 2016. [ bib | .pdf | Abstract ]

Adriana Stan, Cassia Valentini-Botinhao, Bogdan Orza, and Mircea Giurgiu. Blind speech segmentation using spectrogram image-based features and mel cepstral coefficients. In SLT, pages 597-602. IEEE, 2016. [ bib | DOI | .pdf | Abstract ]

A. Ali, P. Bell, J. Glass, Y. Messaoui, H. Mubarak, S. Renals, and Y. Zhang. The MGB-2 Challenge: Arabic multi-dialect broadcast media recognition. In Proc. SLT, 2016. [ bib | .pdf | Abstract ]

Leimin Tian, Johanna Moore, and Catherine Lai. Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features. In Spoken Language Technology Workshop (SLT), 2016 IEEE, pages 565-572. IEEE, 2016. [ bib | .pdf | Abstract ]