The Centre for Speech Technology Research, The University of Edinburgh

CSTR member publications

2018 Onwards

Publications from 2018 onwards can be found via the CSTR part of the Edinburgh Research Explorer website.

2017

Ahmed Ali, Preslav Nakov, Peter Bell, and Steve Renals. WERd: Using social text spelling variants for evaluating dialectal speech recognition. In Proc. ASRU. IEEE, December 2017. [ bib | .pdf | Abstract ]

Joanna Rownicka, Steve Renals, and Peter Bell. Simplifying very deep convolutional neural network architectures for robust speech recognition. In Proc. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Okinawa, Japan, December 2017. [ bib | .pdf | Abstract ]

Emiru Tsunoo, Ondrej Klejch, Peter Bell, and Steve Renals. Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, Okinawa, Japan, December 2017. [ bib | .pdf | Abstract ]

Leimin Tian, Johanna Moore, and Catherine Lai. Recognizing Emotions in Spoken Dialogue with Acoustic and Lexical Cues. In ICMI 2017 Satellite Workshop Investigating Social Interactions with Artificial Agents, November 2017. [ bib | .pdf | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, and Junichi Yamagishi. Learning word vector representations based on acoustic counts. In Proceedings of Interspeech, Stockholm, Sweden, August 2017. [ bib | .PDF | Abstract ]

Srikanth Ronanki, Oliver Watts, and Simon King. A Hierarchical Encoder-Decoder Model for Statistical Parametric Speech Synthesis. In Proc. Interspeech 2017, August 2017. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Sam Ribeiro, Felipe Espic, and Oliver Watts. The CSTR entry to the Blizzard Challenge 2017. In Proc. Blizzard Challenge Workshop (Interspeech Satellite), Stockholm, Sweden, August 2017. [ bib | .pdf | Abstract ]

Emiru Tsunoo, Ondrej Klejch, Peter Bell, and Steve Renals. Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features. In Proc. ASRU. IEEE, August 2017. [ bib | .pdf | Abstract ]

Peter Bell, Joachim Fainberg, Catherine Lai, and Mark Sinclair. A system for real-time collaborative transcription correction. In Proc. Interspeech (demo session), August 2017. [ bib | .pdf | Abstract ]

Emiru Tsunoo, Peter Bell, and Steve Renals. Hierarchical recurrent neural network for story segmentation. In Proc. Interspeech, August 2017. [ bib | .pdf | Abstract ]

Felipe Espic, Cassia Valentini-Botinhao, and Simon King. Direct modelling of magnitude and phase spectra for statistical parametric speech synthesis. In Proc. Interspeech, Stockholm, Sweden, August 2017. [ bib | .PDF | Abstract ]

Michael Pucher, Bettina Zillinger, Markus Toman, Dietmar Schabus, Cassia Valentini-Botinhao, Junichi Yamagishi, Erich Schmid, and Thomas Woltron. Influence of speaker familiarity on blind and visually impaired children's and young adults' perception of synthetic voices. Computer Speech and Language, 46:179-195, June 2017. [ bib | DOI | Abstract ]

Joseph Mendelson, Pilar Oplustil, Oliver Watts, and Simon King. Nativization of foreign names in TTS for automatic reading of world news in Swahili. In Interspeech 2017, May 2017. [ bib | .pdf | Abstract ]

Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay Cohen, Tomasz Dwojak, Phil Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imrani, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, and Chris Hernon. The SUMMA platform prototype. In Proceedings of the EACL 2017 Software Demonstrations, pages 116–119. Association for Computational Linguistics (ACL), April 2017. [ bib | .pdf | Abstract ]

Ondrej Klejch, Peter Bell, and Steve Renals. Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), New Orleans, USA, March 2017. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao and Junichi Yamagishi. Speech intelligibility in cars: the effect of speaking style, noise and listener age. In Interspeech, 2017. [ bib | .pdf | Abstract ]

Jaime Lorenzo-Trueba, Cassia Valentini-Botinhao, Gustav Henter, and Junichi Yamagishi. Misperceptions of the emotional content of natural and vocoded speech in a car. In Interspeech, 2017. [ bib | .pdf | Abstract ]

Joachim Fainberg, Steve Renals, and Peter Bell. Factorised representations for neural network adaptation to diverse acoustic environments. Proc. Interspeech 2017, pages 749-753, 2017. [ bib | .pdf | Abstract ]

Peter Bell, Pawel Swietojanski, and Steve Renals. Multitask learning of context-dependent targets in deep neural network acoustic models. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2):238-247, 2017. [ bib | .pdf | Abstract ]

Matthew P. Aylett, Alessandro Vinciarelli, and Mirjam Wester. Speech synthesis for the generation of artificial personality. IEEE Transactions on Affective Computing, 2017. [ bib | DOI | .pdf | Abstract ]

Peter Bell, Joachim Fainberg, Catherine Lai, and Mark Sinclair. A system for real time collaborative transcription correction. In Proceedings of Interspeech 2017, pages 817-818, 2017. [ bib | .PDF | Abstract ]

Leimin Tian, Michal Muszynski, Catherine Lai, Johanna Moore, Theodoros Kostoulas, Patrizia Lombardo, Thierry Pun, and Guillaume Chanel. Recognizing Induced Emotions of Movie Audiences: Are Induced and Perceived Emotions the Same? In Seventh International Conference on Affective Computing and Intelligent Interaction (ACII2017), 2017. [ bib | .pdf | Abstract ]

Janine Kleinhans, Mireia Farrús, Agustín Gravano, Juan Manuel Pérez, Catherine Lai, and Leo Wanner. Using prosody to classify discourse relations. In Proceedings of Interspeech 2017, pages 3201-3205, 2017. [ bib | .PDF | Abstract ]

Srikanth Ronanki, Manuel Sam Ribeiro, Felipe Espic, and Oliver Watts. The CSTR entry to the Blizzard Challenge 2017. In Proc. Blizzard Challenge, 2017. [ bib | .pdf | Abstract ]

2016

Srikanth Ronanki, Oliver Watts, Simon King, and Gustav Eje Henter. Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach. In Proc. IEEE Workshop on Spoken Language Technology (SLT), December 2016. [ bib | .pdf | Abstract ]

Ondrej Klejch, Peter Bell, and Steve Renals. Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches. In Proc. IEEE Workshop on Spoken Language Technology, San Diego, USA, December 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. Differentiable Pooling for Unsupervised Acoustic Model Adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(10):1773-1784, October 2016. [ bib | DOI | .pdf | Abstract ]

Joachim Fainberg, Peter Bell, Mike Lincoln, and Steve Renals. Improving children's speech recognition through out-of-domain data augmentation. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, and Simon King. DNN-based Speech Synthesis for Indian Languages from ASCII text. In Proc. 9th ISCA Speech Synthesis Workshop (SSW9), Sunnyvale, CA, USA, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Gustav Eje Henter, Zhizheng Wu, and Simon King. A template-based approach for speech synthesis intonation generation using LSTMs. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Siva Reddy Gangireddy, Pawel Swietojanski, Peter Bell, and Steve Renals. Unsupervised adaptation of Recurrent Neural Network Language Models. In Proc. Interspeech, San Francisco, USA, September 2016. [ bib | .pdf | Abstract ]

Jean-Philippe Goldman, Pierre-Edouard Honnet, Rob Clark, Philip N Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, et al. The SIWIS database: a multilingual speech database with acted emphasis. In Proceedings of Interspeech, San Francisco, United States, September 2016. [ bib | .PDF | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, and Junichi Yamagishi. Syllable-level representations of suprasegmental features for DNN-based text-to-speech synthesis. In Proceedings of Interspeech, San Francisco, United States, September 2016. [ bib | .PDF | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, and Junichi Yamagishi. Parallel and cascaded deep neural networks for text-to-speech synthesis. In 9th ISCA Workshop on Speech Synthesis (SSW9), Sunnyvale, United States, September 2016. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, and Junichi Yamagishi. Speech enhancement for a noise-robust text-to-speech synthesis system using deep recurrent neural networks. In Interspeech, pages 352-356. ISCA, September 2016. [ bib | DOI | .pdf | Abstract ]

Cassia Valentini-Botinhao, Xin Wang, Shinji Takaki, and Junichi Yamagishi. Investigating RNN-based speech enhancement methods for noise-robust text-to-speech. In Proceedings of 9th ISCA Speech Synthesis Workshop, pages 159-165, September 2016. [ bib | .pdf | Abstract ]

Srikanth Ronanki, Zhizheng Wu, Oliver Watts, and Simon King. A Demonstration of the Merlin Open Source Neural Network Speech Synthesis System. In Proc. Speech Synthesis Workshop (SSW9), September 2016. [ bib | .pdf | Abstract ]

Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, and Simon King. Waveform generation based on signal reshaping for statistical parametric speech synthesis. In Proc. Interspeech, pages 2263-2267, San Francisco, CA, USA, September 2016. [ bib | .PDF | Abstract ]

Fernando Villavicencio, Junichi Yamagishi, Jordi Bonada, and Felipe Espic. Applying spectral normalisation and efficient envelope estimation and statistical transformation for the voice conversion challenge 2016. In Interspeech, pages 1657-1661, San Francisco, USA, September 2016. [ bib | DOI | http | .PDF | Abstract ]

Takenori Yoshimura, Gustav Eje Henter, Oliver Watts, Mirjam Wester, Junichi Yamagishi, and Keiichi Tokuda. A hierarchical predictor of synthetic speech naturalness using neural networks. In Interspeech 2016, pages 342-346. International Speech Communication Association, September 2016. [ bib | DOI | .PDF | Abstract ]

Zhizheng Wu, Oliver Watts, and Simon King. Merlin: An open source neural network speech synthesis system. In 9th ISCA Speech Synthesis Workshop (2016), pages 218-223, September 2016. [ bib | .pdf | Abstract ]

P. Swietojanski, J. Li, and S. Renals. Learning hidden unit contributions for unsupervised acoustic model adaptation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(8):1450-1463, August 2016. [ bib | DOI | .pdf | Abstract ]

Mirjam Wester, Oliver Watts, and Gustav Eje Henter. Evaluating comprehension of natural and synthetic conversational speech. In Speech Prosody, volume 8, pages 736-740, Boston, MA, June 2016. [ bib | .pdf | .pdf | Abstract ]

Rasmus Dall, Sandrine Brognaux, Korin Richmond, Cassia Valentini-Botinhao, Gustav Eje Henter, Julia Hirschberg, and Junichi Yamagishi. Testing the consistency assumption: pronunciation variant forced alignment in read and spontaneous speech synthesis. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5155-5159, March 2016. [ bib | .pdf | Abstract ]

Qiong Hu, Junichi Yamagishi, Korin Richmond, Kartick Subramanian, and Yannis Stylianou. Initial investigation of speech synthesis based on complex-valued neural networks. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5630-5634, March 2016. [ bib | .pdf | Abstract ]

Korin Richmond and Simon King. Smooth talking: Articulatory join costs for unit selection. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 5150-5154, March 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. SAT-LHUC: Speaker adaptive training for learning hidden unit contributions. In Proc. IEEE ICASSP, Shanghai, China, March 2016. [ bib | .pdf | Abstract ]

Arne Leijon, Gustav Eje Henter, and Martin Dahlquist. Bayesian analysis of phoneme confusion matrices. IEEE/ACM T. Audio Speech, 24(3):469-482, March 2016. [ bib | http | .pdf | Abstract ]

Gustav Eje Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King. Robust TTS duration modelling using DNNs. In Proc. ICASSP, volume 41, pages 5130-5134, Shanghai, China, March 2016. [ bib | http | .pdf | Abstract ]

Oliver Watts, Gustav Eje Henter, Thomas Merritt, Zhizheng Wu, and Simon King. From HMMs to DNNs: where do the improvements come from? In Proc. ICASSP, volume 41, pages 5505-5509, Shanghai, China, March 2016. [ bib | http | .pdf | Abstract ]

Manuel Sam Ribeiro, Oliver Watts, Junichi Yamagishi, and Robert A. J. Clark. Wavelet-based decomposition of f0 as a secondary task for DNN-based speech synthesis with multi-task learning. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, March 2016. [ bib | .pdf | Abstract ]

Yan Tang, Martin Cooke, and Cassia Valentini-Botinhao. Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech. Computer Speech & Language, 35:73-92, 2016. [ bib | DOI | Abstract ]

Adriana Stan, Yoshitaka Mamiya, Junichi Yamagishi, Peter Bell, Oliver Watts, Rob Clark, and Simon King. ALISA: An automatic lightly supervised speech segmentation and alignment tool. Computer Speech and Language, 35:116-133, 2016. [ bib | DOI | http | .pdf | Abstract ]

Tomoki Toda, Ling-Hui Chen, Daisuke Saito, Fernando Villavicencio, Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. The voice conversion challenge 2016. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Analysis of the voice conversion challenge 2016 evaluation results. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Multidimensional scaling of systems in the voice conversion challenge 2016. In Proc. Speech Synthesis Workshop 9, Sunnyvale, CA, 2016. [ bib | .pdf | Abstract ]

Ahmed Ali, Najim Dehak, Patrick Cardinal, Sameer Khurana, Sree Harsha Yella, James Glass, Peter Bell, and Steve Renals. Automatic dialect detection in Arabic broadcast speech. In Proc. Interspeech, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda. Redefining the Linguistic Context Feature Set for HMM and DNN TTS Through Position and Parsing. In Proc. Interspeech, San Francisco, CA, USA, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall, Marcus Tomalin, and Mirjam Wester. Synthesising Filled Pauses: Representation and Datamixing. In Proc. SSW9, Cupertino, CA, USA, 2016. [ bib | .pdf | Abstract ]

Rasmus Dall and Xavi Gonzalvo. JNDSLAM: A SLAM extension for Speech Synthesis. In Proc. Speech Prosody, Boston, USA, 2016. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. SAT-LHUC: Speaker adaptive training for learning hidden unit contributions. In Proc. IEEE Int. Conf. Acoustic, Speech Signal Processing (ICASSP), pages 5010-5014, 2016. [ bib | .pdf | Abstract ]

P. Swietojanski. Learning Representations for Speech Recognition using Artificial Neural Networks. PhD thesis, University of Edinburgh, 2016. [ bib | .pdf | Abstract ]

Thomas Merritt, Robert A J Clark, Zhizheng Wu, Junichi Yamagishi, and Simon King. Deep neural network-guided unit selection synthesis. In Proc. ICASSP, 2016. [ bib | .pdf | Abstract ]

Thomas Merritt, Srikanth Ronanki, Zhizheng Wu, and Oliver Watts. The CSTR entry to the Blizzard Challenge 2016. In Proc. Blizzard Challenge, 2016. [ bib | .pdf | Abstract ]

Mireia Farrus, Catherine Lai, and Johanna D. Moore. Paragraph-based prosodic cues for speech synthesis applications. In Proceedings of Speech Prosody 2016, pages 1143-1147, Boston, MA, USA, 2016. [ bib | DOI | .pdf | Abstract ]

Catherine Lai, Mireia Farrus, and Johanna Moore. Automatic Paragraph Segmentation with Lexical and Prosodic Features. In Proceedings of Interspeech 2016, San Francisco, CA, USA, 2016. [ bib | .pdf | Abstract ]

Qiong Hu. Statistical parametric speech synthesis based on sinusoidal models. PhD thesis, University of Edinburgh, 2016. [ bib | .pdf | Abstract ]

Adriana Stan, Cassia Valentini-Botinhao, Bogdan Orza, and Mircea Giurgiu. Blind speech segmentation using spectrogram image-based features and mel cepstral coefficients. In SLT, pages 597-602. IEEE, 2016. [ bib | DOI | .pdf | Abstract ]

A. Ali, P. Bell, J. Glass, Y. Messaoui, H. Mubarak, S. Renals, and Y. Zhang. The MGB-2 Challenge: Arabic multi-dialect broadcast media recognition. In Proc. SLT, 2016. [ bib | .pdf | Abstract ]

Leimin Tian, Johanna Moore, and Catherine Lai. Recognizing emotions in spoken dialogue with hierarchically fused acoustic and lexical features. In Spoken Language Technology Workshop (SLT), 2016 IEEE, pages 565-572. IEEE, 2016. [ bib | .pdf | Abstract ]

2015

Lau Chee Yong, Oliver Watts, and Simon King. Combining lightly-supervised learning and user feedback to construct and improve a statistical parametric speech synthesizer for Malay. Research Journal of Applied Sciences, Engineering and Technology, 11(11):1227-1232, December 2015. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, and Junichi Yamagishi. Intelligibility of time-compressed synthetic speech: Compression method and speaking style. Speech Communication, October 2015. [ bib | DOI | Abstract ]

P. Swietojanski, P. Bell, and S. Renals. Structured output layer with auxiliary targets for context-dependent acoustic modelling. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | DOI | .pdf | Abstract ]

C. Valentini-Botinhao, Z. Wu, and S. King. Towards minimum perceptual error training for DNN-based speech synthesis. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

M. Pucher, M. Toman, D. Schabus, C. Valentini-Botinhao, J. Yamagishi, B. Zillinger, and E. Schmid. Influence of speaker familiarity on blind and visually impaired children's perception of synthetic voices in audio games. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, and Simon King. Deep neural network context embeddings for model selection in rich-context HMM synthesis. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Manuel Sam Ribeiro, Junichi Yamagishi, and Robert A. J. Clark. A perceptual investigation of wavelet-based decomposition of f0 for text-to-speech synthesis. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Peter Bell and Steve Renals. Complementary tasks for context-dependent deep neural network acoustic models. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Peter Bell, Catherine Lai, Clare Llewellyn, Alexandra Birch, and Mark Sinclair. A system for automatic broadcast news summarisation, geolocation and translation. In Proc. Interspeech (demo session), Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Alessandra Cervone, Catherine Lai, Silvia Pareti, and Peter Bell. Towards automatic detection of reported speech in dialogue using prosodic cues. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Cassia Valentini-Botinhao, and Gustav Eje Henter. Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations. In Proc. Interspeech, pages 3476-3480, Dresden, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Matthew Aylett, Marcus Tomalin, and Rasmus Dall. Artificial personality and disfluency. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Zhizheng Wu, and Junichi Yamagishi. Human vs machine spoofing detection on wideband and narrowband data. In Proc. Interspeech, Dresden, September 2015. [ bib | .pdf | Abstract ]

Qiong Hu, Zhizheng Wu, Korin Richmond, Junichi Yamagishi, Yannis Stylianou, and Ranniery Maia. Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning. In Proc. Interspeech, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, and Akinobu Lee. Prosodically-enhanced recurrent neural network language models. In Proc. Interspeech, pages 2390-2394, Dresden, Germany, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, and Antti Suni. The NST-GlottHMM entry to the Blizzard Challenge 2015. In Proc. Blizzard Challenge Workshop (Interspeech Satellite), Berlin, Germany, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Srikanth Ronanki, Zhizheng Wu, Tuomo Raitio, and A. Suni. The NST-GlottHMM entry to the Blizzard Challenge 2015. In Proceedings of Blizzard Challenge 2015, September 2015. [ bib | .pdf | Abstract ]

Oliver Watts, Zhizheng Wu, and Simon King. Sentence-level control vectors for deep neural network speech synthesis. In INTERSPEECH 2015 16th Annual Conference of the International Speech Communication Association, pages 2217-2221. International Speech Communication Association, September 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, M. Luisa Garcia Lecumberri, and Martin Cooke. /u/-fronting in English speakers' L1 but not in their L2. In Proc. ICPhS, Glasgow, August 2015. [ bib | .pdf | Abstract ]

Marcus Tomalin, Mirjam Wester, Rasmus Dall, Bill Byrne, and Simon King. A lattice-based approach to automatic filled pause insertion. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Mirjam Wester, Martin Corley, and Rasmus Dall. The temporal delay hypothesis: Natural, vocoded and synthetic speech. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Rasmus Dall, Mirjam Wester, and Martin Corley. Disfluencies in change detection in natural, vocoded and synthetic speech. In Proc. DiSS 2015, Edinburgh, August 2015. [ bib | .pdf | Abstract ]

Alexander Hewer, Ingmar Steiner, Timo Bolkart, Stefanie Wuhrer, and Korin Richmond. A statistical shape space model of the palate surface trained on 3D MRI scans of the vocal tract. In The Scottish Consortium for ICPhS 2015, editor, Proceedings of the 18th International Congress of Phonetic Sciences, Glasgow, United Kingdom, August 2015. retrieved from http://www.icphs2015.info/pdfs/Papers/ICPHS0724.pdf. [ bib | .pdf | Abstract ]

Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King. Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis. In Proc. ICASSP, pages 4460-4464, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

B. Uria, I. Murray, S. Renals, C. Valentini-Botinhao, and J. Bridle. Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE. In Proc. ICASSP, pages 4465-4469, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

Thomas Merritt, Javier Latorre, and Simon King. Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 4220-4224, Brisbane, April 2015. [ bib | .pdf | Abstract ]

Manuel Sam Ribeiro and Robert A. J. Clark. A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform. In IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

P. Bell and S. Renals. Regularization of context-dependent deep neural networks with context-independent multi-task training. In Proc. ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, and Junichi Yamagishi. Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. ICASSP, Brisbane, Australia, April 2015. [ bib | .pdf | Abstract ]

P. Swietojanski and S. Renals. Differentiable pooling for unsupervised speaker adaptation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf | Abstract ]

Ling-Hui Chen, T. Raitio, C. Valentini-Botinhao, Z. Ling, and J. Yamagishi. A deep generative architecture for postfiltering in statistical parametric speech synthesis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(11):2003-2014, 2015. [ bib | DOI | Abstract ]

H. Kamper, M. Elsner, A. Jansen, and S. J. Goldwater. Unsupervised neural network based feature extraction using weak top-down constraints. In Proc. ICASSP, 2015. [ bib | .pdf | Abstract ]

Herman Kamper, S. J. Goldwater, and Aren Jansen. Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Aleksandr Sizov, Elie Khoury, Tomi Kinnunen, Zhizheng Wu, and Sebastien Marcel. Joint speaker verification and antispoofing in the i-vector space. IEEE Transactions on Information Forensics and Security, 10(4):821-832, 2015. [ bib | .pdf ]

Zhizheng Wu and Simon King. Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for DNN-based speech synthesis. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Tomi Kinnunen, Nicholas Evans, Junichi Yamagishi, Cemal Hanilci, Md Sahidullah, and Aleksandr Sizov. ASVspoof 2015: the first automatic speaker verification spoofing and countermeasures challenge. In Interspeech, 2015. [ bib | .pdf ]

Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Quy Hy Nguyen, Minghui Dong, and Eng Siong Chng. System fusion for high-performance voice conversion. In Interspeech, 2015. [ bib | .pdf ]

Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, and Simon King. Deep neural network employing multi-task learning and stacked bottleneck features for speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Zhizheng Wu, Ali Khodabakhsh, Cenk Demiroglu, Junichi Yamagishi, Daisuke Saito, Tomoki Toda, and Simon King. SAS: A speaker verification spoofing database containing diverse attacks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Xiaohai Tian, Zhizheng Wu, Siu-Wa Lee, Quy Hy Nguyen, Eng Siong Chng, and Minghui Dong. Sparse representation for frequency warping based voice conversion. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2015. [ bib | .pdf ]

Liang Lu, Xingxing Zhang, KyungHyun Cho, and Steve Renals. A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Liang Lu and Steve Renals. Feature-space speaker adaptation for probabilistic linear discriminant analysis acoustic models. In Proc. Interspeech, 2015. [ bib | .pdf | Abstract ]

Liang Lu and Steve Renals. Multi-frame factorisation for long-span acoustic modelling. In Proc. ICASSP, 2015. [ bib | .pdf | Abstract ]

Leimin Tian, Catherine Lai, and Johanna D. Moore. Recognizing emotions in dialogue with disfluencies and non-verbal vocalisations. In Proceedings of the 4th Interdisciplinary Workshop on Laughter and Other Non-verbal Vocalisations in Speech, volume 14, page 15, 2015. [ bib | .pdf | Abstract ]

Leimin Tian, Johanna D. Moore, and Catherine Lai. Emotion Recognition in Spontaneous and Acted Dialogues. In Proceedings of ACII 2015, Xi'an, China, 2015. [ bib | .pdf | Abstract ]

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Rubén San-Segundo, Javier Ferreiros, Junichi Yamagishi, and Juan M. Montero. Emotion transplantation through adaptation in HMM-based speech synthesis. Computer Speech & Language, 34(1):292-307, 2015. [ bib | DOI | http | Abstract ]

Alexander Hewer, Stefanie Wuhrer, Ingmar Steiner, and Korin Richmond. Tongue mesh extraction from 3D MRI data of the human vocal tract. In Michael Breuß, Alfred M. Bruckstein, Petros Maragos, and Stefanie Wuhrer, editors, Perspectives in Shape Analysis, Mathematics and Visualization. Springer, 2015. (in press). [ bib ]

Korin Richmond, Zhen-Hua Ling, and Junichi Yamagishi. The use of articulatory movement data in speech synthesis applications: An overview - application of articulatory movements using machine learning algorithms [invited review]. Acoustical Science and Technology, 36(6):467-477, 2015. [ bib | DOI ]

Korin Richmond, Junichi Yamagishi, and Zhen-Hua Ling. Applications of articulatory movements based on machine learning. Journal of the Acoustical Society of Japan, 70(10):539-545, 2015. [ bib ]

Peter Bell and Steve Renals. A system for automatic alignment of broadcast media captions using weighted finite-state transducers. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Ahmed Ali, Walid Magdy, Peter Bell, and Steve Renals. Multi-reference WER for evaluating ASR for languages with no orthographic rules. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Peter Bell, Mark Gales, Thomas Hain, Jonathan Kilgour, Pierre Lanchantin, Xunying Liu, Andrew McParland, Steve Renals, Oscar Saz, Mirjam Wester, and Phil Woodland. The MGB challenge: Evaluating multi-genre broadcast media recognition. In Proc. ASRU, 2015. [ bib | .pdf | Abstract ]

Victor Poblete, Felipe Espic, Simon King, Richard M. Stern, Fernando Huenupan, Josue Fredes, and Nestor Becerra Yoma. A perceptually-motivated low-complexity instantaneous linear channel normalization technique applied to speaker verification. Computer Speech & Language, 31(1):1-27, 2015. [ bib | DOI | http | .pdf | Abstract ]

Rosie Kay, Oliver Watts, Roberto Barra-Chicote, and Cassie Mayo. Knowledge versus data in TTS: evaluation of a continuum of synthesis systems. In INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, Dresden, Germany, September 6-10, 2015, pages 3335-3339, 2015. [ bib | .pdf | Abstract ]

2014

P. Swietojanski and S. Renals. Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models. In Proc. IEEE Workshop on Spoken Language Technology, Lake Tahoe, USA, December 2014. [ bib | .pdf | Abstract ]

Peter Bell, Pawel Swietojanski, Joris Driesen, Mark Sinclair, Fergus McInnes, and Steve Renals. The UEDIN ASR systems for the IWSLT 2014 evaluation. In Proc. IWSLT, South Lake Tahoe, USA, December 2014. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Junichi Yamagishi, and Simon King. Intelligibility enhancement of speech in noise. In Proceedings of the Institute of Acoustics, volume 36 Pt. 2, pages 96-103, Birmingham, UK, October 2014. [ bib | .pdf | Abstract ]

P. Swietojanski, A. Ghoshal, and S. Renals. Convolutional neural networks for distant speech recognition. IEEE Signal Processing Letters, 21(9):1120-1124, September 2014. [ bib | DOI | .pdf | Abstract ]

C. Valentini-Botinhao and M. Wester. Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noise. In Proc. Interspeech, pages 2063-2067, Singapore, September 2014. [ bib | .pdf | Abstract ]

Antti Suni, Tuomo Raitio, Dhananjaya Gowda, Reima Karhila, Matt Gibson, and Oliver Watts. The Simple4All entry to the Blizzard Challenge 2014. In Proc. Blizzard Challenge 2014, September 2014. [ bib | .pdf | Abstract ]

Thomas Merritt, Tuomo Raitio, and Simon King. Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis. In Proc. Interspeech, pages 1509-1513, Singapore, September 2014. [ bib | .pdf | Abstract ]

Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, and Javier Latorre. An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. Interspeech, pages 780-784, Singapore, September 2014. [ bib | .pdf | Abstract ]

L.-H. Chen, T. Raitio, C. Valentini-Botinhao, J. Yamagishi, and Z.-H. Ling. DNN-Based Stochastic Postfilter for HMM-Based Speech Synthesis. In Proc. Interspeech, pages 1954-1958, Singapore, September 2014. [ bib | .pdf | Abstract ]

C. Valentini-Botinhao, M. Toman, M. Pucher, D. Schabus, and J. Yamagishi. Intelligibility Analysis of Fast Synthesized Speech. In Proc. Interspeech, pages 2922-2926, Singapore, September 2014. [ bib | .pdf | Abstract ]

Siva Reddy Gangireddy, Fergus McInnes, and Steve Renals. Feed forward pre-training for recurrent neural network language models. In Proc. Interspeech, pages 2620-2624, September 2014. [ bib | .pdf | Abstract ]

Mark Sinclair, Peter Bell, Alexandra Birch, and Fergus McInnes. A semi-Markov model for speech segmentation with an utterance-break prior. In Proc. Interspeech, September 2014. [ bib | .pdf | Abstract ]

Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, and Simon King. Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech. In Proc. Interspeech, volume 15, pages 1504-1508, September 2014. [ bib | .pdf | Abstract ]

Matthew Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, and Thomas Merritt. A flexible front-end for HTS. In Proc. Interspeech, pages 1283-1287, September 2014. [ bib | .pdf | Abstract ]

Wei Zhang, Robert A. J. Clark, and Yongyuan Wang. Unsupervised language filtering using the latent Dirichlet allocation. In Proc. Interspeech, pages 1268-1272, September 2014. [ bib | .pdf | Abstract ]

Susana Palmaz López-Peláez and Robert A. J. Clark. Speech synthesis reactive to dynamic noise environmental conditions. In Proc. Interspeech, pages 2927-2931, September 2014. [ bib | .pdf | Abstract ]

Philip N Garner, Rob Clark, Jean-Philippe Goldman, Pierre-Edouard Honnet, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, et al. Translation and prosody in Swiss languages. In Nouveaux cahiers de linguistique française, 31. 3rd Swiss Workshop on Prosody, Geneva, Switzerland, September 2014. [ bib | .pdf | Abstract ]

Aciel Eshky. Generative Probabilistic Models of Goal-Directed Users in Task Oriented Dialogs. PhD thesis, School of Informatics, The University of Edinburgh, 10 Crichton Street, Edinburgh, UK, EH8 9AB, July 2014. [ bib | .pdf | Abstract ]

Nicholas W D Evans, Tomi Kinnunen, Junichi Yamagishi, Zhizheng Wu, Federico Alegre, and Phillip De Leon. Speaker recognition anti-spoofing. In S. Marcel, S. Li, and M. Nixon, editors, Handbook of Biometric Anti-spoofing. Springer, June 2014. [ bib | DOI | .pdf | Abstract ]

Atef Ben Youssef, Hiroshi Shimodaira, and David Braude. Speech driven talking head from estimated articulatory features. In Proc. ICASSP, pages 4606-4610, Florence, Italy, May 2014. [ bib | .pdf | Abstract ]

Mirjam Wester and Cassie Mayo. Accent rating by native and non-native listeners. In Proceedings of ICASSP, pages 7749-7753, Florence, Italy, May 2014. [ bib | .pdf | Abstract ]

Tiberiu Boroș, Adriana Stan, Oliver Watts, and Stefan Daniel Dumitrescu. RSS-TOBI - a prosodically enhanced Romanian speech corpus. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Reykjavik, Iceland, May 2014. [ bib | .pdf | Abstract ]

Oliver Watts, Siva Gangireddy, Junichi Yamagishi, Simon King, Steve Renals, Adriana Stan, and Mircea Giurgiu. Neural net word representations for phrase-break prediction without a part of speech tagger. In Proc. ICASSP, pages 2618-2622, Florence, Italy, May 2014. [ bib | .pdf | Abstract ]

Rasmus Dall, Junichi Yamagishi, and Simon King. Rating naturalness in speech synthesis: The effect of style and expectation. In Proc. Speech Prosody, May 2014. [ bib | .pdf | Abstract ]

Qiong Hu, Yannis Stylianou, Korin Richmond, Ranniery Maia, Junichi Yamagishi, and Javier Latorre. A fixed dimension and perceptually based dynamic sinusoidal model of speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 6311-6315, Florence, Italy, May 2014. [ bib | .pdf | Abstract ]

L. Saheer, J. Yamagishi, P.N. Garner, and J. Dines. Combining vocal tract length normalization with hierarchical linear transformations. Selected Topics in Signal Processing, IEEE Journal of, 8(2):262-272, April 2014. [ bib | DOI ]

J.P. Cabral, K. Richmond, J. Yamagishi, and S. Renals. Glottal spectral separation for speech synthesis. Selected Topics in Signal Processing, IEEE Journal of, 8(2):195-208, April 2014. [ bib | DOI | .pdf | Abstract ]

Maria K. Wolters. The minimal effective dose of reminder technology. In Proceedings of the extended abstracts of the 32nd annual ACM conference on Human factors in computing systems - CHI EA '14, pages 771-780, New York, New York, USA, April 2014. ACM Press. [ bib | DOI | http | Abstract ]

Maria K. Wolters, Elaine Niven, and Robert H. Logie. The art of deleting snapshots. In Proceedings of the extended abstracts of the 32nd annual ACM conference on Human factors in computing systems - CHI EA '14, pages 2521-2526, New York, New York, USA, April 2014. ACM Press. [ bib | DOI | http | Abstract ]

C. Valentini-Botinhao, J. Yamagishi, S. King, and R. Maia. Intelligibility enhancement of HMM-generated speech in additive noise by modifying mel cepstral coefficients to increase the glimpse proportion. Computer Speech and Language, 28(2):665-686, 2014. [ bib | DOI | .pdf | Abstract ]

Moses Ekpenyong, Eno-Abasi Urua, Oliver Watts, Simon King, and Junichi Yamagishi. Statistical parametric speech synthesis for Ibibio. Speech Communication, 56:243-251, January 2014. [ bib | DOI | http | .pdf | Abstract ]

Liang Lu, Arnab Ghoshal, and Steve Renals. Cross-lingual subspace Gaussian mixture model for low-resource speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 22(1):17-27, 2014. [ bib | DOI | .pdf | Abstract ]

Johanna D. Moore, Leimin Tian, and Catherine Lai. Word-level emotion recognition using high-level features. In Alexander Gelbukh, editor, Computational Linguistics and Intelligent Text Processing, volume 8404 of Lecture Notes in Computer Science, pages 17-31. Springer Berlin Heidelberg, 2014. [ bib | DOI | .pdf | Abstract ]

Catherine Lai. Interpreting final rises: Task and role factors. In Proceedings of Speech Prosody 7, Dublin, Ireland, 2014. [ bib | .pdf | Abstract ]

P. Lanchantin, M. J. F. Gales, S. King, and J. Yamagishi. Multiple-average-voice-based speech synthesis. In Proc. ICASSP, 2014. [ bib | Abstract ]

David Abelman and Robert Clark. Altering speech synthesis prosody through real time natural gestural control. In Proc. Speech Prosody 2014, Dublin, Ireland, 2014. [ bib | .pdf | Abstract ]

P. Swietojanski, J. Li, and J-T Huang. Investigation of maxout networks for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2014. [ bib | .pdf | Abstract ]

S. Renals and P. Swietojanski. Neural networks for distant speech recognition. In The 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), 2014. [ bib | .pdf | Abstract ]

R. Makowski, P. Swietojanski, and R. Wielgat. Automatyczne rozpoznawanie mowy. In T. Zielinski, P. Korohoda, and R. Rumian, editors, Cyfrowe Przetwarzanie Sygnalow w Telekomunikacji. Podstawy, multimedia, transmisja. Wydawnictwo Naukowe PWN - Polish Scientific Publishers PWN, Warszawa, 2014. [ bib | http | Abstract ]

Liang Lu and Steve Renals. Probabilistic linear discriminant analysis for acoustic modelling. IEEE Signal Processing Letters, 21(6):702-706, 2014. [ bib | DOI | .pdf | Abstract ]

Rasmus Dall, Mirjam Wester, and Martin Corley. The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech. In Proc. Interspeech, 2014. [ bib | .pdf | Abstract ]

Mirjam Wester, M. Luisa Garcia Lecumberri, and Martin Cooke. DIAPIX-FL: A symmetric corpus of problem-solving dialogues in first and second languages. In Proc. Interspeech, 2014. [ bib | .pdf | Abstract ]

Rasmus Dall, Marcus Tomalin, Mirjam Wester, William Byrne, and Simon King. Investigating automatic & human filled pause insertion for speech synthesis. In Proc. Interspeech, 2014. [ bib | .pdf | Abstract ]

Catherine Lai and Steve Renals. Incorporating lexical and prosodic information at different levels for meeting summarization. In Proc. Interspeech 2014, 2014. [ bib | .pdf | Abstract ]

Liang Lu and Steve Renals. Probabilistic linear discriminant analysis with bottleneck features for speech recognition. In Proc. Interspeech, 2014. [ bib | .pdf | Abstract ]

P. Bell, J. Driesen, and S. Renals. Cross-lingual adaptation with multi-task adaptive networks. In Proc. Interspeech, 2014. [ bib | .pdf | Abstract ]

A. Cervone, S. Pareti, P. Bell, I. Prodanof, and T. Caselli. Detecting attribution relations in speech: a corpus study. In Proc. Italian Conference on Computational Linguistics, Pisa, Italy, 2014. [ bib | .pdf | Abstract ]

Nicolas d’Alessandro, Joëlle Tilmanne, Maria Astrinaki, Thomas Hueber, Rasmus Dall, Thierry Ravet, Alexis Moinet, Huseyin Cakmak, Onur Babacan, Adela Barbulescu, Valentin Parfait, Victor Huguenin, Emine Sümeyye Kalaycı, and Qiong Hu. Reactive statistical mapping: Towards the sketching of performative control with data. In Yves Rybarczyk, Tiago Cardoso, João Rosas, and Luis M. Camarinha-Matos, editors, Innovative and Creative Developments in Multimodal Interaction Systems, volume 425 of IFIP Advances in Information and Communication Technology, pages 20-49. Springer Berlin Heidelberg, 2014. [ bib | .pdf | Abstract ]

Herman Kamper, Aren Jansen, Simon King, and S. J. Goldwater. Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings. In Proc. SLT, 2014. [ bib | .pdf | Abstract ]

Maria Luisa Garcia Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, and Martin Cooke. Generating segmental foreign accent. In Fifteenth Annual Conference of the International Speech Communication Association, 2014. [ bib | .pdf | Abstract ]

Aciel Eshky, Ben Allison, Subramanian Ramamoorthy, and Mark Steedman. A generative model for user simulation in a spatial navigation domain. In EACL, pages 626-635, 2014. [ bib | .pdf | Abstract ]

2013

P. Swietojanski, A. Ghoshal, and S. Renals. Hybrid acoustic models for distant and multichannel large vocabulary speech recognition. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), December 2013. [ bib | DOI | .pdf | Abstract ]

Joris Driesen and Steve Renals. Lightly supervised automatic subtitling of weather forecasts. In Proc. Automatic Speech Recognition and Understanding Workshop, Olomouc, Czech Republic, December 2013. [ bib | DOI | .pdf | Abstract ]

Joris Driesen, Peter Bell, Mark Sinclair, and Steve Renals. Description of the UEDIN system for German ASR. In Proc. IWSLT, Heidelberg, Germany, December 2013. [ bib | .pdf | Abstract ]

C. Bhatt, A. Popescu-Belis, M. Habibi, S. Ingram, S. Masneri, F. McInnes, N. Pappas, and O. Schreer. Multi-factor segmentation for topic visualization and recommendation: the MUST-VIS system. In Proceedings of ACM Multimedia 2013, Barcelona, Spain, October 2013. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Hiroshi Shimodaira, and David A. Braude. Head motion analysis and synthesis over different tasks. In Proc. Intelligent Virtual Agents, pages 285-294. Springer, September 2013. [ bib | .pdf | Abstract ]

C. Valentini-Botinhao, J. Yamagishi, S. King, and Y. Stylianou. Combining perceptually-motivated spectral shaping with loudness and duration modification for intelligibility enhancement of HMM-based synthetic speech in noise. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf ]

M. Cooke, C. Mayo, and C. Valentini-Botinhao. Intelligibility-enhancing speech modifications: the Hurricane Challenge. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf ]

Cassia Valentini-Botinhao, Mirjam Wester, Junichi Yamagishi, and Simon King. Using neighbourhood density and selective SNR boosting to increase the intelligibility of synthetic speech in noise. In 8th ISCA Workshop on Speech Synthesis, pages 133-138, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Thomas Merritt and Simon King. Investigating the shortcomings of HMM synthesis. In 8th ISCA Workshop on Speech Synthesis, pages 185-190, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Nicholas W D Evans, Tomi Kinnunen, and Junichi Yamagishi. Spoofing and countermeasures for automatic speaker verification. In Interspeech 2013, 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. [ bib | .pdf ]

Maria Astrinaki, Alexis Moinet, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, and Thierry Dutoit. Mage - reactive articulatory feature control of HMM-based parametric speech synthesis. In 8th ISCA Workshop on Speech Synthesis, pages 227-231, Barcelona, Spain, August 2013. [ bib | .pdf ]

Qiong Hu, Korin Richmond, Junichi Yamagishi, and Javier Latorre. An experimental comparison of multiple vocoder types. In 8th ISCA Workshop on Speech Synthesis, pages 155-160, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Peter Bell, Hitoshi Yamamoto, Pawel Swietojanski, Youzheng Wu, Fergus McInnes, Chiori Hori, and Steve Renals. A lecture transcription system combining neural network acoustic and language models. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Adriana Stan, Peter Bell, Junichi Yamagishi, and Simon King. Lightly supervised discriminative training of grapheme models for improved sentence-level alignment of speech and text data. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

H. Christensen, M. Aniol, P. Bell, P. Green, T. Hain, S. King, and P. Swietojanski. Combining in-domain and out-of-domain speech data for automatic recognition of disordered speech. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Kayoko Yanagisawa, Javier Latorre, Vincent Wan, Mark J. F. Gales, and Simon King. Noise robustness in HMM-TTS speaker adaptation. In 8th ISCA Workshop on Speech Synthesis, pages 139-144, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Rubén San-Segundo, Juan Manuel Montero, Mircea Giurgiu, Ioana Muresan, and Simon King. Multilingual number transcription for text-to-speech conversion. In 8th ISCA Workshop on Speech Synthesis, pages 85-89, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Heng Lu, Simon King, and Oliver Watts. Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis. In 8th ISCA Workshop on Speech Synthesis, pages 281-285, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

H. Bourlard, M. Ferras, N. Pappas, A. Popescu-Belis, S. Renals, F. McInnes, P. Bell, S. Ingram, and M. Guillemot. Processing and linking audio events in large multimedia archives: The EU inEvent project. In Proceedings of SLAM 2013 (First Workshop on Speech, Language and Audio in Multimedia), Marseille, France, August 2013. [ bib | .pdf | Abstract ]

Yoshitaka Mamiya, Adriana Stan, Junichi Yamagishi, Peter Bell, Oliver Watts, Robert Clark, and Simon King. Using adaptation to improve speech transcription alignment in noisy and reverberant environments. In 8th ISCA Workshop on Speech Synthesis, pages 61-66, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Oliver Watts, Adriana Stan, Rob Clark, Yoshitaka Mamiya, Mircea Giurgiu, Junichi Yamagishi, and Simon King. Unsupervised and lightly-supervised learning for rapid construction of TTS systems in multiple languages from 'found' data: evaluation and analysis. In 8th ISCA Workshop on Speech Synthesis, pages 121-126, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Junichi Yamagishi, Oliver Watts, and Juan M. Montero. Towards speaking style transplantation in speech synthesis. In 8th ISCA Workshop on Speech Synthesis, pages 179-183, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Adriana Stan, Oliver Watts, Yoshitaka Mamiya, Mircea Giurgiu, Rob Clark, Junichi Yamagishi, and Simon King. TUNDRA: A Multilingual Corpus of Found Data for TTS Research Created with Light Supervision. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Oliver Watts, Adriana Stan, Yoshitaka Mamiya, Antti Suni, José Martín Burgos, and Juan Manuel Montero. The Simple4All entry to the Blizzard Challenge 2013. In Proc. Blizzard Challenge 2013, August 2013. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Hiroshi Shimodaira, and David A. Braude. Articulatory features for speech-driven head motion synthesis. In Proc. Interspeech, pages 2758-2762, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

David A. Braude, Hiroshi Shimodaira, and Atef Ben Youssef. Template-warping based speech driven head motion synthesis. In Proc. Interspeech, pages 2763-2767, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Karel Vesely, Arnab Ghoshal, Lukáš Burget, and Daniel Povey. Sequence-discriminative training of deep neural networks. In Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Korin Richmond, Zhenhua Ling, Junichi Yamagishi, and Benigno Uría. On the evaluation of inversion mapping performance in the acoustic domain. In Proc. Interspeech, pages 1012-1016, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

James Scobbie, Alice Turk, Christian Geng, Simon King, Robin Lickley, and Korin Richmond. The Edinburgh speech production facility DoubleTalk corpus. In Proc. Interspeech, Lyon, France, August 2013. [ bib | .pdf | Abstract ]

Maria Astrinaki, Alexis Moinet, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, and Thierry Dutoit. Mage - HMM-based speech synthesis reactively controlled by the articulators. In 8th ISCA Workshop on Speech Synthesis, page 243, Barcelona, Spain, August 2013. [ bib | .pdf | Abstract ]

Chee-Ming Ting, Simon King, Sh-Hussain Salleh, and A. K. Ariff. Discriminative tandem features for HMM-based EEG classification. In Proc. 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 13), Osaka, Japan, July 2013. [ bib | .pdf | Abstract ]

Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura. Speech synthesis based on hidden Markov models. Proceedings of the IEEE, 101(6), June 2013. (in press). [ bib | Abstract ]

C. Valentini-Botinhao, E. Godoy, Y. Stylianou, B. Sauert, S. King, and J. Yamagishi. Improving intelligibility in noise of HMM-generated speech via noise-dependent and -independent methods. In Proc. ICASSP, Vancouver, Canada, May 2013. [ bib | .pdf ]

H. Lu and S. King. Factorized context modelling for text-to-speech synthesis. In Proc. ICASSP, Vancouver, Canada, May 2013. [ bib | .pdf | Abstract ]

Mark Sinclair and Simon King. Where are the challenges in speaker diarization? In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, Vancouver, British Columbia, Canada, May 2013. [ bib | .pdf | Abstract ]

Ramya Rasipuram, Peter Bell, and Mathew Magimai.-Doss. Grapheme and multilingual posterior features for under-resourced speech recognition: a study on Scottish Gaelic. In Proc. ICASSP, Vancouver, Canada, May 2013. [ bib | .pdf | Abstract ]

Peter Bell, Pawel Swietojanski, and Steve Renals. Multi-level adaptive networks in tandem and hybrid ASR systems. In Proc. ICASSP, Vancouver, Canada, May 2013. [ bib | DOI | .pdf | Abstract ]

Mark Hartswood, Maria Wolters, Jenny Ure, Stuart Anderson, and Marina Jirotka. Socio-material design for computer mediated social sensemaking. In Proc. CHI Workshop on Explorations in Social Interaction Design, April 2013. [ bib | .pdf | Abstract ]

John Dines, Hui Liang, Lakshmi Saheer, Matthew Gibson, William Byrne, Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Simon King, Mirjam Wester, Teemu Hirsimäki, Reima Karhila, and Mikko Kurimo. Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. Computer Speech and Language, 27(2):420-437, February 2013. [ bib | DOI | http | Abstract ]

Cassia Valentini-Botinhao. Intelligibility enhancement of synthetic speech in noise. PhD thesis, University of Edinburgh, 2013. [ bib | .pdf | Abstract ]

Y. Tang, M. Cooke, and C. Valentini-Botinhao. A distortion-weighted glimpse-based intelligibility metric for modified and synthetic speech. In Proc. SPIN, 2013. [ bib | .pdf ]

M. Cooke, C. Mayo, C. Valentini-Botinhao, Y. Stylianou, B. Sauert, and Y. Tang. Evaluating the intelligibility benefit of speech modifications in known noise conditions. Speech Communication, 55:572-585, 2013. [ bib | .pdf | Abstract ]

Liang Lu, KK Chin, Arnab Ghoshal, and Steve Renals. Joint uncertainty decoding for noise robust subspace Gaussian mixture models. IEEE Transactions on Audio, Speech and Language Processing, 21(9):1791-1804, 2013. [ bib | DOI | .pdf | Abstract ]

Pawel Swietojanski, Arnab Ghoshal, and Steve Renals. Revisiting hybrid and GMM-HMM system combination techniques. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013. [ bib | DOI | .pdf | Abstract ]

Arnab Ghoshal, Pawel Swietojanski, and Steve Renals. Multilingual training of deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013. [ bib | DOI | .pdf | Abstract ]

Z. Ling, K. Richmond, and J. Yamagishi. Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression. Audio, Speech, and Language Processing, IEEE Transactions on, 21(1):207-219, 2013. [ bib | DOI | .pdf | Abstract ]

Àngel Calzada Defez, Joan Claudi Socoró Carrié, and Robert Clark. Parametric model for vocal effort interpolation with harmonics plus noise models. In Proc. 8th ISCA Speech Synthesis Workshop, pages 25-30, 2013. [ bib | .pdf | Abstract ]

Sarah Creer, Stuart Cunningham, Phil Green, and Junichi Yamagishi. Building personalised synthetic voices for individuals with severe speech impairment. Computer Speech & Language, 27(6):1178-1193, 2013. Special Issue on Speech and Language Processing for Assistive Technology. [ bib | DOI | http ]

Catherine Lai, Jean Carletta, and Steve Renals. Detecting summarization hot spots in meetings using group level involvement and turn-taking features. In Proc. Interspeech 2013, Lyon, France, 2013. [ bib | .pdf | Abstract ]

Catherine Lai, Jean Carletta, and Steve Renals. Modelling participant affect in meetings with turn-taking features. In Proceedings of WASSS 2013, Grenoble, France, 2013. [ bib | .pdf | Abstract ]

Catherine Lai, Keelan Evanini, and Klaus Zechner. Applying rhythm metrics to non-native spontaneous speech. In Proceedings of SLaTE 2013, Grenoble, France, 2013. [ bib | .pdf | Abstract ]

P. Lanchantin, P. Bell, M. Gales, T. Hain, X. Liu, Y. Long, J. Quinnell, S. Renals, O. Saz, M. Seigel, P. Swietojanski, and P. Woodland. Automatic transcription of multi-genre media archives. In Proc. Workshop on Speech, Language and Audio in Multimedia, Marseille, France, 2013. [ bib | .pdf | Abstract ]

David Adam Braude, Hiroshi Shimodaira, and Atef Ben Youssef. Template-warping based speech driven head motion synthesis. In Interspeech, pages 2763-2767, 2013. [ bib | .pdf | Abstract ]

Javier Tejedor, Doroteo T. Toledano, Dong Wang, Simon King, and Jose Colas. Feature analysis for discriminative confidence estimation in spoken term detection. Computer Speech and Language, To appear, 2013. [ bib | .pdf | Abstract ]

P. Lal and S. King. Cross-lingual automatic speech recognition using tandem features. IEEE Transactions on Audio, Speech, and Language Processing, To appear, 2013. [ bib | DOI | .pdf | Abstract ]

Yoshitaka Mamiya, Junichi Yamagishi, Oliver Watts, Robert A.J. Clark, Simon King, and Adriana Stan. Lightly supervised GMM VAD to use audiobook for speech synthesiser. In Proc. ICASSP, 2013. [ bib | .pdf | Abstract ]

Liang Lu, Arnab Ghoshal, and Steve Renals. Noise adaptive training for subspace Gaussian mixture models. In Proc. Interspeech, 2013. [ bib | .pdf | Abstract ]

Liang Lu, Arnab Ghoshal, and Steve Renals. Acoustic data-driven pronunciation lexicon for large vocabulary speech recognition. In Proc. ASRU, 2013. [ bib | DOI | .pdf | Abstract ]

Liang Lu. Subspace Gaussian Mixture Models for Automatic Speech Recognition. PhD thesis, University of Edinburgh, 2013. [ bib | .pdf | Abstract ]

David A. Braude, Hiroshi Shimodaira, and Atef Ben Youssef. The University of Edinburgh head-motion and audio storytelling (UoE-HaS) dataset. In Proc. Intelligent Virtual Agents, pages 466-467. Springer, 2013. [ bib | .pdf | Abstract ]

Elizabeth Godoy, Catherine Mayo, and Yannis Stylianou. Linking loudness increases in normal and Lombard speech to decreasing vowel formant separation. In Proc. Interspeech, 2013. [ bib | .PDF | Abstract ]

Catherine Mayo, Fiona Gibbon, and Robert A. J. Clark. Phonetically trained and untrained adults' transcription of place of articulation for intervocalic lingual stops with intermediate acoustic cues. Journal of Speech, Language and Hearing Research, 56:779-791, 2013. [ bib | DOI | Abstract ]

Christian Geng, Alice Turk, James M. Scobbie, Cedric Macmartin, Philip Hoole, Korin Richmond, Alan Wrench, Marianne Pouplier, Ellen Gurman Bard, Ziggy Campbell, Catherine Dickie, Eddie Dubourg, William Hardcastle, Evia Kainada, Simon King, Robin Lickley, Satsuki Nakai, Steve Renals, Kevin White, and Ronny Wiegand. Recording speech articulation in dialogue: Evaluating a synchronized double electromagnetic articulography setup. Journal of Phonetics, 41(6):421-431, 2013. [ bib | DOI | http | .pdf | Abstract ]

Ingmar Steiner, Korin Richmond, and Slim Ouni. Speech animation using electromagnetic articulography as motion capture data. In Proc. 12th International Conference on Auditory-Visual Speech Processing, pages 55-60, Annecy, France, 2013. [ bib | .pdf | Abstract ]

Peter Bell, Fergus McInnes, Siva Reddy Gangireddy, Mark Sinclair, Alexandra Birch, and Steve Renals. The UEDIN English ASR system for the IWSLT 2013 evaluation. In Proc. International Workshop on Spoken Language Translation, 2013. [ bib | .pdf | Abstract ]

E. Zwyssig, F. Faubel, S. Renals, and M. Lincoln. Recognition of overlapping speech using digital MEMS microphone arrays. In Proc. IEEE ICASSP, 2013. [ bib | DOI | .pdf | Abstract ]

2012

P. Swietojanski, A. Ghoshal, and S. Renals. Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR. In Proc. IEEE Workshop on Spoken Language Technology, pages 246-251, Miami, Florida, USA, December 2012. [ bib | DOI | .pdf | Abstract ]

P. Bell, M. Gales, P. Lanchantin, X. Liu, Y. Long, S. Renals, P. Swietojanski, and P. Woodland. Transcription of multi-genre media archives using out-of-domain data. In Proc. IEEE Workshop on Spoken Language Technology, pages 324-329, Miami, Florida, USA, December 2012. [ bib | DOI | .pdf | Abstract ]

Adriana Stan, Peter Bell, and Simon King. A grapheme-based method for automatic alignment of speech and text data. In Proc. IEEE Workshop on Spoken Language Technology, Miami, Florida, USA, December 2012. [ bib | .pdf | Abstract ]

P. L. De Leon, M. Pucher, J. Yamagishi, I. Hernaez, and I. Saratxaga. Evaluation of speaker verification security and detection of HMM-based synthetic speech. Audio, Speech, and Language Processing, IEEE Transactions on, 20(8):2280-2290, October 2012. [ bib | DOI | Abstract ]

Korin Richmond and Steve Renals. Ultrax: An animated midsagittal vocal tract display for speech therapy. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi. Vowel creation by articulatory control in HMM-based parametric speech synthesis. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

Heng Lu and Simon King. Using Bayesian networks to find relevant context features for HMM-based speech synthesis. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

Phillip L. De Leon, Bryan Stewart, and Junichi Yamagishi. Synthetic speech discrimination using pitch pattern statistics derived from image analysis. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | Abstract ]

J. Lorenzo, B. Martinez, R. Barra-Chicote, V. Lopez-Ludena, J. Ferreiros, J. Yamagishi, and J.M. Montero. Towards an unsupervised speaking style voice building framework: Multi-style speaker diarization. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | Abstract ]

Rasmus Dall, Christophe Veaux, Junichi Yamagishi, and Simon King. Analysis of speaker clustering techniques for HMM-based speech synthesis. In Proc. Interspeech, September 2012. [ bib | .pdf | Abstract ]

Jaime Lorenzo-Trueba, Roberto Barra-Chicote, Tuomo Raitio, Nicolas Obin, Paavo Alku, Junichi Yamagishi, and Juan M Montero. Towards glottal source controllability in expressive speech synthesis. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | Abstract ]

Peter Bell, Myroslava Dzikovska, and Amy Isard. Designing a spoken language interface for a tutorial dialogue system. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

C. Valentini-Botinhao, J. Yamagishi, and S. King. Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise. In Proc. Sapa Workshop, Portland, USA, September 2012. [ bib | .pdf | Abstract ]

C. Valentini-Botinhao, S. Degenkolb-Weyers, A. Maier, E. Noeth, U. Eysholdt, and T. Bocklet. Automatic detection of sigmatism in children. In Proc. WOCCI, Portland, USA, September 2012. [ bib | .pdf | Abstract ]

Ruben San-Segundo, Juan M. Montero, Veronica Lopez-Ludena, and Simon King. Detecting acronyms from capital letter sequences in Spanish. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

C. Valentini-Botinhao, J. Yamagishi, and S. King. Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise. In Proc. Interspeech, Portland, USA, September 2012. [ bib | Abstract ]

Benigno Uria, Iain Murray, Steve Renals, and Korin Richmond. Deep architectures for articulatory inversion. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf | Abstract ]

Zhenhua Ling, Korin Richmond, and Junichi Yamagishi. Vowel creation by articulatory control in HMM-based parametric speech synthesis. In Proc. The Listening Talker Workshop, page 72, Edinburgh, UK, May 2012. [ bib | .pdf ]

C. Valentini-Botinhao, J. Yamagishi, and S. King. Using an intelligibility measure to create noise robust cepstral coefficients for HMM-based speech synthesis. In Proc. LISTA Workshop, Edinburgh, UK, May 2012. [ bib | .pdf ]

Myroslava O. Dzikovska, Peter Bell, Amy Isard, and Johanna D. Moore. Evaluating language understanding accuracy with respect to objective outcomes in a dialogue system. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 471-481, Avignon, France, April 2012. Association for Computational Linguistics. [ bib | http ]

C. Valentini-Botinhao, R. Maia, J. Yamagishi, S. King, and H. Zen. Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise. In Proc. ICASSP, pages 3997-4000, Kyoto, Japan, March 2012. [ bib | DOI | .pdf | Abstract ]

L. Saheer, J. Yamagishi, P.N. Garner, and J. Dines. Combining vocal tract length normalization with hierarchical linear transformations. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 4493-4496, March 2012. [ bib | DOI | Abstract ]

Chen-Yu Yang, G. Brown, Liang Lu, J. Yamagishi, and S. King. Noise-robust whispered speech recognition using a non-audible-murmur microphone with VTS compensation. In Chinese Spoken Language Processing (ISCSLP), 2012 8th International Symposium on, pages 220-223, 2012. [ bib | DOI | Abstract ]

Jaime Lorenzo-Trueba, Oliver Watts, Roberto Barra-Chicote, Junichi Yamagishi, Simon King, and Juan M Montero. Simple4all proposals for the Albayzin evaluations in speech synthesis. In Proc. Iberspeech 2012, 2012. [ bib | .pdf | Abstract ]

Eva Hasler, Peter Bell, Arnab Ghoshal, Barry Haddow, Philipp Koehn, Fergus McInnes, Steve Renals, and Pawel Swietojanski. The UEDIN system for the IWSLT 2012 evaluation. In Proc. International Workshop on Spoken Language Translation, 2012. [ bib | .pdf | Abstract ]

Ravichander Vipperla, Maria Wolters, and Steve Renals. Spoken dialogue interfaces for older people. In Kenneth J. Turner, editor, Advances in Home Care Technologies. IOS Press, 2012. [ bib | .pdf | Abstract ]

E. Zwyssig, S. Renals, and M. Lincoln. On the effect of SNR and superdirective beamforming in speaker diarisation in meetings. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 4177-4180, 2012. [ bib | DOI | .pdf | Abstract ]

E. Zwyssig, S. Renals, and M. Lincoln. Determining the number of speakers in a meeting using microphone array features. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 4765-4768, 2012. [ bib | DOI | .pdf | Abstract ]

Sebastian Andersson, Junichi Yamagishi, and Robert A.J. Clark. Synthesis and evaluation of conversational characteristics in HMM-based speech synthesis. Speech Communication, 54(2):175-188, 2012. [ bib | DOI | http | Abstract ]

Ingmar Steiner, Korin Richmond, Ian Marshall, and Calum D. Gray. The magnetic resonance imaging subset of the mngu0 articulatory corpus. The Journal of the Acoustical Society of America, 131(2):EL106-EL111, January 2012. [ bib | DOI | .pdf | Abstract ]

Christopher Burton, Brian McKinstry, Aurora Szentagotai Tatar, Antoni Serrano-Blanco, Claudia Pagliari, and Maria Wolters. Activity monitoring in patients with depression: A systematic review. Journal of Affective Disorders, 145(1):21-28, 2012. [ bib | DOI | http | Abstract ]

Dong Wang, Javier Tejedor, Simon King, and Joe Frankel. Term-dependent confidence normalization for out-of-vocabulary spoken term detection. Journal of Computer Science and Technology, 27(2), 2012. [ bib | DOI | Abstract ]

Maria Wolters, Karl Isaac, and Jason Doherty. Hold that thought: are spearcons less disruptive than spoken reminders? In CHI '12 Extended Abstracts on Human Factors in Computing Systems, CHI EA '12, pages 1745-1750, New York, NY, USA, 2012. ACM. [ bib | DOI | http ]

Maria Wolters and Colin Matheson. Designing Help4Mood: Trade-offs and choices. In Juan Miguel Garcia-Gomez and Patricia Paniagua-Paniagua, editors, Information and Communication Technologies applied to Mental Health. Editorial Universitat Politecnica de Valencia, 2012. [ bib ]

Oliver Watts. Unsupervised Learning for Text-to-Speech Synthesis. PhD thesis, University of Edinburgh, 2012. [ bib | .pdf | Abstract ]

Maria Wolters, Lucy McCloughan, Martin Gibson, Chris Weatherall, Colin Matheson, Tim Maloney, Juan Carlos Castro-Robles, and Soraya Estevez. Monitoring people with depression in the community-regulatory aspects. In Workshop on People, Computers and Psychiatry at the British Computer Society's Conference on Human Computer Interaction, pages 1745-1750, 2012. [ bib ]

C. Mayo, V. Aubanel, and M. Cooke. Effect of prosodic changes on speech intelligibility. In Proc. Interspeech, Portland, OR, USA, 2012. [ bib ]

Claudia Pagliari, Maria Wolters, Chris Burton, Brian McKinstry, Aurora Szentagotai, Antoni Serrano-Blanco, Daniel David, Luis Ferrini, Susanna Albertini, Joan Carlos Castro, and Soraya Estévez. Psychosocial implications of avatar use in supporting therapy of depression. In CYBER17-17th Annual CyberPsychology & CyberTherapy Conference, 2012. [ bib ]

Mirjam Wester. Talker discrimination across languages. Speech Communication, 54:781-790, 2012. [ bib | DOI | .pdf | Abstract ]

L. Lu, A. Ghoshal, and S. Renals. Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. ICASSP, pages 4877-4880, 2012. [ bib | DOI | .pdf | Abstract ]

Martin Cooke, Maria Luisa García Lecumberri, Yan Tang, and Mirjam Wester. Do non-native listeners benefit from speech modifications designed to promote intelligibility for native listeners? In Proceedings of The Listening Talker Workshop, page 59, 2012. http://listening-talker.org/workshop/programme.html. [ bib ]

Keiichiro Oura, Junichi Yamagishi, Mirjam Wester, Simon King, and Keiichi Tokuda. Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping. Speech Communication, 54(6):703-714, 2012. [ bib | DOI | http | Abstract ]

Leonardo Badino, Robert A.J. Clark, and Mirjam Wester. Towards hierarchical prosodic prominence generation in TTS synthesis. In Proc. Interspeech, Portland, USA, 2012. [ bib | .pdf ]

Kei Hashimoto, Junichi Yamagishi, William Byrne, Simon King, and Keiichi Tokuda. Impacts of machine translation and speech synthesis on speech-to-speech translation. Speech Communication, 54(7):857-866, 2012. [ bib | DOI | http | Abstract ]

Maria Wolters, Louis Ferrini, Juan Martinez-Miranda, Helen Hastie, and Chris Burton. Help4Mood - a flexible solution for supporting people with depression in the community across Europe. In Proceedings of The International eHealth, Telemedicine and Health ICT Forum For Education, Networking and Business (MedeTel, 2012). International Society for Telemedicine & eHealth (ISfTeH), 2012. [ bib ]

Anna C. Janska, Erich Schröger, Thomas Jacobsen, and Robert A. J. Clark. Asymmetries in the perception of synthesized speech. In Proc. Interspeech, Portland, USA, 2012. [ bib | .pdf ]

M. Koutsogiannaki, M. Pettinato, C. Mayo, V. Kandia, and Y. Stylianou. Can modified casual speech reach the intelligibility of clear speech? In Proc. Interspeech, Portland, OR, USA, 2012. [ bib ]

Managing data in Help4Mood. ICST Transactions in Ambient Systems, Special Issue on Technology in Mental Health, 2012. [ bib ]

L. Lu, A. Ghoshal, and S. Renals. Joint uncertainty decoding with unscented transform for noise robust subspace Gaussian mixture model. In Proc. Sapa-Scale workshop, 2012. [ bib | .pdf | Abstract ]

V. Aubanel, M. Cooke, E. Foster, M. L. Garcia-Lecumberri, and C. Mayo. Effects of the availability of visual information and presence of competing conversations on speech production. In Proc. Interspeech, Portland, OR, USA, 2012. [ bib ]

Soraya Estevez, Juan Carlos Castro-Robles, and Maria Wolters. Help4Mood: First release of a computational distributed system to support the treatment of patients with major depression. In Proceedings of The International eHealth, Telemedicine and Health ICT Forum For Education, Networking and Business (MedeTel, 2012), pages 1745-1750. International Society for Telemedicine & eHealth (ISfTeH), 2012. [ bib ]

L. Lu, KK Chin, A. Ghoshal, and S. Renals. Noise compensation for subspace Gaussian mixture models. In Proc. Interspeech, 2012. [ bib | .pdf | Abstract ]

Maria Wolters, Juan Martínez-Miranda, Helen Hastie, and Colin Matheson. Managing data in Help4Mood. In The 2nd International Workshop on Computing Paradigms for Mental Health - MindCare 2012, 2012. [ bib ]

Junichi Yamagishi, Christophe Veaux, Simon King, and Steve Renals. Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction. Acoustical Science and Technology, 33(1):1-5, 2012. [ bib | DOI | http | .pdf | Abstract ]

Sarah Creer, Stuart Cunningham, Phil Green, and Junichi Yamagishi. Building personalised synthetic voices for individuals with severe speech impairment. Computer Speech and Language, 27(6):1178-1193, 2012. [ bib | DOI | http | Abstract ]

Thomas Hueber, Atef Ben Youssef, Gérard Bailly, Pierre Badin, and Frédéric Elisei. Cross-speaker acoustic-to-articulatory inversion using phone-based trajectory HMM for pronunciation training. In Proc. Interspeech, Portland, Oregon, USA, 2012. [ bib | .pdf | Abstract ]

Gérard Bailly, Pierre Badin, Lionel Revéret, and Atef Ben Youssef. Sensorimotor characteristics of speech production. Cambridge University Press, 2012. [ bib | DOI ]

Ingmar Steiner, Korin Richmond, and Slim Ouni. Using multimodal speech production data to evaluate articulatory animation for audiovisual speech synthesis. In 3rd International Symposium on Facial Analysis and Animation, Vienna, Austria, 2012. [ bib | .pdf ]

Steve Renals, Hervé Bourlard, Jean Carletta, and Andrei Popescu-Belis, editors. Multimodal Signal Processing: Human Interactions in Meetings. Cambridge University Press, 2012. [ bib ]

Aciel Eshky, Ben Allison, and Mark Steedman. Generative goal-driven user simulation for dialog management. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 71-81. Association for Computational Linguistics, 2012. [ bib | .pdf | Abstract ]

2011

Benigno Uria, Steve Renals, and Korin Richmond. A deep neural network for acoustic-articulatory speech inversion. In Proc. NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning, Sierra Nevada, Spain, December 2011. [ bib | .pdf | Abstract ]

Atef Ben Youssef. Control of talking heads by acoustic-to-articulatory inversion for language learning and rehabilitation. PhD thesis, Grenoble University, October 2011. [ bib | .pdf | Abstract ]

Oliver Watts, Junichi Yamagishi, and Simon King. Unsupervised continuous-valued word features for phrase-break prediction without a part-of-speech tagger. In Proc. Interspeech, pages 2157-2160, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Cassia Valentini-Botinhao, Junichi Yamagishi, and Simon King. Can objective measures predict the intelligibility of modified HMM-based synthetic speech in noise? In Proc. Interspeech, August 2011. [ bib | .pdf | Abstract ]

Korin Richmond, Phil Hoole, and Simon King. Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus. In Proc. Interspeech, pages 1505-1508, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Ming Lei, Junichi Yamagishi, Korin Richmond, Zhen-Hua Ling, Simon King, and Li-Rong Dai. Formant-controlled HMM-based speech synthesis. In Proc. Interspeech, pages 2777-2780, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Oliver Watts and Bowen Zhou. Unsupervised features from text for speech synthesis in a speech-to-speech translation system. In Proc. Interspeech, pages 2153-2156, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi. Feature-space transform tying in unified acoustic-articulatory modelling of articulatory control of HMM-based speech synthesis. In Proc. Interspeech, pages 117-120, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Thomas Hueber, Pierre Badin, and Gérard Bailly. Toward a multi-speaker visual articulatory feedback system. In Proc. Interspeech, pages 589-592, Florence, Italy, August 2011. [ bib | .pdf | Abstract ]

Fergus R. McInnes and Sharon J. Goldwater. Unsupervised extraction of recurring words from infant-directed speech. In Proceedings of CogSci 2011, Boston, Massachusetts, July 2011. [ bib | .pdf | Abstract ]

Myroslava Dzikovska, Amy Isard, Peter Bell, Johanna Moore, Natalie Steinhauser, and Gwendolyn Campbell. Beetle II: an adaptable tutorial dialogue system. In Proceedings of the SIGDIAL 2011 Conference, demo session, pages 338-340, Portland, Oregon, June 2011. Association for Computational Linguistics. [ bib | http | Abstract ]

S. Andraszewicz, J. Yamagishi, and S. King. Vocal attractiveness of statistical speech synthesisers. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5368-5371, May 2011. [ bib | DOI | Abstract ]

P.L. De Leon, I. Hernaez, I. Saratxaga, M. Pucher, and J. Yamagishi. Detection of synthetic speech for the problem of imposture. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 4844-4847, May 2011. [ bib | DOI | Abstract ]

Cassia Valentini-Botinhao, Junichi Yamagishi, and Simon King. Evaluation of objective measures for intelligibility prediction of HMM-based synthetic speech in noise. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5112-5115, May 2011. [ bib | DOI | .pdf | Abstract ]

J.P. Cabral, S. Renals, J. Yamagishi, and K. Richmond. HMM-based speech synthesiser using the LF-model of the glottal source. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 4704-4707, May 2011. [ bib | DOI | .pdf | Abstract ]

K. Hashimoto, J. Yamagishi, W. Byrne, S. King, and K. Tokuda. An analysis of machine translation and speech synthesis in speech-to-speech translation system. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5108-5111, May 2011. [ bib | DOI | Abstract ]

Dong Wang, Nicholas Evans, Raphael Troncy, and Simon King. Handling overlaps in spoken term detection. In Proc. International Conference on Acoustics, Speech and Signal Processing, pages 5656-5659, May 2011. [ bib | DOI | .pdf | Abstract ]

Dong Wang and Simon King. Letter-to-sound pronunciation prediction using conditional random fields. IEEE Signal Processing Letters, 18(2):122-125, February 2011. [ bib | DOI | .pdf | Abstract ]

Reima Karhila and Mirjam Wester. Rapid adaptation of foreign-accented HMM-based speech synthesis. In Proc. Interspeech, Florence, Italy, 2011. [ bib | .pdf | Abstract ]

Myroslava Dzikovska, Amy Isard, Peter Bell, Johanna D. Moore, Natalie B. Steinhauser, Gwendolyn E. Campbell, Leanne S. Taylor, Simon Caine, and Charlie Scott. Adaptive intelligent tutorial dialogue in the Beetle II system. In Artificial Intelligence in Education - 15th International Conference (AIED 2011), interactive event, volume 6738 of Lecture Notes in Computer Science, page 621, Auckland, New Zealand, 2011. Springer. [ bib | DOI ]

Mirjam Wester and Hui Liang. Cross-lingual speaker discrimination using natural and synthetic speech. In Proc. Interspeech, Florence, Italy, 2011. [ bib | .pdf | Abstract ]

T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku. HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Transactions on Audio, Speech and Language Processing, 19(1):153-165, January 2011. [ bib | DOI | Abstract ]

Theresa Wilson and Gregor Hofer. Using linguistic and vocal expressiveness in social role recognition. In Proc. Int. Conf. on Intelligent User Interfaces, IUI2011, Palo Alto, USA, 2011. ACM. [ bib | .pdf | Abstract ]

J. Dines, J. Yamagishi, and S. King. Measuring the gap between HMM-based ASR and TTS. IEEE Selected Topics in Signal Processing, 2011. (in press). [ bib | DOI | Abstract ]

Mirjam Wester and Reima Karhila. Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation. In Proc. ICASSP, pages 5372-5375, Prague, Czech Republic, 2011. [ bib | .pdf | Abstract ]

Maria Klara Wolters, Christine Johnson, and Karl B Isaac. Can the hearing handicap inventory for adults be used as a screen for perception experiments? In Proc. ICPhS XVII, Hong Kong, 2011. [ bib | .pdf | Abstract ]

Adriana Stan, Junichi Yamagishi, Simon King, and Matthew Aylett. The Romanian speech synthesis (RSS) corpus: Building a high quality HMM-based speech synthesis system using a high sampling rate. Speech Communication, 53(3):442-450, 2011. [ bib | DOI | http | Abstract ]

L. Lu, A. Ghoshal, and S. Renals. Regularized subspace Gaussian mixture models for speech recognition. IEEE Signal Processing Letters, 18(7):419-422, 2011. [ bib | .pdf | Abstract ]

A. G. Pipe, R. Vaidyanathan, C. Melhuish, P. Bremner, P. Robinson, R. A. J. Clark, A. Lenz, K. Eder, N. Hawes, Z. Ghahramani, M. Fraser, M. Mermehdi, P. Healey, and S. Skachek. Affective robotics: Human motion and behavioural inspiration for cooperation between humans and assistive robots. In Yoseph Bar-Cohen, editor, Biomimetics: Nature-Based Innovation, chapter 15. Taylor and Francis, 2011. [ bib ]

Michael A. Berger, Gregor Hofer, and Hiroshi Shimodaira. Carnival - combining speech technology and computer animation. IEEE Computer Graphics and Applications, 31:80-89, 2011. [ bib | DOI ]

Jonathan Kilgour, Jean Carletta, and Steve Renals. The Ambient Spotlight: Personal meeting capture with a microphone array. In Proc. HSCMA, 2011. [ bib | DOI | .pdf | Abstract ]

S. Renals. Automatic analysis of multiparty meetings. SADHANA - Academy Proceedings in Engineering Sciences, 36(5):917-932, 2011. [ bib | DOI | .pdf | Abstract ]

Mirjam Wester and Hui Liang. The EMIME Mandarin Bilingual Database. Technical Report EDI-INF-RR-1396, The University of Edinburgh, 2011. [ bib | .pdf | Abstract ]

Andi K. Winterboer, Martin I. Tietze, Maria K. Wolters, and Johanna D. Moore. The user-model based summarize and refine approach improves information presentation in spoken dialog systems. Computer Speech and Language, 25(2):175-191, 2011. [ bib | .pdf | Abstract ]

C. Mayo, R. A. J. Clark, and S. King. Listeners' weighting of acoustic cues to synthetic speech naturalness: A multidimensional scaling analysis. Speech Communication, 53(3):311-326, 2011. [ bib | DOI | Abstract ]

L. Lu, A. Ghoshal, and S. Renals. Regularized subspace Gaussian mixture models for cross-lingual speech recognition. In Proc. ASRU, 2011. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Thomas Hueber, Pierre Badin, Gérard Bailly, and Frédéric Elisei. Toward a speaker-independent visual articulatory feedback system. In 9th International Seminar on Speech Production, ISSP9, Montreal, Canada, 2011. [ bib | .pdf ]

Thomas Hueber, Pierre Badin, Gérard Bailly, Atef Ben Youssef, Frédéric Elisei, Bruce Denby, and Gérard Chollet. Statistical mapping between articulatory and acoustic data. Application to silent speech interface and visual articulatory feedback. In Proceedings of the 1st International Workshop on Performative Speech and Singing Synthesis (p3s), Vancouver, Canada, 2011. [ bib | .pdf | Abstract ]

2010

Dong Wang, Simon King, Nick Evans, and Raphael Troncy. Direct posterior confidence for out-of-vocabulary spoken term detection. In Proc. ACM Multimedia 2010 Searching Spontaneous Conversational Speech Workshop, October 2010. [ bib | DOI | .pdf | Abstract ]

Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi. An analysis of HMM-based prediction of articulatory movements. Speech Communication, 52(10):834-846, October 2010. [ bib | DOI | Abstract ]

Jochen Ehnes. A precise controllable projection system for projected virtual characters and its calibration. In IEEE International Symposium on Mixed and Augmented Reality 2010 Science and Technology Proceedings, pages 221-222, Seoul, Korea, October 2010. [ bib | .pdf | Abstract ]

Zhen-Hua Ling, Korin Richmond, and Junichi Yamagishi. HMM-based text-to-articulatory-movement prediction and analysis of critical articulators. In Proc. Interspeech, pages 2194-2197, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Daniel Felps, Christian Geng, Michael Berger, Korin Richmond, and Ricardo Gutierrez-Osuna. Relying on critical articulators to estimate vocal tract spectra in an articulatory-acoustic database. In Proc. Interspeech, pages 1990-1993, September 2010. [ bib | .pdf | Abstract ]

Korin Richmond, Robert Clark, and Sue Fitt. On generating Combilex pronunciations via morphological analysis. In Proc. Interspeech, pages 1974-1977, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Yong Guan, Jilei Tian, Yi-Jian Wu, Junichi Yamagishi, and Jani Nurminen. A unified and automatic approach of Mandarin HTS system. In Proc. SSW7, Kyoto, Japan, September 2010. [ bib | .pdf ]

Mirjam Wester. Cross-lingual talker discrimination. In Proc. Interspeech, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

João Cabral, Steve Renals, Korin Richmond, and Junichi Yamagishi. Transforming voice source parameters in a HMM-based speech synthesiser with glottal post-filtering. In Proc. 7th ISCA Speech Synthesis Workshop (SSW7), pages 365-370, NICT/ATR, Kyoto, Japan, September 2010. [ bib | .pdf | Abstract ]

Ravi Chander Vipperla, Steve Renals, and Joe Frankel. Augmentation of adaptation data. In Proc. Interspeech, pages 530-533, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Dong Wang, Simon King, Nick Evans, and Raphael Troncy. CRF-based stochastic pronunciation modelling for out-of-vocabulary spoken term detection. In Proc. Interspeech, Makuhari, Chiba, Japan, September 2010. [ bib | Abstract ]

Oliver Watts, Junichi Yamagishi, and Simon King. The role of higher-level linguistic features in HMM-based speech synthesis. In Proc. Interspeech, pages 841-844, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Gregor Hofer and Korin Richmond. Comparison of HMM and TMDN methods for lip synchronisation. In Proc. Interspeech, pages 454-457, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Oliver Watts, Simon King, and Bela Usabaev. Roles of the average voice in speaker-adaptive HMM-based speech synthesis. In Proc. Interspeech, pages 418-421, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Mirjam Wester, John Dines, Matthew Gibson, Hui Liang, Yi-Jian Wu, Lakshmi Saheer, Simon King, Keiichiro Oura, Philip N. Garner, William Byrne, Yong Guan, Teemu Hirsimäki, Reima Karhila, Mikko Kurimo, Matt Shannon, Sayaka Shiota, Jilei Tian, Keiichi Tokuda, and Junichi Yamagishi. Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project. In Proc. 7th ISCA Speech Synthesis Workshop, Kyoto, Japan, September 2010. [ bib | .pdf | Abstract ]

Michael Pucher, Dietmar Schabus, and Junichi Yamagishi. Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners. In Proc. Interspeech, pages 2186-2189, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

Sebastian Andersson, Junichi Yamagishi, and Robert Clark. Utilising spontaneous conversational speech in HMM-based speech synthesis. In The 7th ISCA Tutorial and Research Workshop on Speech Synthesis, September 2010. [ bib | .pdf | Abstract ]

Javier Tejedor, Doroteo T. Toledano, Miguel Bautista, Simon King, Dong Wang, and Jose Colas. Augmented set of features for confidence estimation in spoken term detection. In Proc. Interspeech, September 2010. [ bib | .pdf | Abstract ]

Oliver Watts, Junichi Yamagishi, and Simon King. Letter-based speech synthesis. In Proc. Speech Synthesis Workshop 2010, pages 317-322, Nara, Japan, September 2010. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Pierre Badin, and Gérard Bailly. Can tongue be recovered from face? The answer of data-driven statistical models. In Proc. Interspeech, pages 2002-2005, Makuhari, Japan, September 2010. [ bib | .pdf | Abstract ]

O. Watts, J. Yamagishi, S. King, and K. Berkling. Synthesis of child speech with HMM adaptation and voice conversion. Audio, Speech, and Language Processing, IEEE Transactions on, 18(5):1005-1016, July 2010. [ bib | DOI | .pdf | Abstract ]

Alice Turk, James Scobbie, Christian Geng, Barry Campbell, Catherine Dickie, Eddie Dubourg, Ellen Gurman Bard, William Hardcastle, Mariam Hartinger, Simon King, Robin Lickley, Cedric Macmartin, Satsuki Nakai, Steve Renals, Korin Richmond, Sonja Schaeffler, Kevin White, Ronny Wiegand, and Alan Wrench. An Edinburgh speech production facility. Poster presented at the 12th Conference on Laboratory Phonology, Albuquerque, New Mexico, July 2010. [ bib | .pdf ]

D. Wang, S. King, and J. Frankel. Stochastic pronunciation modelling for out-of-vocabulary spoken term detection. Audio, Speech, and Language Processing, IEEE Transactions on, PP(99), July 2010. [ bib | DOI | Abstract ]

Mikko Kurimo, William Byrne, John Dines, Philip N. Garner, Matthew Gibson, Yong Guan, Teemu Hirsimäki, Reima Karhila, Simon King, Hui Liang, Keiichiro Oura, Lakshmi Saheer, Matt Shannon, Sayaka Shiota, Jilei Tian, Keiichi Tokuda, Mirjam Wester, Yi-Jian Wu, and Junichi Yamagishi. Personalising speech-to-speech translation in the EMIME project. In Proc. ACL 2010 System Demonstrations, Uppsala, Sweden, July 2010. [ bib | .pdf | Abstract ]

J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, R. Hu, Y. Guan, K. Oura, K. Tokuda, R. Karhila, and M. Kurimo. Thousands of voices for HMM-based speech synthesis - analysis and application of TTS systems built on various ASR corpora. IEEE Transactions on Audio, Speech and Language Processing, 18(5):984-1004, July 2010. [ bib | DOI | Abstract ]

Sebastian Andersson, Kallirroi Georgila, David Traum, Matthew Aylett, and Robert Clark. Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection. In Speech Prosody 2010, May 2010. [ bib | .pdf | Abstract ]

R. Barra-Chicote, J. Yamagishi, S. King, J. Manuel Montero, and J. Macias-Guarasa. Analysis of statistical parametric and unit-selection speech synthesis systems applied to emotional speech. Speech Communication, 52(5):394-404, May 2010. [ bib | DOI | Abstract ]

Atef Ben Youssef, Pierre Badin, Gérard Bailly, and Viet-Anh Tran. Méthodes basées sur les HMMs et les GMMs pour l'inversion acoustico-articulatoire en parole. In Proc. JEP, pages 249-252, Mons, Belgium, May 2010. [ bib | .pdf | Abstract ]

Dong Wang, Simon King, Joe Frankel, and Peter Bell. Stochastic pronunciation modelling and soft match for out-of-vocabulary spoken term detection. In Proc. ICASSP, Dallas, Texas, USA, March 2010. [ bib | .pdf | Abstract ]

Kallirroi Georgila, Maria Wolters, Johanna D. Moore, and Robert H. Logie. The MATCH corpus: A corpus of older and younger users' interactions with spoken dialogue systems. Language Resources and Evaluation, 44(3):221-261, March 2010. [ bib | DOI | Abstract ]

Peter Bell. Full covariance modelling for speech recognition. PhD thesis, University of Edinburgh, 2010. [ bib | .pdf | Abstract ]

Erich Zwyssig, Mike Lincoln, and Steve Renals. A digital microphone array for distant speech recognition. In Proc. IEEE ICASSP-10, pages 5106-5109, 2010. [ bib | DOI | .pdf | Abstract ]

Maria Wolters and Marilyn McGee-Lennon. Designing usable and acceptable reminders for the home. In Proc. AAATE Workshop AT Technology Transfer, Sheffield, UK, 2010. [ bib | .pdf | Abstract ]

Steve Renals. Recognition and understanding of meetings. In Proc. NAACL/HLT, pages 1-9, 2010. [ bib | .pdf | Abstract ]

Jonathan Kilgour, Jean Carletta, and Steve Renals. The Ambient Spotlight: Queryless desktop search from meeting speech. In Proc. ACM Multimedia 2010 Workshop SSCS 2010, 2010. [ bib | DOI | .pdf | Abstract ]

Michael Berger, Gregor Hofer, and Hiroshi Shimodaira. Carnival: a modular framework for automated facial animation. Poster at SIGGRAPH 2010, 2010. Bronze award winner, ACM Student Research Competition. [ bib | .pdf ]

Simon King. Speech synthesis. In Morgan and Ellis, editors, Speech and Audio Signal Processing. Wiley, 2010. [ bib | Abstract ]

Anna C. Janska and Robert A. J. Clark. Native and non-native speaker judgements on the quality of synthesized speech. In Proc. Interspeech, pages 1121-1124, 2010. [ bib | .pdf | Abstract ]

M. Wester. The EMIME Bilingual Database. Technical Report EDI-INF-RR-1388, The University of Edinburgh, 2010. [ bib | .pdf | Abstract ]

P. L. De Leon, V. R. Apsingekar, M. Pucher, and J. Yamagishi. Revisiting the security of speaker verification systems against imposture using synthetic speech. In Proc. ICASSP 2010, Dallas, Texas, USA, 2010. [ bib | .pdf ]

Maria Wolters, Klaus-Peter Engelbrecht, Florian Gödde, Sebastian Möller, Anja Naumann, and Robert Schleicher. Making it easier for older people to talk to smart homes: Using help prompts to shape users' speech. Universal Access in the Information Society, 9(4):311-325, 2010. [ bib | DOI | Abstract ]

Michael White, Robert A. J. Clark, and Johanna D. Moore. Generating tailored, comparative descriptions with contextually appropriate intonation. Computational Linguistics, 36(2):159-201, 2010. [ bib | DOI | Abstract ]

Michael Pucher, Friedrich Neubarth, and Volker Strom. Optimizing phonetic encoding for Viennese unit selection speech synthesis. In A. Esposito et al., editor, COST 2102 Int. Training School 2009, LNCS, Heidelberg, 2010. Springer-Verlag. [ bib | .ps | .pdf | Abstract ]

Songfang Huang and Steve Renals. Hierarchical Bayesian language models for conversational speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 18(8):1941-1954, January 2010. [ bib | DOI | http | .pdf | Abstract ]

Michael Pucher, Friedrich Neubarth, Volker Strom, Sylvia Moosmüller, Gregor Hofer, Christian Kranzler, Gudrun Schuchmann, and Dietmar Schabus. Resources for speech synthesis of Viennese varieties. In Proc. Int. Conf. on Language Resources and Evaluation, LREC'10, Malta, 2010. European Language Resources Association (ELRA). [ bib | .ps | .pdf | Abstract ]

Anna C. Janska and Robert A. J. Clark. Further exploration of the possibilities and pitfalls of multidimensional scaling as a tool for the evaluation of the quality of synthesized speech. In The 7th ISCA Tutorial and Research Workshop on Speech Synthesis, pages 142-147, 2010. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. Power law discounting for n-gram language models. In Proc. IEEE ICASSP-10, pages 5178-5181, 2010. [ bib | DOI | http | .pdf | Abstract ]

Maria K. Wolters, Karl B. Isaac, and Steve Renals. Evaluating speech synthesis intelligibility using Amazon Mechanical Turk. In Proc. 7th Speech Synthesis Workshop (SSW7), pages 136-141, 2010. [ bib | .pdf | Abstract ]

P.L. De Leon, M. Pucher, and J. Yamagishi. Evaluation of the vulnerability of speaker verification to synthetic speech. In Proc. Odyssey (The speaker and language recognition workshop) 2010, Brno, Czech Republic, 2010. [ bib | .pdf ]

Steve Renals and Simon King. Automatic speech recognition. In William J. Hardcastle, John Laver, and Fiona E. Gibbon, editors, Handbook of Phonetic Sciences, chapter 22. Wiley Blackwell, 2010. [ bib ]

Ravi Chander Vipperla, Steve Renals, and Joe Frankel. Ageing voices: The effect of changes in voice parameters on ASR performance. EURASIP Journal on Audio, Speech, and Music Processing, 2010. [ bib | DOI | http | .pdf | Abstract ]

Keiichiro Oura, Keiichi Tokuda, Junichi Yamagishi, Mirjam Wester, and Simon King. Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis. In Proc. ICASSP, volume I, pages 4954-4957, 2010. [ bib | .pdf | Abstract ]

Gregor Hofer, Korin Richmond, and Michael Berger. Lip synchronization by acoustic inversion. Poster at Siggraph 2010, 2010. [ bib | .pdf ]

Steve Renals and Thomas Hain. Speech recognition. In Alex Clark, Chris Fox, and Shalom Lappin, editors, Handbook of Computational Linguistics and Natural Language Processing. Wiley Blackwell, 2010. [ bib ]

Volker Strom and Simon King. A classifier-based target cost for unit selection speech synthesis trained on perceptual data. In Proc. Interspeech, Makuhari, Japan, 2010. [ bib | .ps | .pdf | Abstract ]

Kallirroi Georgila, Maria Wolters, and Johanna D. Moore. Learning dialogue strategies from older and younger simulated users. In Proc. SIGDIAL, 2010. [ bib | .pdf | Abstract ]

Alice Turk, James Scobbie, Christian Geng, Cedric Macmartin, Ellen Bard, Barry Campbell, Catherine Dickie, Eddie Dubourg, Bill Hardcastle, Phil Hoole, Evia Kanaida, Robin Lickley, Satsuki Nakai, Marianne Pouplier, Simon King, Steve Renals, Korin Richmond, Sonja Schaeffler, Ronnie Wiegand, Kevin White, and Alan Wrench. The Edinburgh Speech Production Facility's articulatory corpus of spontaneous dialogue. The Journal of the Acoustical Society of America, 128(4):2429-2429, 2010. [ bib | DOI | Abstract ]

Michael Pucher, Dietmar Schabus, Junichi Yamagishi, Friedrich Neubarth, and Volker Strom. Modeling and interpolation of Austrian German and Viennese dialect in HMM-based speech synthesis. Speech Communication, 52(2):164-179, 2010. [ bib | DOI | Abstract ]

Maria K. Wolters, Florian Gödde, Sebastian Möller, and Klaus-Peter Engelbrecht. Finding patterns in user quality judgements. In Proc. ISCA Workshop Perceptual Quality of Speech Systems, Dresden, Germany, 2010. [ bib | .pdf | Abstract ]

J. Yamagishi and S. King. Simple methods for improving speaker-similarity of HMM-based speech synthesis. In Proc. ICASSP 2010, Dallas, Texas, USA, 2010. [ bib | .pdf ]

Jonathan Kilgour, Jean Carletta, and Steve Renals. The Ambient Spotlight: Personal multimodal search without query. In Proc. ICMI-MLMI, 2010. [ bib | DOI | http | .pdf | Abstract ]

Simon King. A tutorial on HMM speech synthesis (invited paper). In Sadhana - Academy Proceedings in Engineering Sciences, Indian Institute of Sciences, 2010. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Pierre Badin, and Gérard Bailly. Acoustic-to-articulatory inversion in speech based on statistical models. In Proc. AVSP 2010, pages 160-165, Hakone, Kanagawa, Japan, 2010. [ bib | .pdf | Abstract ]

Pierre Badin, Atef Ben Youssef, Gérard Bailly, Frédéric Elisei, and Thomas Hueber. Visual articulatory feedback for phonetic correction in second language learning. In Workshop on Second Language Studies: Acquisition, Learning, Education and Technology, Tokyo, Japan, 2010. [ bib | .pdf | Abstract ]

2009

Peter Bell and Simon King. Diagonal priors for full covariance speech recognition. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, Italy, December 2009. [ bib | DOI | .pdf | Abstract ]

Heiga Zen, Keiichiro Oura, Takashi Nose, Junichi Yamagishi, Shinji Sako, Tomoki Toda, Takashi Masuko, Alan W. Black, and Keiichi Tokuda. Recent development of the HMM-based speech synthesis system (HTS). In Proc. 2009 Asia-Pacific Signal and Information Processing Association (APSIPA), Sapporo, Japan, October 2009. [ bib | .pdf | Abstract ]

J. Sebastian Andersson, Joao P. Cabral, Leonardo Badino, Junichi Yamagishi, and Robert A.J. Clark. Glottal source and prosodic prominence modelling in HMM-based speech synthesis for the Blizzard Challenge 2009. In The Blizzard Challenge 2009, Edinburgh, U.K., September 2009. [ bib | .pdf | Abstract ]

Dong Wang, Simon King, and Joe Frankel. Stochastic pronunciation modelling for spoken term detection. In Proc. Interspeech, pages 2135-2138, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

Maria Wolters, Ravichander Vipperla, and Steve Renals. Age recognition for spoken dialogue systems: Do we need it? In Proc. Interspeech, September 2009. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. A parallel training algorithm for hierarchical Pitman-Yor process language models. In Proc. Interspeech'09, pages 2695-2698, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

Oliver Watts, Junichi Yamagishi, Simon King, and Kay Berkling. HMM adaptation and voice conversion for the synthesis of child speech: A comparison. In Proc. Interspeech 2009, pages 2627-2630, Brighton, U.K., September 2009. [ bib | .pdf | Abstract ]

Simon King and Vasilis Karaiskos. The Blizzard Challenge 2009. In Proc. Blizzard Challenge Workshop, Edinburgh, UK, September 2009. [ bib | .pdf | Abstract ]

Dong Wang, Simon King, Joe Frankel, and Peter Bell. Term-dependent confidence for out-of-vocabulary term detection. In Proc. Interspeech, pages 2139-2142, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

Michal Dziemianko, Gregor Hofer, and Hiroshi Shimodaira. HMM-based automatic eye-blink synthesis from speech. In Proc. Interspeech, pages 1799-1802, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

Leonardo Badino, J. Sebastian Andersson, Junichi Yamagishi, and Robert A.J. Clark. Identification of contrast and its emphatic realization in HMM-based speech synthesis. In Proc. Interspeech 2009, Brighton, U.K., September 2009. [ bib | .PDF | Abstract ]

Junichi Yamagishi, Mike Lincoln, Simon King, John Dines, Matthew Gibson, Jilei Tian, and Yong Guan. Analysis of unsupervised and noise-robust speaker-adaptive HMM-based speech synthesis systems toward a unified ASR and TTS framework. In Proc. Interspeech 2009, Brighton, U.K., September 2009. [ bib | Abstract ]

K. Richmond. Preliminary inversion mapping results with a new EMA corpus. In Proc. Interspeech, pages 2835-2838, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

K. Richmond, R. Clark, and S. Fitt. Robust LTS rules with the Combilex speech technology lexicon. In Proc. Interspeech, pages 1295-1298, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

J. Dines, J. Yamagishi, and S. King. Measuring the gap between HMM-based ASR and TTS. In Proc. Interspeech, pages 1391-1394, Brighton, U.K., September 2009. [ bib | Abstract ]

Javier Tejedor, Dong Wang, Simon King, Joe Frankel, and Jose Colas. A posterior probability-based system hybridisation and combination for spoken term detection. In Proc. Interspeech, pages 2131-2134, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

I. Steiner and K. Richmond. Towards unsupervised articulatory resynthesis of German utterances using EMA data. In Proc. Interspeech, pages 2055-2058, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

J. Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Rile Hu, Yong Guan, Keiichiro Oura, Keiichi Tokuda, Reima Karhila, and Mikko Kurimo. Thousands of voices for HMM-based speech synthesis. In Proc. Interspeech, pages 420-423, Brighton, U.K., September 2009. [ bib | http | Abstract ]

Atef Ben Youssef, Pierre Badin, Gérard Bailly, and Panikos Heracleous. Acoustic-to-articulatory inversion using speech recognition and trajectory formation based on phoneme hidden Markov models. In Proc. Interspeech, pages 2255-2258, Brighton, UK, September 2009. [ bib | .pdf | Abstract ]

Z. Ling, K. Richmond, J. Yamagishi, and R. Wang. Integrating articulatory features into HMM-based parametric speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, 17(6):1171-1185, August 2009. IEEE SPS 2010 Young Author Best Paper Award. [ bib | DOI | Abstract ]

Jochen Ehnes. A tangible interface for the AMI content linking device - the automated meeting assistant. In Lucia Lo Bello and Giancarlo Iannizzotto, editors, Proceedings of HSI 2009, pages 306-313, May 2009. Best Paper Award (Human Machine Interaction). [ bib | .pdf | Abstract ]

Songfang Huang and Bowen Zhou. An EM algorithm for SCFG in formal syntax-based translation. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'09), pages 4813-4816, Taiwan, China, April 2009. [ bib | .pdf | Abstract ]

Dong Wang, Javier Tejedor, Joe Frankel, and Simon King. Posterior-based confidence measures for spoken term detection. In Proc. ICASSP09, Taiwan, April 2009. [ bib | .pdf | Abstract ]

J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. HMM-based speech synthesis with an acoustic glottal source model. In Proc. The First Young Researchers Workshop in Speech Technology, April 2009. [ bib | .pdf | Abstract ]

Christine Johnson, Pauline Campbell, Christine DePlacido, Amy Liddell, and Maria Wolters. Does peripheral hearing loss affect RGDT thresholds in older adults? In Proceedings of the American Auditory Society Conference, March 2009. [ bib | .pdf | Abstract ]

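Jochen Ehnes. An automated meeting assistant: A tangible mixed reality interface for the AMIDA automatic content linking device. In Joaquim Filipe and José Cordeiro, editors, Enterprise Information Systems, 11th International Conference, ICEIS 2009, Milan, Italy, May 6-10, 2009. Proceedings, volume 24 of Lecture Notes in Business Information Processing, pages 952-962. Springer, 2009. [ bib | DOI | .pdf | Abstract ]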

Gabriel Murray, Thomas Kleinbauer, Peter Poller, Tilman Becker, Steve Renals, and Jonathan Kilgour. Extrinsic summarization evaluation: A decision audit task. ACM Transactions on Speech and Language Processing, 6(2):1-29, 2009. [ bib | DOI | http | .pdf | Abstract ]

Heriberto Cuayáhuitl. Hierarchical Reinforcement Learning for Spoken Dialogue Systems. PhD thesis, School of Informatics, University of Edinburgh, January 2009. [ bib | .pdf | Abstract ]

Ravi Chander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals. Speech input from older users in smart environments: Challenges and perspectives. In Proc. HCI International: Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, number 5615 in Lecture Notes in Computer Science. Springer, 2009. [ bib | DOI | http | .pdf | Abstract ]

Jochen Ehnes. A tangible mixed reality interface for the AMI automated meeting assistant. In Michael J. Smith and Gavriel Salvendy, editors, Human Interface and the Management of Information, volume 5617 of Lecture Notes in Computer Science, pages 485-494. Springer, 2009. [ bib | .pdf | Abstract ]

Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira. Evaluation of a hierarchical reinforcement learning spoken dialogue system. Computer Speech and Language, 24(2):395-429, 2009. [ bib | DOI | .pdf | Abstract ]

Sarah Creer, Phil Green, Stuart Cunningham, and Junichi Yamagishi. Building personalised synthesised voices for individuals with dysarthria using the HTS toolkit. In John W. Mullennix and Steven E. Stern, editors, Computer Synthesized Speech Technologies: Tools for Aiding Impairment. IGI Global, 1st edition, 2009. In press. [ bib | Abstract ]

Matthew P. Aylett, Simon King, and Junichi Yamagishi. Speech synthesis without a phone inventory. In Interspeech, pages 2087-2090, 2009. [ bib | .pdf | Abstract ]

Richard S. McGowan and Michael A. Berger. Acoustic-articulatory mapping in vowels by locally weighted regression. Journal of the Acoustical Society of America, 126(4):2011-2032, 2009. [ bib | .pdf | Abstract ]

Martin I. Tietze, Andi Winterboer, and Johanna D. Moore. The effect of linguistic devices in information presentation messages on recall and comprehension. In Proceedings ENLG09, 2009. [ bib | .pdf ]

Maria Wolters, Kallirroi Georgila, Sarah MacPherson, and Johanna Moore. Being old doesn't mean acting old: Older users' interaction with spoken dialogue systems. ACM Transactions on Accessible Computing, 2(1):1-39, 2009. [ bib | http | Abstract ]

Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhenhua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, and Steve Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, 17(6):1208-1230, 2009. [ bib | http | www: | Abstract ]

Joaquim Filipe and José Cordeiro, editors. Enterprise Information Systems, 11th International Conference, ICEIS 2009, Milan, Italy, May 6-10, 2009. Proceedings, volume 24 of Lecture Notes in Business Information Processing. Springer, 2009. [ bib | DOI ]

Le Zhang. Modelling Speech Dynamics with Trajectory-HMMs. PhD thesis, School of Informatics, University of Edinburgh, January 2009. [ bib | .pdf | Abstract ]

Maria Wolters, Kallirroi Georgila, Robert Logie, Sarah MacPherson, Johanna Moore, and Matt Watson. Reducing working memory load in spoken dialogue systems. Interacting with Computers, 21(4):276-287, 2009. [ bib | .pdf | Abstract ]

Y. Hifny and S. Renals. Speech recognition using augmented conditional random fields. IEEE Transactions on Audio, Speech and Language Processing, 17(2):354-365, 2009. [ bib | http | .pdf | Abstract ]

John Niekrasz and Johanna Moore. Participant subjectivity and involvement as a basis for discourse segmentation. In Proceedings of the SIGDIAL 2009 Conference, pages 54-61, 2009. [ bib | .pdf | Abstract ]

Atef Ben Youssef, Viet-Anh Tran, Pierre Badin, and Gérard Bailly. HMMs and GMMs based methods in acoustic-to-articulatory speech inversion. In Proc. RJCP, pages 186-192, Avignon, France, 2009. [ bib | .pdf ]

2008

Tanja Kocjancic. Ultrasound investigation of tongue movements in syllables with different onset structure. In Proc. Eighth International Seminar on Speech Production (ISSP), December 2008. [ bib | .pdf | Abstract ]

I. Steiner and K. Richmond. Generating gestural timing from EMA data using articulatory resynthesis. In Proc. 8th International Seminar on Speech Production, Strasbourg, France, December 2008. [ bib | Abstract ]

R. Barra-Chicote, J. Yamagishi, J.M. Montero, S. King, S. Lutfi, and J. Macias-Guarasa. Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: conversion texto a voz. In V Jornadas en Tecnologia del Habla, pages 115-118, November 2008. (in Spanish). [ bib | .pdf ]

Javier Tejedor, Dong Wang, Joe Frankel, Simon King, and José Colás. A comparison of grapheme and phoneme-based units for Spanish spoken term detection. Speech Communication, 50(11-12):980-991, November 2008. [ bib | DOI | Abstract ]

Oliver Watts, Junichi Yamagishi, Kay Berkling, and Simon King. HMM-based synthesis of child speech. In Proc. 1st Workshop on Child, Computer and Interaction (ICMI'08 post-conference workshop), Crete, Greece, October 2008. [ bib | .pdf | Abstract ]

Zhen-Hua Ling, Korin Richmond, Junichi Yamagishi, and Ren-Hua Wang. Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge. In Proc. Interspeech, pages 573-576, Brisbane, Australia, September 2008. [ bib | .PDF | Abstract ]

Matthew P. Aylett and Junichi Yamagishi. Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning. In Proc. LangTech 2008, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Martin Tietze, Vera Demberg, and Johanna D. Moore. Syntactic complexity induces explicit grounding in the MapTask corpus. In Proc. Interspeech, September 2008. [ bib | .pdf | Abstract ]

Peter Bell and Simon King. A shrinkage estimator for speech recognition with full covariance HMMs. In Proc. Interspeech, Brisbane, Australia, September 2008. Shortlisted for best student paper award. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. Unsupervised language model adaptation based on topic and role information in multiparty meetings. In Proc. Interspeech'08, pages 833-836, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Zhenhua Ling, and Simon King. Robustness of HMM-based speech synthesis. In Proc. Interspeech 2008, pages 581-584, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

C. Qin, M. Carreira-Perpiñán, K. Richmond, A. Wrench, and S. Renals. Predicting tongue shapes from a few landmark locations. In Proc. Interspeech, pages 2306-2309, Brisbane, Australia, September 2008. [ bib | .PDF | Abstract ]

J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Glottal spectral separation for parametric speech synthesis. In Proc. Interspeech, pages 1829-1832, Brisbane, Australia, September 2008. [ bib | .PDF | Abstract ]

Dong Wang, Ivan Himawan, Joe Frankel, and Simon King. A posterior approach for microphone array based speech recognition. In Proc. Interspeech, pages 996-999, September 2008. [ bib | .pdf | Abstract ]

Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, and David Owens. Adapting Speech Synthesis Systems to Users with Age-Related Hearing Loss. In Beiträge der 8. ITG Fachtagung Sprachkommunikation, September 2008. [ bib | .pdf | Abstract ]

Joe Frankel, Dong Wang, and Simon King. Growing bottleneck features for tandem ASR. In Proc. Interspeech, page 1549, September 2008. [ bib | .pdf | Abstract ]

Gregor Hofer, Junichi Yamagishi, and Hiroshi Shimodaira. Speech-driven lip motion generation with a trajectory HMM. In Proc. Interspeech 2008, pages 2314-2317, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Simon King, Keiichi Tokuda, Heiga Zen, and Junichi Yamagishi. Unsupervised adaptation for HMM-based speech synthesis. In Proc. Interspeech, pages 1869-1872, Brisbane, Australia, September 2008. [ bib | .PDF | Abstract ]

Laszlo Toth, Joe Frankel, Gabor Gosztolya, and Simon King. Cross-lingual portability of MLP-based tandem features - a case study for English and Hungarian. In Proc. Interspeech, pages 2695-2698, Brisbane, Australia, September 2008. [ bib | .PDF | Abstract ]

Vasilis Karaiskos, Simon King, Robert A. J. Clark, and Catherine Mayo. The Blizzard Challenge 2008. In Proc. Blizzard Challenge Workshop, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Heiga Zen, Yi-Jian Wu, Tomoki Toda, and Keiichi Tokuda. The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge. In Proc. Blizzard Challenge 2008, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Peter Bell and Simon King. Covariance updates for discriminative training by constrained line search. In Proc. Interspeech, Brisbane, Australia, September 2008. [ bib | .pdf | Abstract ]

Tanja Kocjancic. Tongue movement and syllable onset complexity: ultrasound study. In Proc. ISCA Experimental Linguistics ExLing 2008, August 2008. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. Using participant role in multiparty meetings as prior knowledge for nonparametric topic modeling. In Proc. ICML/UAI/COLT Workshop on Prior Knowledge for Text and Language Processing, pages 21-24, Helsinki, Finland, July 2008. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Hisashi Kawai, and Takao Kobayashi. Phone duration modeling using gradient tree boosting. Speech Communication, 50(5):405-415, May 2008. [ bib | DOI | Abstract ]

Olga Goubanova and Simon King. Bayesian networks for phone duration prediction. Speech Communication, 50(4):301-311, April 2008. [ bib | DOI | Abstract ]

Junichi Yamagishi, Takashi Nose, Heiga Zen, Tomoki Toda, and Keiichi Tokuda. Performance evaluation of the speaker-independent HMM-based speech synthesis system "HTS-2007" for the Blizzard Challenge 2007. In Proc. ICASSP 2008, pages 3957-3960, Las Vegas, U.S.A, April 2008. [ bib | DOI | Abstract ]

Dong Wang, Joe Frankel, Javier Tejedor, and Simon King. A comparison of phone and grapheme-based spoken term detection. In Proc. ICASSP, pages 4969-4972, March 2008. [ bib | DOI | Abstract ]

Junichi Yamagishi, Takao Kobayashi, Yuji Nakano, Katsumi Ogata, and Juri Isogai. Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Transactions on Audio, Speech and Language Processing, 2008. In print. [ bib | Abstract ]

Steve Renals, Thomas Hain, and Hervé Bourlard. Interpretation of multiparty meetings: The AMI and AMIDA projects. In IEEE Workshop on Hands-Free Speech Communication and Microphone Arrays, 2008. HSCMA 2008, pages 115-118, 2008. [ bib | DOI | http | .pdf | Abstract ]

Ravichander Vipperla, Steve Renals, and Joe Frankel. Longitudinal study of ASR performance on ageing voices. In Proc. Interspeech, Brisbane, 2008. [ bib | .pdf | Abstract ]

Le Zhang and Steve Renals. Acoustic-articulatory modelling with the trajectory HMM. IEEE Signal Processing Letters, 15:245-248, 2008. [ bib | .pdf | Abstract ]

Florian Gödde, Sebastian Möller, Klaus-Peter Engelbrecht, Christine Kühnel, Robert Schleicher, Anja Naumann, and Maria Wolters. Study of a speech-based smart home system with older users. In International Workshop on Intelligent User Interfaces for Ambient Assisted Living, pages 17-22, 2008. [ bib ]

Gabriel Murray, Thomas Kleinbauer, Peter Poller, Steve Renals, and Jonathan Kilgour. Extrinsic summarization evaluation: A decision audit task. In Machine Learning for Multimodal Interaction (Proc. MLMI '08), number 5237 in Lecture Notes in Computer Science, pages 349-361. Springer, 2008. [ bib | DOI | .pdf | Abstract ]

F. Gibbon and C. Mayo. Adults' perception of conflicting acoustic cues associated with EPG-defined undifferentiated gestures. In 4th International EPG Symposium, Edinburgh, UK, 2008. [ bib ]

Matthew P. Aylett and Simon King. Single speaker segmentation and inventory selection using dynamic time warping self organization and joint multigram mapping. In SSW06, pages 258-263, 2008. [ bib | .pdf | Abstract ]

Gabriel Murray and Steve Renals. Detecting action items in meetings. In Machine Learning for Multimodal Interaction (Proc. MLMI '08), number 5237 in Lecture Notes in Computer Science, pages 208-213. Springer, 2008. [ bib | DOI | http | .pdf | Abstract ]

Kallirroi Georgila, Maria Wolters, Vasilis Karaiskos, Melissa Kronenthal, Robert Logie, Neil Mayo, Johanna Moore, and Matt Watson. A fully annotated corpus for studying the effect of cognitive ageing on users' interactions with spoken dialogue systems. In Proceedings of the 6th International Conference on Language Resources and Evaluation, 2008. [ bib ]

Giulia Garau and Steve Renals. Combining spectral representations for large vocabulary continuous speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 16(3):508-518, 2008. [ bib | DOI | http | .pdf | Abstract ]

J. Sebastian Andersson, Leonardo Badino, Oliver S. Watts, and Matthew P. Aylett. The CSTR/Cereproc Blizzard entry 2008: The inconvenient data. In Proc. Blizzard Challenge Workshop (in Proc. Interspeech 2008), Brisbane, Australia, 2008. [ bib | .pdf | Abstract ]

Volker Strom and Simon King. Investigating Festival's target cost function using perceptual experiments. In Proc. Interspeech, Brisbane, 2008. [ bib | .ps | .pdf | Abstract ]

Heidi Christensen, Yoshihiko Gotoh, and Steve Renals. A cascaded broadcast news highlighter. IEEE Transactions on Audio, Speech and Language Processing, 16:151-161, 2008. [ bib | DOI | http | .pdf | Abstract ]

Songfang Huang and Steve Renals. Modeling topic and role information in meetings using the hierarchical Dirichlet process. In A. Popescu-Belis and R. Stiefelhagen, editors, Machine Learning for Multimodal Interaction V, volume 5237 of Lecture Notes in Computer Science, pages 214-225. Springer, 2008. [ bib | .pdf | Abstract ]

Leonardo Badino, Robert A.J. Clark, and Volker Strom. Including pitch accent optionality in unit selection text-to-speech synthesis. In Proc. Interspeech, Brisbane, 2008. [ bib | .ps | .pdf | Abstract ]

Gabriel Murray and Steve Renals. Meta comments for summarizing meeting speech. In Machine Learning for Multimodal Interaction (Proc. MLMI '08), number 5237 in Lecture Notes in Computer Science, pages 236-247. Springer, 2008. [ bib | DOI | http | .pdf | Abstract ]

Sebastian Möller, Florian Gödde, and Maria Wolters. A corpus analysis of spoken smart-home interactions with older users. In Proceedings of the 6th International Conference on Language Resources and Evaluation, 2008. [ bib ]

Giulia Garau and Steve Renals. Pitch adaptive features for LVCSR. In Proc. Interspeech '08, 2008. [ bib | .pdf | Abstract ]

Maggie Morgan, Marilyn R. McGee-Lennon, Nick Hine, John Arnott, Chris Martin, Julia S. Clark, and Maria Wolters. Requirements gathering with diverse user groups and stakeholders. In Proc. 26th Conference on Computer-Human Interaction, Florence, 2008. [ bib ]

Leonardo Badino and Robert A.J. Clark. Automatic labeling of contrastive word pairs from spontaneous spoken English. In 2008 IEEE/ACL Workshop on Spoken Language Technology, Goa, India, 2008. [ bib | .pdf | Abstract ]

Alfred Dielmann and Steve Renals. Recognition of dialogue acts in multiparty meetings using a switching DBN. IEEE Transactions on Audio, Speech and Language Processing, 16(7):1303-1314, 2008. [ bib | DOI | http | .pdf | Abstract ]

Herve Bourlard and Steve Renals. Recognition and understanding of meetings: Overview of the European AMI and AMIDA projects. In Proc. LangTech 2008, 2008. [ bib | .pdf | Abstract ]

Laurent Besacier, Atef Ben Youssef, and Hervé Blanchon. The LIG Arabic/English speech translation system at IWSLT08. In International Workshop on Spoken Language Translation (IWSLT) 2008, pages 58-62, Hawaii, USA, 2008. [ bib | .pdf | Abstract ]

2007

J. Frankel and S. King. Factoring Gaussian precision matrices for linear dynamic models. Pattern Recognition Letters, 28(16):2264-2272, December 2007. [ bib | DOI | .pdf | Abstract ]

Ö. Çetin, M. Magimai-Doss, A. Kantor, S. King, C. Bartels, J. Frankel, and K. Livescu. Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs. In Proc. ASRU, Kyoto, December 2007. IEEE. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. Hierarchical Pitman-Yor language models for ASR in meetings. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU'07), pages 124-129, Kyoto, Japan, December 2007. [ bib | .pdf | Abstract ]

K. Richmond. Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion. In M. Chetouani, A. Hussain, B. Gas, M. Milgram, and J.-L. Zarader, editors, Advances in Nonlinear Speech Processing, International Conference on Non-Linear Speech Processing, NOLISP 2007, volume 4885 of Lecture Notes in Computer Science, pages 263-272. Springer-Verlag Berlin Heidelberg, December 2007. [ bib | DOI | .pdf | Abstract ]

J. Frankel, M. Wester, and S. King. Articulatory feature recognition using dynamic Bayesian networks. Computer Speech & Language, 21(4):620-640, October 2007. [ bib | .pdf | Abstract ]

Takashi Nose, Junichi Yamagishi, and Takao Kobayashi. A style control technique for HMM-based expressive speech synthesis. IEICE Trans. Information and Systems, E90-D(9):1406-1413, September 2007. [ bib | http | Abstract ]

J. Frankel, M. Magimai-Doss, S. King, K. Livescu, and Ö. Çetin. Articulatory feature classifiers trained on 2000 hours of telephone speech. In Proc. Interspeech, Antwerp, Belgium, August 2007. [ bib | .pdf | Abstract ]

Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, and David Owens. The effect of hearing loss on the intelligibility of synthetic speech. In Proc. Intl. Conf. Phon. Sci., August 2007. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Takao Kobayashi, Steve Renals, Simon King, Heiga Zen, Tomoki Toda, and Keiichi Tokuda. Improved average-voice-based speech synthesis using gender-mixed modeling and a parameter generation algorithm considering GV. In Proc. 6th ISCA Workshop on Speech Synthesis (SSW-6), August 2007. [ bib | .pdf | Abstract ]

Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, and David Owens. Making synthetic speech accessible to older people. In Proc. Sixth ISCA Workshop on Speech Synthesis, Bonn, Germany, August 2007. [ bib | .pdf | Abstract ]

Toshio Hirai, Junichi Yamagishi, and Seiichi Tenpaku. Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis. In Proc. 6th ISCA Workshop on Speech Synthesis (SSW-6), August 2007. [ bib | Abstract ]

Robert A. J. Clark, Monika Podsiadlo, Mark Fraser, Catherine Mayo, and Simon King. Statistical analysis of the Blizzard Challenge 2007 listening test results. In Proc. Blizzard 2007 (in Proc. Sixth ISCA Workshop on Speech Synthesis), Bonn, Germany, August 2007. [ bib | .pdf | Abstract ]

Maria Wolters, Pauline Campbell, Christine DePlacido, Amy Liddell, and David Owens. The role of outer hair cell function in the perception of synthetic versus natural speech. In Proc. Interspeech, August 2007. [ bib | .pdf | Abstract ]

Mark Fraser and Simon King. The Blizzard Challenge 2007. In Proc. Blizzard 2007 (in Proc. Sixth ISCA Workshop on Speech Synthesis), Bonn, Germany, August 2007. [ bib | .pdf | Abstract ]

Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan Black, and Keiichi Tokuda. The HMM-based speech synthesis system (HTS) version 2.0. In Proc. 6th ISCA Workshop on Speech Synthesis (SSW-6), August 2007. [ bib | Abstract ]

Makoto Tachibana, Keigo Kawashima, Junichi Yamagishi, and Takao Kobayashi. Performance evaluation of HMM-based style classification with a small amount of training data. In Proc. Interspeech 2007, August 2007. [ bib | Abstract ]

Volker Strom, Ani Nenkova, Robert Clark, Yolanda Vazquez-Alvarez, Jason Brenier, Simon King, and Dan Jurafsky. Modelling prominence and emphasis improves unit-selection synthesis. In Proc. Interspeech 2007, Antwerp, Belgium, August 2007. [ bib | .pdf | Abstract ]

Gregor Hofer and Hiroshi Shimodaira. Automatic head motion prediction from speech data. In Proc. Interspeech 2007, Antwerp, Belgium, August 2007. [ bib | .pdf | Abstract ]

Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira. Hierarchical dialogue optimization using semi-Markov decision processes. In Proc. Interspeech, August 2007. [ bib | .pdf | Abstract ]

K. Richmond. A multitask learning perspective on acoustic-articulatory inversion. In Proc. Interspeech, Antwerp, Belgium, August 2007. [ bib | .pdf | Abstract ]

Peter Bell and Simon King. Sparse Gaussian graphical models for speech recognition. In Proc. Interspeech 2007, Antwerp, Belgium, August 2007. [ bib | .pdf | Abstract ]

Junichi Yamagishi, Heiga Zen, Tomoki Toda, and Keiichi Tokuda. Speaker-independent HMM-based speech synthesis system - HTS-2007 system for the Blizzard Challenge 2007. In Proc. Blizzard Challenge 2007, August 2007. [ bib | .pdf | Abstract ]

K. Richmond, V. Strom, R. Clark, J. Yamagishi, and S. Fitt. Festival multisyn voices for the 2007 Blizzard Challenge. In Proc. Blizzard Challenge Workshop (in Proc. SSW6), Bonn, Germany, August 2007. [ bib | .pdf | Abstract ]

David Owens, Pauline Campbell, Amy Liddell, Christine DePlacido, and Maria Wolters. Random gap detection threshold: A useful measure of auditory ageing? In Proc. Europ. Cong. Fed. Audiol. Heidelberg, Germany, June 2007. [ bib | .pdf | Abstract ]

Amy Liddell, David Owens, Pauline Campbell, Christine DePlacido, and Maria Wolters. Can extended high frequency hearing thresholds be used to detect auditory processing difficulties in an ageing population? In Proc. Europ. Cong. Fed. Audiol. Heidelberg, Germany, June 2007. [ bib | Abstract ]

Marilyn McGee-Lennon, Maria Wolters, and Tony McBryan. Auditory reminders in the home. In Proc. Intl. Conf. Auditory Display (ICAD), Montreal, Canada, June 2007. [ bib | Abstract ]

Ö. Çetin, A. Kantor, S. King, C. Bartels, M. Magimai-Doss, J. Frankel, and K. Livescu. An articulatory feature-based tandem approach and factored observation modeling. In Proc. ICASSP, Honolulu, April 2007. [ bib | .pdf | Abstract ]

K. Livescu, Ö. Çetin, M. Hasegawa-Johnson, S. King, C. Bartels, N. Borges, A. Kantor, P. Lal, L. Yung, S. Bezman, Dawson-Haggerty, B. Woods, J. Frankel, M. Magimai-Doss, and K. Saenko. Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. In Proc. ICASSP, Honolulu, April 2007. [ bib | .pdf | Abstract ]

A. Dielmann and S. Renals. DBN based joint dialogue act recognition of multiparty meetings. In Proc. IEEE ICASSP, volume 4, pages 133-136, April 2007. [ bib | .pdf | Abstract ]

K. Livescu, A. Bezman, N. Borges, L. Yung, Ö. Çetin, J. Frankel, S. King, M. Magimai-Doss, X. Chi, and L. Lavoie. Manual transcription of conversational speech at the articulatory feature level. In Proc. ICASSP, Honolulu, April 2007. [ bib | .pdf | Abstract ]

Junichi Yamagishi and Takao Kobayashi. Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training. IEICE Trans. Information and Systems, E90-D(2):533-543, February 2007. [ bib | Abstract ]

S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester. Speech production knowledge in automatic speech recognition. Journal of the Acoustical Society of America, 121(2):723-742, February 2007. [ bib | .pdf | Abstract ]

Leonardo Badino and Robert A.J. Clark. Issues of optionality in pitch accent placement. In Proc. 6th ISCA Speech Synthesis Workshop, Bonn, Germany, 2007. [ bib | .pdf | Abstract ]

David Beaver, Brady Zack Clark, Edward Flemming, T. Florian Jaeger, and Maria Wolters. When semantics meets phonetics: Acoustical studies of second occurrence focus. Language, 83(2):245-276, 2007. [ bib | .pdf ]

A. Dielmann and S. Renals. Automatic dialogue act recognition using a dynamic Bayesian network. In S. Renals, S. Bengio, and J. Fiscus, editors, Proc. Multimodal Interaction and Related Machine Learning Algorithms Workshop (MLMI-06), pages 178-189. Springer, 2007. [ bib | .pdf | Abstract ]

Songfang Huang and Steve Renals. Modeling prosodic features in language models for meetings. In A. Popescu-Belis, S. Renals, and H. Bourlard, editors, Machine Learning for Multimodal Interaction IV, volume 4892 of Lecture Notes in Computer Science, pages 191-202. Springer, 2007. [ bib | .pdf | Abstract ]

J. Yamagishi, T. Kobayashi, M. Tachibana, K. Ogata, and Y. Nakano. Model adaptation approach to speech synthesis with diverse voices and styles. In Proc. ICASSP, pages 1233-1236, 2007. [ bib | Abstract ]

Alejandro Jaimes, Hervé Bourlard, Steve Renals, and Jean Carletta. Recording, indexing, summarizing, and accessing meeting videos: An overview of the AMI project. In Proc. IEEE ICIAPW, pages 59-64, 2007. [ bib | DOI | http | .pdf | Abstract ]

Steve Renals, Thomas Hain, and Hervé Bourlard. Recognition and interpretation of meetings: The AMI and AMIDA projects. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU '07), 2007. [ bib | .pdf | Abstract ]

Gregor Hofer, Hiroshi Shimodaira, and Junichi Yamagishi. Speech-driven head motion synthesis based on a trajectory model. Poster at Siggraph 2007, 2007. [ bib | .pdf ]

Gabriel Murray and Steve Renals. Towards online speech summarization. In Proc. Interspeech '07, 2007. [ bib | .PDF | Abstract ]

Ani Nenkova, Jason Brenier, Anubha Kothari, Sasha Calhoun, Laura Whitton, David Beaver, and Dan Jurafsky. To memorize or to predict: Prominence labeling in conversational speech. In NAACL Human Language Technology Conference, Rochester, NY, 2007. [ bib | .pdf | Abstract ]

Matthew P. Aylett, J. Sebastian Andersson, Leonardo Badino, and Christopher J. Pidcock. The Cerevoice Blizzard entry 2007: Are small database errors worse than compression artifacts? In Proc. Blizzard Challenge Workshop 2007, Bonn, Germany, 2007. [ bib | .pdf | Abstract ]

J. Frankel and S. King. Speech recognition using linear dynamic models. IEEE Transactions on Speech and Audio Processing, 15(1):246-256, January 2007. [ bib | .ps | .pdf | Abstract ]

Gabriel Murray and Steve Renals. Term-weighting for summarization of multi-party spoken dialogues. In A. Popescu-Belis, S. Renals, and H. Bourlard, editors, Machine Learning for Multimodal Interaction IV, volume 4892 of Lecture Notes in Computer Science, pages 155-166. Springer, 2007. [ bib | .pdf | Abstract ]

Gregor Hofer, Hiroshi Shimodaira, and Junichi Yamagishi. Lip motion synthesis using a context dependent trajectory hidden Markov model. Poster at SCA 2007, 2007. [ bib | .pdf ]

J. Cabral, S. Renals, K. Richmond, and J. Yamagishi. Towards an improved modeling of the glottal source in statistical parametric speech synthesis. In Proc. of the 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, 2007. [ bib | .pdf | Abstract ]

Sasha Calhoun. Predicting focus through prominence structure. In Proc. Interspeech, Antwerp, Belgium, 2007. [ bib | .pdf | Abstract ]

Robert A. J. Clark, Korin Richmond, and Simon King. Multisyn: Open-domain unit selection for the Festival speech synthesis system. Speech Communication, 49(4):317-330, 2007. [ bib | DOI | .pdf | Abstract ]

Alfred Dielmann and Steve Renals. Automatic meeting segmentation using dynamic Bayesian networks. IEEE Transactions on Multimedia, 9(1):25-36, 2007. [ bib | DOI | http | .pdf | Abstract ]

T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan. The AMI System for the Transcription of Speech in Meetings. In Proc. ICASSP, 2007. [ bib | .pdf | Abstract ]

Heike Penner, Nicholas Miller, and Maria Wolters. Motor speech disorders in three Parkinsonian syndromes: A comparative study. In Proc. Intl. Conf. Phon. Sci., 2007. [ bib | Abstract ]

2006

Hisashi Kawai, Tomoki Toda, Junichi Yamagishi, Toshio Hirai, Jinfu Ni, Nobuyuki Nishizawa, Minoru Tsuzaki, and Keiichi Tokuda. Ximera: a concatenative speech synthesis system with large scale corpora. IEICE Trans. Information and Systems, J89-D-II(12):2688-2698, December 2006. [ bib ]

Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira. Reinforcement learning of dialogue strategies with hierarchical abstract machines. In Proc. IEEE/ACL Workshop on Spoken Language Technology (SLT), December 2006. [ bib | .pdf | Abstract ]

Chie Shimodaira, Hiroshi Shimodaira, and Susumu Kunifuji. A Divergent-Style Learning Support Tool for English Learners Using a Thesaurus Diagram. In Proc. KES2006, Bournemouth, United Kingdom, October 2006. [ bib | .pdf | Abstract ]

Junko Tokuno, Mitsuru Nakai, Hiroshi Shimodaira, Shigeki Sagayama, and Masaki Nakagawa. On-line Handwritten Character Recognition Selectively employing Hierarchical Spatial Relationships among Subpatterns. In Proc. IWFHR-10, La Baule, France, October 2006. [ bib | Abstract ]

Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira. Learning multi-goal dialogue strategies using reinforcement learning with reduced state-action spaces. In Proc. Interspeech, September 2006. [ bib | .pdf | Abstract ]

Sue Fitt and Korin Richmond. Redundancy and productivity in the speech technology lexicon - can we do better? In Proc. Interspeech 2006, September 2006. [ bib | .pdf | Abstract ]

Le Zhang and Steve Renals. Phone recognition analysis for trajectory HMM. In Proc. Interspeech 2006, Pittsburgh, USA, September 2006. [ bib | .pdf | Abstract ]

Jithendra Vepa and Simon King. Subjective evaluation of join cost and smoothing methods for unit selection speech synthesis. IEEE Transactions on Speech and Audio Processing, 14(5):1763-1771, September 2006. [ bib | .pdf | Abstract ]

J. Frankel and S. King. Observation process adaptation for linear dynamic models. Speech Communication, 48(9):1192-1199, September 2006. [ bib | .ps | .pdf | Abstract ]

R. Clark, K. Richmond, V. Strom, and S. King. Multisyn voices for the Blizzard Challenge 2006. In Proc. Blizzard Challenge Workshop (Interspeech Satellite), Pittsburgh, USA, September 2006. (http://festvox.org/blizzard/blizzard2006.html). [ bib | .pdf | Abstract ]

Partha Lal. A comparison of singing evaluation algorithms. In Proc. Interspeech 2006, September 2006. [ bib | .pdf | Abstract ]

Robert A. J. Clark and Simon King. Joint prosodic and segmental unit selection speech synthesis. In Proc. Interspeech 2006, Pittsburgh, USA, September 2006. [ bib | .ps | .pdf | Abstract ]

K. Richmond. A trajectory mixture density network for the acoustic-articulatory inversion mapping. In Proc. Interspeech, Pittsburgh, USA, September 2006. [ bib | .pdf | Abstract ]

G. Murray and S. Renals. Dialogue act compression via pitch contour preservation. In Proceedings of the 9th International Conference on Spoken Language Processing, Pittsburgh, USA, September 2006. [ bib | .pdf | Abstract ]

G. Murray, S. Renals, J. Moore, and J. Carletta. Incorporating speaker and discourse features into speech summarization. In Proceedings of the Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Meeting (HLT-NAACL) 2006, New York City, USA, June 2006. [ bib | .pdf | Abstract ]

B. Hachey, G. Murray, and D. Reitter. Dimensionality reduction aids term co-occurrence based multi-document summarization. In Proceedings of ACL Summarization Workshop 2006, Sydney, Australia, June 2006. [ bib | .pdf | Abstract ]

G. Murray, S. Renals, and M. Taboada. Prosodic correlates of rhetorical relations. In Proceedings of HLT/NAACL ACTS Workshop, 2006, New York City, USA, June 2006. [ bib | .pdf | Abstract ]

A. Janin, A. Stolcke, X. Anguera, K. Boakye, Ö. Çetin, J. Frankel, and J. Zheng. The ICSI-SRI spring 2006 meeting recognition system. In Proc. MLMI, Washington DC, May 2006. [ bib | .ps | .pdf | Abstract ]

Peter Bell, Tina Burrows, and Paul Taylor. Adaptation of prosodic phrasing models. In Proc. Speech Prosody 2006, Dresden, Germany, May 2006. [ bib | .pdf | Abstract ]

M. Al-Hames, A. Dielmann, D. Gatica-Perez, S. Reiter, S. Renals, G. Rigoll, and D. Zhang. Multimodal integration for meeting group action segmentation and recognition. In S. Renals and S. Bengio, editors, Proc. Multimodal Interaction and Related Machine Learning Algorithms Workshop (MLMI-05), pages 52-63. Springer, 2006. [ bib | Abstract ]

Steve Renals, Samy Bengio, and Jonathan Fiscus, editors. Machine learning for multimodal interaction (Proceedings of MLMI '06), volume 4299 of Lecture Notes in Computer Science. Springer-Verlag, 2006. [ bib ]

T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, J. Vepa, and V. Wan. The AMI meeting transcription system: Progress and performance. In Proceedings of the Rich Transcription 2006 Spring Meeting Recognition Evaluation, 2006. [ bib | .pdf | Abstract ]

Simon King. Handling variation in speech and language processing. In Keith Brown, editor, Encyclopedia of Language and Linguistics. Elsevier, 2nd edition, 2006. [ bib ]

Sasha Calhoun. Information Structure and the Prosodic Structure of English: a Probabilistic Relationship. PhD thesis, University of Edinburgh, 2006. [ bib | Abstract ]

Simon King. Language variation in speech technologies. In Keith Brown, editor, Encyclopedia of Language and Linguistics. Elsevier, 2nd edition, 2006. [ bib ]

P. Hsueh, J. Moore, and S. Renals. Automatic segmentation of multiparty dialogue. In Proc. EACL06, 2006. [ bib | .pdf | Abstract ]

Volker Strom, Robert Clark, and Simon King. Expressive prosody for unit-selection speech synthesis. In Proc. Interspeech, Pittsburgh, 2006. [ bib | .ps | .pdf | Abstract ]

Marc Al-Hames, Thomas Hain, Jan Cernocky, Sascha Schreiber, Mannes Poel, Ronald Mueller, Sebastien Marcel, David van Leeuwen, Jean-Marc Odobez, Sileye Ba, Hervé Bourlard, Fabien Cardinaux, Daniel Gatica-Perez, Adam Janin, Petr Motlicek, Stephan Reiter, Steve Renals, Jeroen van Rest, Rutger Rienks, Gerhard Rigoll, Kevin Smith, Andrew Thean, and Pavel Zemcik. Audio-video processing in meetings: Seven questions and current AMI answers. In S. Renals, S. Bengio, and J. G. Fiscus, editors, Machine Learning for Multimodal Interaction (Proc. MLMI '06), volume 4299 of Lecture Notes in Computer Science, pages 24-35. Springer, 2006. [ bib ]

Steve Renals and Samy Bengio, editors. Machine learning for multimodal interaction (Proceedings of MLMI '05), volume 3869 of Lecture Notes in Computer Science. Springer-Verlag, 2006. [ bib ]

2005

Alexander Gutkin. Towards Formal Structural Representation of Spoken Language: An Evolving Transformation System (ETS) Approach. PhD thesis, School of Informatics, University of Edinburgh, UK, December 2005. Internal version. [ bib | .pdf ]

Heriberto Cuayáhuitl, Steve Renals, Oliver Lemon, and Hiroshi Shimodaira. Human-computer dialogue simulation using hidden Markov models. In Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), November 2005. [ bib | .pdf | Abstract ]

B. Hachey, G. Murray, and D. Reitter. The Embra system at DUC 2005: Query-oriented multi-document summarization with a very large latent semantic space. In Proceedings of the Document Understanding Conference (DUC) 2005, Vancouver, BC, Canada, October 2005. [ bib | .pdf | Abstract ]

G. Garau, S. Renals, and T. Hain. Applying vocal tract length normalization to meeting recordings. In Proc. Interspeech, September 2005. [ bib | .pdf | Abstract ]

Robert A.J. Clark, Korin Richmond, and Simon King. Multisyn voices from ARCTIC data for the Blizzard challenge. In Proc. Interspeech 2005, September 2005. [ bib | .pdf | Abstract ]

G. Murray, S. Renals, and J. Carletta. Extractive summarization of meeting recordings. In Proc. Interspeech, September 2005. [ bib | .pdf | Abstract ]

C. Mayo, R. A. J. Clark, and S. King. Multidimensional scaling of listener responses to synthetic speech. In Proc. Interspeech 2005, Lisbon, Portugal, September 2005. [ bib | .pdf ]

J. Frankel and S. King. A hybrid ANN/DBN approach to articulatory feature recognition. In Proc. Eurospeech, Lisbon, September 2005. [ bib | .ps | .pdf | Abstract ]

G. Hofer, K. Richmond, and R. Clark. Informed blending of databases for emotional speech synthesis. In Proc. Interspeech, September 2005. [ bib | .ps | .pdf | Abstract ]

Mitsuru Nakai, Shigeki Sagayama, and Hiroshi Shimodaira. On-line Handwriting Recognition Based on Sub-stroke HMM. Trans. IEICE D-II, J88-D2(8), August 2005. (in press) (in Japanese). [ bib | Abstract ]

Junko Tokuno, Nobuhito Inami, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Context-dependent Sub-stroke Model for HMM-based On-line Handwriting Recognition. Trans. IEICE D-II, J88-D2(8), August 2005. (in press), (in Japanese). [ bib | Abstract ]

Alexander Gutkin and David R. Gay. Structural representation and matching of articulatory speech structures based on the evolving transformation system (ETS) formalism. In Proc. Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), Edinburgh, UK, August 2005. [ bib | .pdf ]

Hiroshi Shimodaira, Keisuke Uematsu, Shin'ichi Kawamoto, Gregor Hofer, and Mitsuru Nakai. Analysis and Synthesis of Head Motion for Lifelike Conversational Agents. In Proc. MLMI2005, July 2005. [ bib | .pdf ]

Sasha Calhoun, Malvina Nissim, Mark Steedman, and Jason Brenier. A framework for annotating information structure in discourse. In Frontiers in Corpus Annotation II: Pie in the Sky, ACL2005 Conference Workshop, Ann Arbor, Michigan, June 2005. [ bib | .pdf | Abstract ]

G. Murray, S. Renals, J. Carletta, and J. Moore. Evaluating automatic summaries of meeting recordings. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, Ann Arbor, MI, USA, June 2005. [ bib | .pdf | Abstract ]

Alexander Gutkin and David R. Gay. Structural Representation and Matching of Articulatory Speech Structures based on the Evolving Transformation System (ETS) Formalism. In Michael Hofbaur, Bernhard Rinner, and Franz Wotawa, editors, Proc. 19th International Workshop on Qualitative Reasoning (QR-05), pages 89-96, Graz, Austria, May 2005. [ bib | .pdf | Abstract ]

Alexander Gutkin and Simon King. Inductive String Template-Based Learning of Spoken Language. In Hugo Gamboa and Ana Fred, editors, Proc. 5th International Workshop on Pattern Recognition in Information Systems (PRIS-2005), In conjunction with the 7th International Conference on Enterprise Information Systems (ICEIS-2005), pages 43-51, Miami, USA, May 2005. INSTICC Press. [ bib | .ps.gz | .pdf | Abstract ]

Alexander Gutkin and Simon King. Detection of Symbolic Gestural Events in Articulatory Data for Use in Structural Representations of Continuous Speech. In Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-05), volume I, pages 885-888, Philadelphia, PA, USA, March 2005. IEEE Signal Processing Society Press. [ bib | .ps.gz | .pdf | Abstract ]

Dominika Oliver and Robert A. J. Clark. Modelling pitch accent types for Polish speech synthesis. In Proc. Interspeech 2005, 2005. [ bib | .pdf ]

H. Christensen, B. Kolluru, Y. Gotoh, and S. Renals. Maximum entropy segmentation of broadcast news. In Proc. IEEE ICASSP, 2005. [ bib | .ps.gz | .pdf | Abstract ]

T. Hain, J. Dines, G. Garau, M. Karafiat, D. Moore, V. Wan, R. Ordelman, and S. Renals. Transcription of conference room meetings: an investigation. In Proc. Interspeech, 2005. [ bib | .pdf | Abstract ]

Sasha Calhoun. It's the difference that matters: An argument for contextually-grounded acoustic intonational phonology. In Linguistics Society of America Annual Meeting, Oakland, California, January 2005. [ bib | .pdf | Abstract ]

Calum Gray. Acoustic Pulse Reflectometry for Measurement of the Vocal Tract with Application in Voice Synthesis. PhD thesis, University of Edinburgh, 2005. [ bib | .pdf | Abstract ]

T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, I. McCowan, D. Moore, V. Wan, R. Ordelman, and S. Renals. The 2005 AMI system for the transcription of speech in meetings. In Proceedings of the Rich Transcription 2005 Spring Meeting Recognition Evaluation, 2005. [ bib | .pdf | Abstract ]

C. Mayo and A. Turk. The influence of spectral distinctiveness on acoustic cue weighting in children's and adults' speech perception. Journal of the Acoustical Society of America, 118:1730-1741, 2005. [ bib | .pdf ]

Simon King, Chris Bartels, and Jeff Bilmes. SVitchboard 1: Small vocabulary tasks from Switchboard 1. In Proc. Interspeech 2005, Lisbon, Portugal, 2005. [ bib | .pdf | Abstract ]

S. J. Wrigley, G. J. Brown, V. Wan, and S. Renals. Speech and crosstalk detection in multi-channel audio. IEEE Trans. on Speech and Audio Processing, 13:84-91, 2005. [ bib | .pdf | Abstract ]

Jerry Goldman, Steve Renals, Steven Bird, Franciska de Jong, Marcello Federico, Carl Fleischhauer, Mark Kornbluh, Lori Lamel, Doug Oard, Clare Stewart, and Richard Wright. Accessing the spoken word. International Journal of Digital Libraries, 5(4):287-298, 2005. [ bib | .ps.gz | .pdf | Abstract ]

Y. Hifny, S. Renals, and N. Lawrence. A hybrid MaxEnt/HMM based ASR system. In Proc. Interspeech, 2005. [ bib | .pdf | Abstract ]

A. Dielmann and S. Renals. Multistream dynamic Bayesian network for meeting segmentation. In S. Bengio and H. Bourlard, editors, Proc. Multimodal Interaction and Related Machine Learning Algorithms Workshop (MLMI-04), pages 76-86. Springer, 2005. [ bib | .ps.gz | .pdf | Abstract ]

C. Mayo and A. Turk. No available theories currently explain all adult-child cue weighting differences. In Proc. ISCA Workshop on Plasticity in Speech Perception, London, UK, 2005. [ bib | .pdf ]

V. Wan and S. Renals. Speaker verification using sequence discriminant support vector machines. IEEE Trans. on Speech and Audio Processing, 13:203-210, 2005. [ bib | .ps.gz | .pdf | Abstract ]

Yoshinori Shiga. Precise Estimation of Vocal Tract and Voice Source Characteristics. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, 2005. [ bib | .ps.gz | .pdf | Abstract ]

Konstantinos Koumpis and Steve Renals. Automatic summarization of voicemail messages using lexical and prosodic features. ACM Transactions on Speech and Language Processing, 2(1):1-24, 2005. [ bib | .ps.gz | .pdf | Abstract ]

Olga Goubanova and Simon King. Predicting consonant duration with Bayesian belief networks. In Proc. Interspeech 2005, Lisbon, Portugal, 2005. [ bib | .pdf | Abstract ]

Konstantinos Koumpis and Steve Renals. Content-based access to spoken audio. IEEE Signal Processing Magazine, 22(5):61-69, 2005. [ bib | .pdf | Abstract ]

T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiat, M. Lincoln, I. McCowan, D. Moore, V. Wan, R. Ordelman, and S. Renals. The development of the AMI system for the transcription of speech in meetings. In 2nd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, 2005. [ bib | .pdf | Abstract ]

A. Faria and D. Gelbart. Efficient pitch-based estimation of VTLN warp factors. In Proc. Eurospeech, 2005. [ bib | .pdf | Abstract ]

L. Onnis, P. Monaghan, K. Richmond, and N. Chater. Phonology impacts segmentation in speech processing. Journal of Memory and Language, 53(2):225-237, 2005. [ bib | .pdf | Abstract ]

S. Chang, M. Wester, and S. Greenberg. An elitist approach to automatic articulatory-acoustic feature classification for phonetic characterization of spoken language. Speech Communication, 47:290-311, 2005. [ bib | .pdf | Abstract ]

2004

M. Wester, J. Frankel, and S. King. Asynchronous articulatory feature recognition using dynamic Bayesian networks. In Proc. IEICE Beyond HMM Workshop, Kyoto, December 2004. [ bib | .ps | .pdf | Abstract ]

Yoshinori Shiga and Simon King. Source-filter separation for articulation-to-speech synthesis. In Proc. ICSLP, Jeju, Korea, October 2004. [ bib | .ps | .pdf | Abstract ]

Jithendra Vepa and Simon King. Subjective evaluation of join cost functions used in unit selection speech synthesis. In Proc. 8th International Conference on Spoken Language Processing (ICSLP), Jeju, Korea, October 2004. [ bib | .pdf | Abstract ]

Yoshinori Shiga and Simon King. Estimating detailed spectral envelopes using articulatory clustering. In Proc. ICSLP, Jeju, Korea, October 2004. [ bib | .ps | .pdf | Abstract ]

Alexander Gutkin and Simon King. Phone classification in pseudo-Euclidean vector spaces. In Proc. 8th International Conference on Spoken Language Processing (ICSLP), volume II, pages 1453-1457, Jeju Island, Korea, October 2004. [ bib | .ps.gz | .pdf | Abstract ]

D. Toney, D. Feinberg, and K. Richmond. Acoustic features for profiling mobile users of conversational interfaces. In S. Brewster and M. Dunlop, editors, 6th International Symposium on Mobile Human-Computer Interaction - MobileHCI 2004, pages 394-398, Glasgow, Scotland, September 2004. Springer. [ bib | Abstract ]

J. Frankel, M. Wester, and S. King. Articulatory feature recognition using dynamic Bayesian networks. In Proc. ICSLP, September 2004. [ bib | .ps | .pdf | Abstract ]

Alexander Gutkin and Simon King. Structural Representation of Speech for Phonetic Classification. In Proc. 17th International Conference on Pattern Recognition (ICPR), volume 3, pages 438-441, Cambridge, UK, August 2004. IEEE Computer Society Press. [ bib | .ps.gz | .pdf | Abstract ]

Alexander Gutkin, David Gay, Lev Goldfarb, and Mirjam Wester. On the Articulatory Representation of Speech within the Evolving Transformation System Formalism. In Lev Goldfarb, editor, Pattern Representation and the Future of Pattern Recognition (Proc. Satellite Workshop of 17th International Conference on Pattern Recognition), pages 57-76, Cambridge, UK, August 2004. [ bib | .ps.gz | .pdf | Abstract ]

J. Vepa and S. King. Subjective evaluation of join cost and smoothing methods. In Proc. 5th ISCA speech synthesis workshop, Pittsburgh, USA, June 2004. [ bib | .pdf | Abstract ]

Yoshinori Shiga and Simon King. Accurate spectral envelope estimation for articulation-to-speech synthesis. In Proc. 5th ISCA Speech Synthesis Workshop, pages 19-24, CMU, Pittsburgh, USA, June 2004. [ bib | .ps | .pdf | Abstract ]

Yoshinori Shiga. Source-filter separation based on an articulatory corpus. In One day meeting for young speech researchers (UK meeting), University College London, London, United Kingdom, April 2004. [ bib | Abstract ]

Sasha Calhoun. Phonetic dimensions of intonational categories: the case of L+H* and H*. In Prosody 2004, Nara, Japan, March 2004. poster. [ bib | .ps | .pdf | Abstract ]

Enrico Zovato, Stefano Sandri, Silvia Quazza, and Leonardo Badino. Prosodic analysis of a multi-style corpus in the perspective of emotional speech synthesis. In Proc. ICSLP 2004, Jeju, Korea, 2004. [ bib | .pdf ]

A. Wray, S.J. Cox, M. Lincoln, and J. Tryggvason. A formulaic approach to translation at the post office: Reading the signs. Language and Communication, 24(1):59-75, 2004. [ bib | .pdf | Abstract ]

H. Christensen, B. Kolluru, Y. Gotoh, and S. Renals. From text summarisation to style-specific summarisation for broadcast news. In Proc. ECIR-2004, 2004. [ bib | .ps.gz | .pdf | Abstract ]

A. Dielmann and S. Renals. Dynamic Bayesian networks for meeting structuring. In Proc. IEEE ICASSP, 2004. [ bib | .ps.gz | .pdf | Abstract ]

Jithendra Vepa and Simon King. Join cost for unit selection speech synthesis. In Abeer Alwan and Shri Narayanan, editors, Speech Synthesis. Prentice Hall, 2004. [ bib | .ps ]

C. Mayo and A. Turk. The development of perceptual cue weighting within and across monosyllabic words. In LabPhon 9, University of Illinois at Urbana-Champaign, 2004. [ bib ]

Robert A.J. Clark, Korin Richmond, and Simon King. Festival 2 - build your own general purpose unit selection speech synthesiser. In Proc. 5th ISCA workshop on speech synthesis, 2004. [ bib | .ps | .pdf | Abstract ]

Rachel Baker, Robert A.J. Clark, and Michael White. Synthesising contextually appropriate intonation in limited domains. In Proc. 5th ISCA workshop on speech synthesis, Pittsburgh, USA, 2004. [ bib | .ps | .pdf ]

Leonardo Badino. Chinese text word segmentation considering semantic links among sentences. In Proc. ICSLP 2004, Jeju, Korea, 2004. [ bib | .pdf ]

A. Dielmann and S. Renals. Multi-stream segmentation of meetings. In Proc. IEEE Workshop on Multimedia Signal Processing, 2004. [ bib | .ps.gz | .pdf | Abstract ]

Leonardo Badino, Claudia Barolo, and Silvia Quazza. Language independent phoneme mapping for foreign TTS. In Proc. 5th ISCA Speech Synthesis Workshop, Pittsburgh, USA, 2004. [ bib | .pdf ]

Y. H. Abdel-Haleem, S. Renals, and N. D. Lawrence. Acoustic space dimensionality selection and combination using the maximum entropy principle. In Proc. IEEE ICASSP, 2004. [ bib | .pdf | Abstract ]

Leonardo Badino, Claudia Barolo, and Silvia Quazza. A general approach to TTS reading of mixed-language texts. In Proc. ICSLP 2004, Jeju, Korea, 2004. [ bib | .pdf ]

C. Mayo and A. Turk. Adult-child differences in acoustic cue weighting are influenced by segmental context: Children are not always perceptually biased towards transitions. Journal of the Acoustical Society of America, 115:3184-3194, 2004. [ bib | .pdf ]

2003

Shin-ichi Kawamoto, Hiroshi Shimodaira, Shigeki Sagayama, et al. Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. In Helmut Prendinger et al., editors, Life-Like Characters: Tools, Affective Functions, and Applications, pages 187-212. Springer, November 2003. [ bib | .pdf | Abstract ]

Ben Gillett and Simon King. Transforming F0 contours. In Proc. Eurospeech, Geneva, September 2003. [ bib | .pdf | Abstract ]

Yoshinori Shiga and Simon King. Estimating the spectral envelope of voiced speech using multi-frame analysis. In Proc. Eurospeech-2003, volume 3, pages 1737-1740, Geneva, Switzerland, September 2003. [ bib | .ps | .pdf | Abstract ]

James Horlock and Simon King. Named entity extraction from word lattices. In Proc. Eurospeech, Geneva, September 2003. [ bib | .pdf | Abstract ]

James Horlock and Simon King. Discriminative methods for improving named entity extraction on speech data. In Proc. Eurospeech, Geneva, September 2003. [ bib | .pdf | Abstract ]

Ben Gillett and Simon King. Transforming voice quality. In Proc. Eurospeech, Geneva, September 2003. [ bib | .pdf | Abstract ]

Yoshinori Shiga and Simon King. Estimation of voice source and vocal tract characteristics based on multi-frame analysis. In Proc. Eurospeech, volume 3, pages 1749-1752, Geneva, Switzerland, September 2003. [ bib | .ps | .pdf | Abstract ]

Hiroshi Shimodaira, Takashi Sudo, Mitsuru Nakai, and Shigeki Sagayama. On-line Overlaid-Handwriting Recognition Based on Substroke HMMs. In ICDAR'03, pages 1043-1047, August 2003. [ bib | .pdf | Abstract ]

Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Generation of Hierarchical Dictionary for Stroke-order Free Kanji Handwriting Recognition Based on Substroke HMM. In Proc. ICDAR2003, pages 514-518, August 2003. [ bib | .pdf | Abstract ]

Shigeki Matsuda, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Speech Recognition Using Asynchronous Transition HMM. IEICE Trans. D-II, J86-D-II(6):741-754, June 2003. (in Japanese). [ bib | Abstract ]

Kanad Keeni, Kunio Goto, and Hiroshi Shimodaira. Automatic Filtering of Network Intrusion Detection System Alarms Using Multi-layer Feed-forward Neural Networks. In International Conference on Neural Information Processing (ICONIP2003), June 2003. [ bib ]

Junko Tokuno, Naoto Akira, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Blind-handwriting Interface for Wearable Computing. In Proc. Human - Computer Interaction (HCI) International 2003, Volume 2, pages 303-307, June 2003. [ bib | Abstract ]

Sasha Calhoun. The nature of theme and rheme accents. In One-Day Meeting for Young Speech Researchers, University College, London, April 2003. [ bib | .ps | .pdf | Abstract ]

Kanad Keeni, Kunio Goto, and Hiroshi Shimodaira. On fast learning of Multi-layer Feed-forward Neural Networks Using Back Propagation. In International Conference on Enterprise and Information Systems (ICEIS2003), pages 266-271, April 2003. [ bib | Abstract ]

J. Frankel. Linear dynamic models for automatic speech recognition. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, April 2003. [ bib | .ps | .pdf | Abstract ]

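Y. Gotoh and S. Renals. Language modelling. In S. Renals and G. Grefenstette, editors, Text and Speech Triggered Information Access, number 2705 in Lecture Notes in Computer Science, pages 78-105. Springer-Verlag, 2003. [ bib | Abstract ]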

J. Sturm, J. M. Kessens, M. Wester, F. de Wet, E. Sanders, and H. Strik. Automatic transcription of football commentaries in the MUMIS project. In Proc. Eurospeech '03, 2003. [ bib | .pdf | Abstract ]

Tu Bao Ho, Trong Dung Nguyen, Hiroshi Shimodaira, and Masayuki Kimura. A Knowledge Discovery System with Support for Model Selection and Visualization. Applied Intelligence, 19:125-141, 2003. [ bib ]

O. Goubanova. Bayesian modelling of vowel segment duration for text-to-speech synthesis using distinctive features. In Proc. ICPhS 2003, volume 3, page 2349, Barcelona, Spain, 2003. [ bib | .ps | Abstract ]

C. Mayo and A. Turk. Is the development of cue weighting strategies in children's speech perception context-dependent? In XVth International Congress of Phonetic Sciences, Barcelona, 2003. [ bib | .pdf ]

M. Wester. Syllable classification using articulatory-acoustic features. In Proc. Eurospeech '03, Geneva, 2003. [ bib | .pdf | Abstract ]

K. Koumpis and S. Renals. Evaluation of extractive voicemail summarization. In Proc. ISCA Workshop on Multilingual Spoken Document Retrieval, pages 19-24, 2003. [ bib | .ps.gz | .pdf | Abstract ]

S. Renals and D. Ellis. Audio information access from meeting rooms. In Proc. IEEE ICASSP, volume 4, pages 744-747, 2003. [ bib | .ps.gz | .pdf | Abstract ]

M. Wester. Pronunciation modeling for ASR - knowledge-based and data-derived methods. Computer Speech and Language, 17:69-85, 2003. [ bib | .pdf | Abstract ]

B. Kolluru, H. Christensen, Y. Gotoh, and S. Renals. Exploring the style-technique interaction in extractive summarization of broadcast news. In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 2003. [ bib | .ps.gz | .pdf | Abstract ]

K. Koumpis and S. Renals. Multi-class extractive voicemail summarization. In Proc. Eurospeech, pages 2785-2788, 2003. [ bib | .pdf | Abstract ]

K. Richmond, S. King, and P. Taylor. Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17:153-172, 2003. [ bib | .pdf | Abstract ]

Christophe Van Bael and Simon King. An accent-independent lexicon for automatic speech recognition. In Proc. ICPhS, pages 1165-1168, 2003. [ bib | .pdf | Abstract ]

V. Wan and S. Renals. SVMSVM: Support vector machine speaker verification methodology. In Proc. IEEE ICASSP, volume 2, pages 221-224, 2003. [ bib | .ps.gz | .pdf | Abstract ]

H. Christensen, Y. Gotoh, B. Kolluru, and S. Renals. Are extractive text summarisation techniques portable to broadcast news? In Proc. IEEE Automatic Speech Recognition and Understanding Workshop, 2003. [ bib | .ps.gz | .pdf | Abstract ]

J. Vepa and S. King. Kalman-filter based join cost for unit-selection speech synthesis. In Proc. Eurospeech, Geneva, Switzerland, 2003. [ bib | .pdf | Abstract ]

Robert A. J. Clark. Generating Synthetic Pitch Contours Using Prosodic Structure. PhD thesis, The University of Edinburgh, 2003. [ bib | .ps.gz | .pdf ]

C. Mayo, J. Scobbie, N. Hewlett, and D. Waters. The influence of phonemic awareness development on acoustic cue weighting in children's speech perception. Journal of Speech, Language and Hearing Research, 46:1184-1196, 2003. [ bib | .pdf ]

Simon King. Dependence and independence in automatic speech recognition and synthesis. Journal of Phonetics, 31(3-4):407-411, 2003. [ bib | .pdf | Abstract ]

S. Wrigley, G. Brown, V. Wan, and S. Renals. Feature selection for the classification of crosstalk in multi-channel audio. In Proc. Eurospeech, pages 469-472, 2003. [ bib | .pdf | Abstract ]

Robert A. J. Clark. Modelling pitch accents for concept-to-speech synthesis. In Proc. XVth International Congress of Phonetic Sciences, volume 2, pages 1141-1144, 2003. [ bib | .ps | .pdf ]

S.J. Cox, M. Lincoln, M. Nakisa, M. Wells, M. Tutt, and S. Abbott. The development and evaluation of a speech to sign translation system to assist transactions. Int. Journal of Human Computer Interaction, 16(2):141-161, 2003. [ bib | .pdf | Abstract ]

M. Lincoln and S.J. Cox. A comparison of language processing techniques for a constrained speech translation system. In IEEE Conference on Acoustics, Speech and Signal Processing, Hong Kong, 2003. [ bib | .pdf | Abstract ]

S. Renals and G. Grefenstette, editors. Text and Speech Triggered Information Access. Number 2705 in Lecture Notes in Computer Science. Springer-Verlag, 2003. [ bib | http | Abstract ]

2002

Haruto Takeda, Naoki Saito, Tomoshi Otsuki, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Hidden Markov Model for Automatic Transcription of MIDI Signals. In 2002 International Workshop on Multimedia Signal Processing, December 2002. [ bib | .pdf ]

J. Vepa, S. King, and P. Taylor. Objective distance measures for spectral discontinuities in concatenative speech synthesis. In Proc. ICSLP, Denver, USA, September 2002. [ bib | .pdf | Abstract ]

J. Vepa, S. King, and P. Taylor. New objective distance measures for spectral discontinuities in concatenative speech synthesis. In Proc. IEEE 2002 workshop on speech synthesis, Santa Monica, USA, September 2002. [ bib | .pdf | Abstract ]

Kanad Keeni and Hiroshi Shimodaira. On Selection of Training Data for Fast Learning of Neural Networks Using Back Propagation. In IASTED International Conference on Artificial Intelligence and Application (AIA2002), pages 474-478, September 2002. [ bib ]

Junko Tokuno, Nobuhito Inami, Shigeki Matsuda, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Context-Dependent Substroke Model for HMM-based On-line Handwriting Recognition. In Proc. IWFHR-8, pages 78-83, August 2002. [ bib | .pdf | Abstract ]

Mitsuru Nakai, Takashi Sudo, Hiroshi Shimodaira, and Shigeki Sagayama. Pen Pressure Features for Writer-Independent On-Line Handwriting Recognition Based on Substroke HMM. In Proc. ICPR2002, III, pages 220-223, August 2002. [ bib | .pdf ]

Shin-ichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, and Shigeki Sagayama. Open-source software for developing anthropomorphic spoken dialog agent. In Proc. PRICAI-02, International Workshop on Lifelike Animated Agents, pages 64-69, August 2002. [ bib | .pdf ]

Shin-ichi Kawamoto, Hiroshi Shimodaira, et al. Design of Software Toolkit for Anthropomorphic Spoken Dialog Agent Software with Customization-oriented Features. Information Processing Society of Japan (IPSJ) Journal, 43(7):2249-2263, July 2002. (in Japanese). [ bib ]

Jun Rokui, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Speaker Normalization Using Linear Transformation of Vocal Tract Length Based on Maximum Likelihood Estimation. Information Processing Society of Japan (IPSJ), 43(7):2030-2037, July 2002. (in Japanese). [ bib | Abstract ]

Hiroshi Shimodaira, Nobuyoshi Sakai, Mitsuru Nakai, and Shigeki Sagayama. Jacobian Joint Adaptation to Noise, Channel and Vocal Tract Length. In Proc. ICASSP2002, pages 197-200, May 2002. [ bib | .pdf | Abstract ]

Yoshinori Matsushita, Shinnichi Kawamoto, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. A Head-Behavior Synchronization Model with Utterance for Anthropomorphic Spoken-Dialog Agent. In Technical Report of IEICE, HIS2001, March 2002. (in Japanese). [ bib | Abstract ]

Tomoshi Otsuki, Naoki Saitou, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Musical Rhythm Recognition Using Hidden Markov Model. Information Processing Society of Japan (IPSJ) Journal, 43(2), February 2002. (in Japanese). [ bib ]

H. P. Graf, E. Cosatto, V. Strom, and F. J. Huang. Visual prosody: Facial movements accompanying speech. In Proc Fifth Int. Conf. Automatic Face and Gesture Recognition, pages 397-401, 2002. [ bib | .ps | .pdf | Abstract ]

K. Richmond. Estimating Articulatory Parameters from the Acoustic Speech Signal. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, 2002. [ bib | .ps | Abstract ]

Jesper Salomon, Simon King, and Miles Osborne. Framewise phone classification using support vector machines. In Proceedings International Conference on Spoken Language Processing, Denver, 2002. [ bib | .ps | .pdf | Abstract ]

M. Wester, J.M. Kessens, and H. Strik. Goal-directed ASR in a multimedia indexing and searching environment (MUMIS). In Proc. ICSLP, pages 1993-1996, Denver, 2002. [ bib | .pdf | Abstract ]

O. Goubanova. Forms of introduction in map task dialogues: Case of L2 Russian speakers. In Proc. ICSLP 2002, Denver, USA, 2002. [ bib ]

A. J. Robinson, G. D. Cook, D. P. W. Ellis, E. Fosler-Lussier, S. J. Renals, and D. A. G. Williams. Connectionist speech recognition of broadcast news. Speech Communication, 37:27-45, 2002. [ bib | .ps.gz | .pdf | Abstract ]

Helen Wright-Hastie, Massimo Poesio, and Stephen Isard. Automatically predicting dialogue structure using prosodic features. Speech Communication, 36(1-2):63-79, 2002. [ bib ]

S.J. Cox, M. Lincoln, J. Tryggvason, M. Nakisa, M. Wells, M. Tutt, and S. Abbott. TESSA, a system to aid communication with deaf people. In ASSETS 2002, Fifth International ACM SIGCAPH Conference on Assistive Technologies, pages 205-212, Edinburgh, Scotland, 2002. [ bib | .pdf | Abstract ]

O. Pietquin and S. Renals. ASR system modeling for automatic evaluation and optimization of dialogue systems. In Proc IEEE ICASSP, pages 46-49, 2002. [ bib | .pdf | Abstract ]

V. Wan and S. Renals. Evaluation of kernel methods for speaker verification and identification. In Proc IEEE ICASSP, pages 669-672, 2002. [ bib | .pdf | Abstract ]

Sasha Calhoun. Using prosody in ASR: the segmentation of broadcast radio news. Master's thesis, University of Edinburgh, 2002. [ bib | .pdf | Abstract ]

V. Strom. From text to speech without ToBI. In Proc. ICSLP, Denver, 2002. [ bib | .ps | .pdf | Abstract ]

Fiona Couper. Switching linear dynamical models for automatic speech recognition. Master's thesis, University of Edinburgh, 2002. [ bib | .pdf | Abstract ]

Mirjam Wester. Pronunciation Variation Modeling for Dutch Automatic Speech Recognition. PhD thesis, University of Nijmegen, 2002. [ bib | .pdf | Abstract ]

C. Mayo, A. Turk, and J. Watson. Development of cue weighting strategies in children's speech perception. In Proceedings of TIPS: Temporal Integration in the Perception of Speech, Aix-en-Provence, 2002. [ bib ]

Juergen Schroeter, Alistair Conkie, Ann Syrdal, Mark Beutnagel, Matthias Jilka, Volker Strom, Yeon-Jun Kim, Hong-Goo Kang, and David Kapilow. A perspective on the next challenges for TTS. In IEEE 2002 Workshop in Speech Synthesis, pages 11-13, Santa Monica, CA, 2002. [ bib | .ps | .pdf | Abstract ]

2001

Hiroshi Shimodaira, Ken-ichi Noma, Mitsuru Nakai, and Shigeki Sagayama. Dynamic Time-Alignment Kernel in Support Vector Machine. Advances in Neural Information Processing Systems 14, NIPS2001, 2:921-928, December 2001. [ bib | .pdf | Abstract ]

J. Frankel and S. King. ASR - articulatory speech recognition. In Proc. Eurospeech, pages 599-602, Aalborg, Denmark, September 2001. [ bib | .ps | .pdf | Abstract ]

Sue Fitt. Morphological approaches for an English pronunciation lexicon. In Proc. Eurospeech 2001, Aalborg, September 2001. [ bib | .ps | .pdf | Abstract ]

Mitsuru Nakai, Naoto Akira, Hiroshi Shimodaira, and Shigeki Sagayama. Substroke Approach to HMM-based On-line Kanji Handwriting Recognition. In Proc. ICDAR'01, pages 491-495, September 2001. [ bib | .pdf | Abstract ]

Sue Fitt. Using real words for recording diphones. In Proc. Eurospeech 2001, September 2001. [ bib | .ps | .pdf | Abstract ]

Shigeki Sagayama, Yutaka Kato, Mitsuru Nakai, and Hiroshi Shimodaira. Jacobian Approach to Joint Adaptation to Noise, Channel and Vocal Tract Length. In Proc. ISCA Workshop on Adaptation Methods (Sophia Antipolis, France), pages 117-120, August 2001. [ bib ]

Shigeki Sagayama, Koichi Shinoda, Mitsuru Nakai, and Hiroshi Shimodaira. Analytic Methods for Acoustic Model Adaptation: A Review. In Proc. ISCA Workshop on Adaptation Methods (Sophia Antipolis, France), pages 67-76, August 2001. Invited Paper. [ bib ]

Kanad Keeni, Kunio Goto, and Hiroshi Shimodaira. On Extraction of E-Mail Address from Fax Message for Automatic Delivery to Individual Recipient. In IASTED International Conference on Signal Processing Pattern Recognition and Application, July 2001. [ bib ]

Katsuhisa Fujinaga, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Multiple-Regression Hidden Markov Model. In Proc. ICASSP 2001, May 2001. [ bib | .pdf ]

J. Frankel and S. King. Speech recognition in the articulatory domain: investigating an alternative to acoustic HMMs. In Proc. Workshop on Innovations in Speech Processing, April 2001. [ bib | .ps | .pdf | Abstract ]

K. Richmond. Mixture density networks, human articulatory data and acoustic-to-articulatory inversion of continuous speech. In Proc. Workshop on Innovation in Speech Processing, pages 259-276. Institute of Acoustics, April 2001. [ bib | .ps ]

K. Koumpis, S. Renals, and M. Niranjan. Extractive summarization of voicemail using lexical and prosodic feature subset selection. In Proc. Eurospeech, pages 2377-2380, Aalborg, Denmark, 2001. [ bib | .ps.gz | .pdf | Abstract ]

O. Goubanova. Predicting segmental durations using Bayesian Belief networks. In CD-ROM Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Scotland, UK, 2001. [ bib ]

K. Koumpis, C. Ladas, and S. Renals. An advanced integrated architecture for wireless voicemail retrieval. In Proc. 15th IEEE International Conference on Information Networking, pages 403-410, 2001. [ bib | .ps.gz | Abstract ]

S. Renals and D. Abberley. The THISL SDR system at TREC-9. In Proc. Ninth Text Retrieval Conference (TREC-9), 2001. [ bib | .ps.gz | .pdf | Abstract ]

H. Christensen, Y. Gotoh, and S. Renals. Punctuation annotation using statistical prosody models. In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, Red Bank, NJ, USA, 2001. [ bib | .ps.gz | .pdf | Abstract ]

C. Mayo, A. Turk, and J. Watson. Flexibility of acoustic cue weighting in children's speech perception. Journal of the Acoustical Society of America, 109:2313, 2001. [ bib | .pdf ]

M. Wester, J. M. Kessens, C. Cucchiarini, and H. Strik. Obtaining phonetic transcriptions: a comparison between expert listeners and a continuous speech recognizer. Language and Speech, 44(3):377-403, 2001. [ bib | .pdf | Abstract ]

S. Chang, S. Greenberg, and M. Wester. An elitist approach to articulatory-acoustic feature classification. In Proc. Eurospeech '01, pages 1729-1733, Aalborg, 2001. [ bib | .pdf | Abstract ]

K. Koumpis and S. Renals. The role of prosody in a voicemail summarization system. In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, Red Bank, NJ, USA, 2001. [ bib | .ps.gz | .pdf | Abstract ]

M. Wester, S. Greenberg, and S. Chang. A Dutch treatment of an elitist approach to articulatory-acoustic feature classification. In Proc. Eurospeech '01, pages 1729-1732, Aalborg, 2001. [ bib | .pdf | Abstract ]

2000

Alexander Gutkin. Log-Linear Interpolation of Language Models. MPhil. thesis, Department of Engineering, University of Cambridge, UK, December 2000. [ bib | .ps.gz | .pdf ]

Shigeki Matsuda, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Feature-dependent Allophone Clustering. In Proc. ICSLP2000, pages 413-416, October 2000. [ bib | .pdf | Abstract ]

Hiroshi Shimodaira, Toshihiko Akae, Mitsuru Nakai, and Shigeki Sagayama. Jacobian Adaptation of HMM with Initial Model Selection for Noisy Speech Recognition. In Proc. ICSLP2000, pages 1003-1006, October 2000. [ bib | .pdf | Abstract ]

Shigeki Matsuda, Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Asynchronous-Transition HMM. In Proc. ICASSP 2000 (Istanbul, Turkey), Vol. II, pages 1001-1004, June 2000. [ bib | .pdf | Abstract ]

Y. Gotoh and S. Renals. Information extraction from broadcast news. Philosophical Transactions of the Royal Society of London, Series A, 358:1295-1310, 2000. [ bib | .ps.gz | .pdf | Abstract ]

J.M. Kessens, M. Wester, and H. Strik. Automatic detection and verification of Dutch phonological rules. In PHONUS 5: Proceedings of the "Workshop on Phonetics and Phonology in ASR", pages 117-128, Saarbruecken, 2000. [ bib | .pdf | Abstract ]

J.A. Bangham, S.J. Cox, M. Lincoln, I. Marshall, M. Tutt, and M. Wells. Signing for the deaf using virtual humans. In IEE Colloquium on Speech and Language processing for Disabled and Elderly, 2000. [ bib | .pdf | Abstract ]

Andreas Stolcke, N. Coccaro, R. Bates, P. Taylor, C. Van Ess-Dykema, K. Ries, Elizabeth Shriberg, D. Jurafsky, R. Martin, and M. Meteer. Dialog act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3), 2000. [ bib | .ps | .pdf ]

Ann K. Syrdal, Colin W. Wightman, Alistair Conkie, Yannis Stylianou, Mark Beutnagel, Juergen Schroeter, Volker Strom, and Ki-Seung Lee. Corpus-based techniques in the AT&T NextGen synthesis system. In Proc. Int. Conf. on Spoken Language Processing, Beijing, 2000. [ bib | .ps | .pdf | Abstract ]

S. Renals, D. Abberley, D. Kirby, and T. Robinson. Indexing and retrieval of broadcast news. Speech Communication, 32:5-20, 2000. [ bib | .ps.gz | .pdf | Abstract ]

M. Carreira-Perpiñán and S. Renals. Practical identifiability of finite mixtures of multivariate Bernoulli distributions. Neural Computation, 12:141-152, 2000. [ bib | .ps.gz | .pdf | Abstract ]

Paul Taylor. Analysis and synthesis of intonation using the tilt model. Journal of the Acoustical Society of America, 107(3):1697-1714, 2000. [ bib | .ps | .pdf ]

Kurt Dusterhoff. Synthesizing Fundamental Frequency Using Models Automatically Trained from Data. PhD thesis, University of Edinburgh, 2000. [ bib | .ps | .pdf ]

M. Wester, J.M. Kessens, and H. Strik. Pronunciation variation in ASR: Which variation to model? In Proc. ICSLP '00, volume IV, pages 488-491, Beijing, 2000. [ bib | .pdf | Abstract ]

Helen Wright. Modelling Prosodic and Dialogue Information for Automatic Speech Recognition. PhD thesis, University of Edinburgh, 2000. [ bib | .ps | .pdf ]

A. Wrench and K. Richmond. Continuous speech recognition using articulatory data. In Proc. ICSLP 2000, Beijing, China, 2000. [ bib | .ps | .pdf | Abstract ]

M. Wester and E. Fosler-Lussier. A comparison of data-derived and knowledge-based modeling of pronunciation variation. In Proc. ICSLP '00, volume I, pages 270-273, Beijing, 2000. [ bib | .pdf | Abstract ]

K. Koumpis and S. Renals. Transcription and summarization of voicemail speech. In Proc. ICSLP, volume 2, pages 688-691, Beijing, 2000. [ bib | .ps.gz | .pdf | Abstract ]

Y. Gotoh and S. Renals. Variable word rate n-grams. In Proc IEEE ICASSP, pages 1591-1594, Istanbul, 2000. [ bib | .ps.gz | .pdf | Abstract ]

P A Taylor. Concept-to-speech by phonological structure matching. Philosophical Transactions of the Royal Society, Series A, 2000. [ bib | .ps | .pdf ]

J. Frankel, K. Richmond, S. King, and P. Taylor. An automatic speech recognition system using neural networks and linear dynamic models to recover and model articulatory traces. In Proc. ICSLP, 2000. [ bib | .ps | .pdf | Abstract ]

Y. Gotoh and S. Renals. Sentence boundary detection in broadcast speech transcripts. In ISCA ITRW: ASR2000, pages 228-235, Paris, 2000. [ bib | .ps.gz | .pdf | Abstract ]

M. Wester, J.M. Kessens, and H. Strik. Using Dutch phonological rules to model pronunciation variation in ASR. In PHONUS 5: Proceedings of the "Workshop on Phonetics and Phonology in ASR", pages 105-116, Saarbruecken, 2000. [ bib | .pdf | Abstract ]

C. Mayo. The relationship between phonemic awareness and cue weighting in speech perception: longitudinal and cross-sectional child studies. PhD thesis, Queen Margaret University College, 2000. [ bib | .pdf ]

O. Goubanova and P. Taylor. Using Bayesian Belief networks for model duration in text-to-speech systems. In CD-ROM Proc. ICSLP 2000, Beijing, China, 2000. [ bib ]

Edmilson Morais, Paul Taylor, and Fabio Violaro. Concatenative text-to-speech synthesis based on prototype waveform interpolation (a time frequency approach). In Proc. ICSLP 2000, Beijing, China, 2000. [ bib | .ps | .pdf ]

S. King, P. Taylor, J. Frankel, and K. Richmond. Speech recognition via phonetically-featured syllables. In PHONUS, volume 5, pages 15-34, Institute of Phonetics, University of the Saarland, 2000. [ bib | .ps | .pdf | Abstract ]

Simon King and Paul Taylor. Detection of phonological features in continuous speech using neural networks. Computer Speech and Language, 14(4):333-353, 2000. [ bib | .ps | .pdf | Abstract ]

D. Abberley, S. Renals, D. Ellis, and T. Robinson. The THISL SDR system at TREC-8. In Proc. Eighth Text Retrieval Conference (TREC-8), 2000. [ bib | .ps.gz | .pdf | Abstract ]

1999

Sue Fitt and Steve Isard. Synthesis of regional English using a keyword lexicon. In Proc. Eurospeech 1999, volume 2, pages 823-826, Budapest, September 1999. [ bib | .ps | .pdf | Abstract ]

Jun Rokui and Hiroshi Shimodaira. Multistage Building Learning based on Misclassification Measure. In 9-th International Conference on Artificial Neural Networks, Edinburgh, UK, September 1999. [ bib ]

Kanad Keeni, Kenji Nakayama, and Hiroshi Shimodaira. A Training Scheme for Pattern Classification Using Multi-layer Feed-forward Neural Networks. In IEEE International Conference on Computational Intelligence and Multimedia Applications, pages 307-311, September 1999. [ bib ]

Sue Fitt. The treatment of vowels preceding 'r' in a keyword lexicon of English. In Proc. ICPhS 1999, August 1999. [ bib | .ps | .pdf | Abstract ]

Simon King and Alan Wrench. Dynamical system modelling of articulator movement. In Proc. ICPhS 99, pages 2259-2262, San Francisco, August 1999. [ bib | .ps | .pdf | Abstract ]

Kanad Keeni, Kenji Nakayama, and Hiroshi Shimodaira. Estimation of Initial Weights and Hidden Units for Fast Learning of Multi-layer Neural Networks for Pattern Classification. In IEEE International Joint Conference on Neural Networks (IJCNN'99), July 1999. [ bib ]

M. Poesio, R. Henschel, J. Hitzeman, R. Kibble, S. Montague, and K. van Deemter. Towards an annotation scheme for noun phrase generation. In Proceedings of the EACL workshop on linguistically interpreted corpora (LINC-99), Norway, 1999. [ bib | .ps | .pdf ]

C. Mayo. Perceptual weighting and phonemic awareness in pre-reading and early-reading children. In XIVth International Congress of Phonetic Sciences, San Francisco, 1999. [ bib | .pdf ]

V. Strom and H. Heine. Utilizing prosody for unconstrained morpheme recognition. In Proc. European Conf. on Speech Communication and Technology, Budapest, 1999. [ bib | .ps | .pdf | Abstract ]

J.M. Kessens, M. Wester, and H. Strik. Improving the performance of a Dutch CSR by modeling within-word and cross-word pronunciation variation. Speech Communication, 29:193-207, 1999. [ bib | .pdf | Abstract ]

G. Cook, K. Al-Ghoneim, D. Ellis, E. Fosler-Lussier, Y. Gotoh, B. Kingsbury, N. Morgan, S. Renals, T. Robinson, and G. Williams. The SPRACH system for the transcription of broadcast news. In Proc. DARPA Broadcast News Workshop, pages 161-166, 1999. [ bib | .html | .ps.gz | .pdf | Abstract ]

J.M. Kessens, M. Wester, and H. Strik. Modeling within-word and cross-word pronunciation variation to improve the performance of a Dutch CSR. In Proc. ICPhS '99, pages 1665-1668, San Francisco, 1999. [ bib | .pdf | Abstract ]

Janet Hitzeman, Alan W. Black, Paul Taylor, Chris Mellish, and Jon Oberlander. An annotation scheme for concept-to-speech synthesis. In Proceedings of the European Workshop on Natural Language Generation, pages 59-66, Toulouse, France, 1999. [ bib | .ps | .pdf ]

T. Robinson, D. Abberley, D. Kirby, and S. Renals. Recognition, indexing and retrieval of British broadcast news with the THISL system. In Proc. Eurospeech, pages 1067-1070, Budapest, 1999. [ bib | .ps.gz | .pdf | Abstract ]

H. Wright, Massimo Poesio, and Stephen Isard. Using high level dialogue information for dialogue act recognition using prosodic features. In Proceedings of an ESCA Tutorial and Research Workshop on Dialogue and Prosody, pages 139-143, Eindhoven, The Netherlands, 1999. [ bib | .ps | .pdf ]

M. Lincoln. Characterization of Speakers for Improved Automatic Speech Recognition. PhD thesis, University of East Anglia, 1999. [ bib | .pdf | Abstract ]

Y. Gotoh and S. Renals. Statistical annotation of named entities in spoken audio. In Proc. ESCA Workshop on Accessing Information In Spoken Audio, pages 43-48, Cambridge, 1999. [ bib | .ps.gz | .pdf | Abstract ]

M. Carreira-Perpiñán and S. Renals. A latent-variable modelling approach to the acoustic-to-articulatory mapping problem. In Proc. 14th Int. Congress of Phonetic Sciences, pages 2013-2016, San Francisco, 1999. [ bib | .ps.gz | .pdf | Abstract ]

S. Renals and M. Hochberg. Start-synchronous search for large vocabulary continuous speech recognition. IEEE Trans. on Speech and Audio Processing, 7:542-553, 1999. [ bib | .ps.gz | .pdf | Abstract ]

Robert A. J. Clark. Using prosodic structure to improve pitch range variation in text to speech synthesis. In Proc. XIVth International Congress of Phonetic Sciences, volume 1, pages 69-72, 1999. [ bib | .ps | .pdf ]

Y. Gotoh, S. Renals, and G. Williams. Named entity tagged language models. In Proc IEEE ICASSP, pages 513-516, Phoenix AZ, 1999. [ bib | .ps.gz | .pdf | Abstract ]

S. Renals and Y. Gotoh. Integrated transcription and identification of named entities in broadcast speech. In Proc. Eurospeech, pages 1039-1042, Budapest, 1999. [ bib | .ps.gz | .pdf | Abstract ]

Robert A. J. Clark and Kurt E. Dusterhoff. Objective methods for evaluating synthetic intonation. In Proc. Eurospeech 1999, volume 4, pages 1623-1626, 1999. [ bib | .ps | .pdf ]

K. Richmond. Estimating velum height from acoustics during continuous speech. In Proc. Eurospeech, volume 1, pages 149-152, Budapest, Hungary, 1999. [ bib | .ps | .pdf | Abstract ]

G. Williams and S. Renals. Confidence measures from local posterior probability estimates. Computer Speech and Language, 13:395-411, 1999. [ bib | .ps.gz | .pdf | Abstract ]

M. Wester and J.M. Kessens. Comparison between expert listeners and continuous speech recognizers in selecting pronunciation variants. In Proc. ICPhS '99, pages 723-726, San Francisco, 1999. [ bib | .pdf | Abstract ]

Kurt E. Dusterhoff. Automatic intonation analysis using acoustic data. In Proceedings, ESCA TRW on Dialogue and Prosody, Eindhoven, 1999. [ bib | .ps | .pdf ]

Kurt E. Dusterhoff, Alan W. Black, and Paul A. Taylor. Using decision trees within the Tilt intonation model to predict F0 contours. In Eurospeech 99, Budapest, 1999. [ bib | .ps | .pdf ]

Briony Williams. A Welsh speech database: preliminary results. In Eurospeech 99, Budapest, Hungary, 1999. [ bib | .ps | .pdf ]

Günther Görz, Jörg Spilker, Volker Strom, and Hans Weber. Architectural considerations for conversational systems - the Verbmobil/INTARC experience. In Proceedings of the First International Workshop on Human Computer Conversation, cs.CL/9907021, 1999. [ bib | .ps | .pdf | Abstract ]

C. Mayo. The development of phonemic awareness and perceptual weighting in relation to early and later literacy acquisition. In 20th Annual Child Phonology Conference, Bangor, Wales, 1999. [ bib ]

Y. Gotoh and S. Renals. Topic-based mixture language modelling. Journal of Natural Language Engineering, 5:355-375, 1999. [ bib | .ps.gz | .pdf | Abstract ]

Paul Taylor and Alan W Black. Speech synthesis by phonological structure matching. In Eurospeech99, Budapest, Hungary, 1999. [ bib | .ps | .pdf ]

S. Renals, D. Abberley, D. Kirby, and T. Robinson. The THISL system for indexing and retrieval of broadcast news. In Proc. IEEE Workshop on Multimedia Signal Processing, pages 77-82, Copenhagen, 1999. [ bib | http | .ps.gz | .pdf | Abstract ]

John McKenna and Stephen Isard. Tailoring Kalman filtering towards speaker characterisation. In Proc. Eurospeech '99, volume 6, pages 2793-2796, Budapest, 1999. [ bib | .ps | .pdf ]

D. Abberley, D. Kirby, S. Renals, and T. Robinson. The THISL broadcast news retrieval system. In Proc. ESCA Workshop on Accessing Information In Spoken Audio, pages 19-24, Cambridge, 1999. [ bib | http | .ps.gz | .pdf | Abstract ]

S. Renals, Y. Gotoh, R. Gaizauskas, and M. Stevenson. The SPRACH/LaSIE system for named entity identification in broadcast news. In Proc. DARPA Broadcast News Workshop, pages 47-50, 1999. [ bib | .html | .ps.gz | .pdf | Abstract ]

D. Abberley, S. Renals, G. Cook, and T. Robinson. Retrieval of broadcast news documents with the THISL system. In Proc. Seventh Text Retrieval Conference (TREC-7), pages 181-190, 1999. [ bib | .ps.gz | .pdf | Abstract ]

1998

Hiroshi Shimodaira, Jun Rokui, and Mitsuru Nakai. Improving The Generalization Performance Of The MCE/GPD Learning. In ICSLP'98, Australia, December 1998. [ bib | .pdf | Abstract ]

Simon King, Todd Stephenson, Stephen Isard, Paul Taylor, and Alex Strachan. Speech recognition via phonetically featured syllables. In Proc. ICSLP '98, pages 1031-1034, Sydney, Australia, December 1998. [ bib | .ps | .pdf | Abstract ]

Mitsuru Nakai and Hiroshi Shimodaira. The Use of F0 Reliability Function for Prosodic Command Analysis on F0 Contour Generation Model. In Proc. ICSLP'98, December 1998. [ bib | .pdf ]

Sue Fitt and Steve Isard. Representing the environments for phonological processes in an accent-independent lexicon for synthesis of English. In Proc. ICSLP 1998, volume 3, pages 847-850, Sydney, Australia, December 1998. [ bib | .ps | .pdf | Abstract ]

Kanad Keeni, Kenji Nakayama, and Hiroshi Shimodaira. Automatic Generation of Initial Weights and Target Outputs of Multi-layer Neural Networks and its Application to Pattern Classification. In International Conference on Neural Information Processing (ICONIP'98), pages 1622-1625, October 1998. [ bib ]

Jun Rokui and Hiroshi Shimodaira. Modified Minimum Classification Error Learning and Its Application to Neural Networks. In ICONIP'98, Kitakyushu, Japan, October 1998. [ bib ]

Eiji Iida, Hiroshi Shimodaira, Susumu Kunifuji, and Masayuki Kimura. A System to Perform Human Problem Solving. In The 5th International Conference on Soft Computing and Information / Intelligent Systems (IIZUKA'98), October 1998. [ bib ]

Kanad Keeni, Kenji Nakayama, and Hiroshi Shimodaira. Automatic Generation of Initial Weights and Estimation of Hidden Units for Pattern Classification Using Neural Networks. In 14th International Conference on Pattern Recognition (ICPR'98), pages 1568-1571, August 1998. [ bib ]

Eiji Iida, Susumu Kunifuji, Hiroshi Shimodaira, and Masayuki Kimura. A Scale-Down Solution of N^2-1 Puzzle. Trans. IEICE(D-I), J81-D-I(6):604-614, June 1998. (in Japanese). [ bib ]

Kanad Keeni, Hiroshi Shimodaira, Kenji Nakayama, and Kazunori Kotani. On Parameter Initialization of Multi-layer Feed-forward Neural Networks for Pattern Recognition. In International Conference on Computational Linguistics, Speech and Document Processing (ICCLSDP-'98), Calcutta, India, pages D8-12, February 1998. [ bib ]

Janet Hitzeman and Massimo Poesio. Long distance pronominalization and global focus. In COLING-ACL '98, volume 1, pages 550-556, Montreal, Quebec, Canada, 1998. [ bib | .ps | .pdf ]

D. Abberley, S. Renals, and G. Cook. Retrieval of broadcast news documents with the THISL system. In Proc IEEE ICASSP, pages 3781-3784, Seattle, 1998. [ bib | .ps.gz | .pdf | Abstract ]

Paul A. Taylor, S. King, S. D. Isard, and H. Wright. Intonation and dialogue context as constraints for speech recognition. Language and Speech, 41(3):493-512, 1998. [ bib | .ps | .pdf ]

S. Renals and D. Abberley. The THISL spoken document retrieval system. In Proc. 14th Twente Workshop on Language Technology, pages 129-140, 1998. [ bib | .ps.gz | .pdf | Abstract ]

M. Carreira-Perpiñán and S. Renals. Experimental evaluation of latent variable models for dimensionality reduction. In IEEE Proc. Neural Networks for Signal Processing, volume 8, pages 165-173, Cambridge, 1998. [ bib | .ps.gz | .pdf | Abstract ]

C. Mayo. The developmental relationship between perceptual weighting and phonemic awareness. In LabPhon 6, University of York, UK, 1998. [ bib ]

M. Wester, J.M. Kessens, C. Cucchiarini, and H. Strik. Selection of pronunciation variants in spontaneous speech: Comparing the performance of man and machine. In Proc. ESCA Workshop on the Sound Patterns of Spontaneous Speech: Production and Perception, pages 157-160, Aix-en-Provence, 1998. [ bib | .pdf ]

Tae-Yeoub Jang, Minsuck Song, and Kiyeong Lee. Disambiguation of Korean utterances using automatic intonation recognition. In Proceedings of ICSLP98, volume 3, pages 603-606, Sydney, Australia, 1998. [ bib | .ps | .pdf ]

Helen Wright. Automatic utterance type detection using suprasegmental features. In ICSLP'98, volume 4, page 1403, Sydney, Australia, 1998. [ bib | .ps | .pdf ]

Richard Sproat, Andrew Hunt, Mari Ostendorf, Paul Taylor, Alan Black, and Kevin Lenzo. Sable: a standard for TTS markup. In Third ESCA workshop on speech synthesis, pages 27-30, Jenolan Caves, Blue Mountains, Australia, 1998. [ bib | .ps | .pdf ]

Ann Syrdal, Gregor Moehler, Kurt Dusterhoff, Alistair Conkie, and Alan W Black. Three methods of intonation modeling. In 3rd ESCA Workshop on Speech Synthesis, pages 305-310, Jenolan Caves, 1998. [ bib | .ps | .pdf ]

Michael O'Donnell, Alistair Knott, Janet Hitzeman, and Hua Cheng. Integrating referring and informing in NP planning. In Coling-ACL Workshop on the Computational Treatment of Nominals, Montreal, Quebec, Canada, 1998. [ bib | .ps | .pdf ]

Paul Taylor and Alan Black. Assigning phrase breaks from part of speech sequences. Computer Speech and Language, 12:99-117, 1998. [ bib | .ps | .pdf ]

Paul A Taylor. The Tilt intonation model. In ICSLP98, Sydney, 1998. [ bib | .ps | .pdf ]

J. Barker, G. Williams, and S. Renals. Acoustic confidence measures for segmenting broadcast news. In Proc. ICSLP, pages 2719-2722, Sydney, 1998. [ bib | .ps.gz | .pdf | Abstract ]

Paul A Taylor, Alan Black, and Richard Caley. The architecture of the Festival speech synthesis system. In The Third ESCA Workshop in Speech Synthesis, pages 147-151, Jenolan Caves, Australia, 1998. [ bib | .ps | .pdf ]

K. Dusterhoff. An investigation into the effectiveness of sub-syllable acoustics in automatic intonation analysis. In Proceedings of University of Edinburgh Linguistics/Applied Linguistics Postgraduate Conference, 1998. [ bib | .ps | .pdf ]

D. Abberley, S. Renals, G. Cook, and T. Robinson. The 1997 THISL spoken document retrieval system. In Proc. Sixth Text Retrieval Conference (TREC-6), pages 747-752, 1998. [ bib | .ps.gz | .pdf | Abstract ]

M. Lincoln, S.J. Cox, and S. Ringland. A comparison of two unsupervised approaches to accent identification. In Int. Conf. on Spoken Language Processing, pages 109-112, Sydney, 1998. [ bib | .pdf | Abstract ]

Simon King. Using Information Above the Word Level for Automatic Speech Recognition. PhD thesis, University of Edinburgh, 1998. [ bib | .ps | .pdf | Abstract ]

G. Williams and S. Renals. Confidence measures derived from an acceptor HMM. In Proc. ICSLP, pages 831-834, Sydney, 1998. [ bib | .ps.gz | .pdf | Abstract ]

M. Wester, J.M. Kessens, and H. Strik. Modeling pronunciation variation for a Dutch CSR: testing three methods. In Proc. ICSLP '98, pages 2535-2538, Sydney, 1998. [ bib | .pdf | Abstract ]

M. Wester, J.M. Kessens, and H. Strik. Improving the performance of a Dutch CSR by modeling pronunciation variation. In Proc. Workshop Modeling Pronunciation Variation for Automatic Speech Recognition, pages 145-150, Kerkrade, 1998. [ bib | .pdf | Abstract ]

Sue Fitt. Processing unfamiliar words - a study in the perception and production of native and foreign placenames. PhD thesis, The Centre for Speech Technology Research, Edinburgh University, 1998. [ bib | .ps | .pdf | Abstract ]

M. Wester, J.M. Kessens, and H. Strik. Two automatic approaches for analyzing the frequency of connected speech processes in Dutch. In Proc. ICSLP Student Day '98, pages 3351-3356, Sydney, 1998. [ bib | .pdf | Abstract ]

Hiroshi Shimodaira, Jun Rokui, and Mitsuru Nakai. Modified Minimum Classification Error Learning and Its Application to Neural Networks. In 2nd International Workshop on Statistical Techniques in Pattern Recognition (SPR'98), Sydney, Australia, 1998. [ bib | .pdf | Abstract ]

Janet Hitzeman, Alan W. Black, Paul Taylor, Chris Mellish, and Jon Oberlander. On the use of automatically generated discourse-level information in a concept-to-speech synthesis system. In ICSLP98, volume 6, pages 2763-2768, Sydney, Australia, 1998. [ bib | .ps | .pdf ]

Briony Williams. Levels of annotation for a Welsh speech database for phonetic research. In Workshop on Language Resources for European Minority Languages, Granada, Spain, May 27, 1998. [ bib | .ps | .pdf ]

Laurence Molloy and Stephen Isard. Suprasegmental duration modelling with elastic constraints in automatic speech recognition. In ICSLP, volume 7, pages 2975-2978, Sydney, Australia, 1998. [ bib | .ps | .pdf ]

V. Strom. Automatische Erkennung von Satzmodus, Akzentuierung und Phrasengrenzen. PhD thesis, University of Bonn, 1998. [ bib | .ps | .pdf ]

Vincent Pagel, Kevin Lenzo, and Alan W Black. Letter to sound rules for accented lexicon compression. In ICSLP98, volume 5, pages 2015-2020, 1998. [ bib | .ps | .pdf ]

Yoshinori Shiga, Hiroshi Matsuura, and Tsuneo Nitta. Segmental duration control based on an articulatory model. In Proc. ICSLP, volume 5, pages 2035-2038, 1998. [ bib | .ps | .pdf | Abstract ]

M. Carreira-Perpiñán and S. Renals. Dimensionality reduction of electropalatographic data using latent variable models. Speech Communication, 26:259-282, 1998. [ bib | .ps.gz | .pdf | Abstract ]

Andreas Stolcke, E. Shriberg, R. Bates, P. Taylor, K. Ries, D. Jurafsky, N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema. Dialog act modelling for conversational speech. In AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 1998. [ bib | .ps | .pdf ]

G. Williams and S. Renals. Confidence measures for evaluating pronunciation models. In ESCA Workshop on Modeling pronunciation variation for automatic speech recognition, pages 151-155, Kerkrade, Netherlands, 1998. [ bib | .ps.gz | .pdf | Abstract ]

C. Mayo. A longitudinal study of perceptual weighting and phonemic awareness. In Chicago Linguistics Society 34, 1998. [ bib ]

M. Wester. Automatic classification of voice quality: Comparing regression models and hidden Markov models. In Proc. VOICEDATA98, Symposium on Databases in Voice Quality Research and Education, pages 92-97, Utrecht, 1998. [ bib | .pdf | Abstract ]

Alan W Black, Kevin Lenzo, and Vincent Pagel. Issues in building general letter to sound rules. In The Third ESCA Workshop in Speech Synthesis, pages 77-80, 1998. [ bib | .ps | .pdf ]

Elizabeth Shriberg, R. Bates, P. Taylor, A. Stolcke, K. Ries, D. Jurafsky, N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema. Can prosody aid the automatic classification of dialog acts in conversational speech? Language and Speech, 41(3-4), 1998. [ bib | .ps | .pdf ]

Briony Williams. The phonetic manifestation of stress in Welsh. 1998. [ bib | .ps | .pdf ]

J.M. Kessens, M. Wester, C. Cucchiarini, and H. Strik. The selection of pronunciation variants: Comparing the performance of man and machine. In Proc. ICSLP '98, pages 2715-2718, Sydney, 1998. [ bib | .pdf | Abstract ]

Richard Sproat, Andrew Hunt, Mari Ostendorf, Paul Taylor, Alan Black, and Kevin Lenzo. Sable: a standard for TTS markup. In ICSLP98, volume 5, pages 1719-1724, Sydney, Australia, 1998. [ bib | .ps | .pdf ]

1997

Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, and Hiroshi Shimodaira. Accent Phrase Segmentation Based on F0 Templates Using a Superpositional Prosodic Model. Trans. IEICE (D-II), J80-D-II(10):2605-2614, October 1997. (in Japanese). [ bib ]

Sue Fitt. The generation of regional pronunciations of English for speech synthesis. In Proc. Eurospeech 1997, Rhodes, Greece, September 1997. [ bib | .ps | .pdf | Abstract ]

Simon King, Thomas Portele, and Florian Höfer. Speech synthesis using non-uniform units in the Verbmobil project. In Proc. Eurospeech 97, volume 2, pages 569-572, Rhodes, Greece, September 1997. [ bib | .ps | .pdf | Abstract ]

Hiroshi Shimodaira, Mitsuru Nakai, and Akihiro Kumata. Restoration of Pitch Pattern of Speech Based on a Pitch Generation Model. In Proc. EuroSpeech'97, pages 512-524, September 1997. [ bib | .pdf | Abstract ]

Mitsuru Nakai and Hiroshi Shimodaira. On Representation of Fundamental Frequency of Speech for Prosody Analysis Using Reliability Function. In Proc. EuroSpeech'97, pages 243-246, September 1997. [ bib | .pdf ]

K. Richmond. A proposal for the compartmental modelling of stellate cells in the anteroventral cochlear nucleus, using realistic auditory nerve inputs. Master's thesis, Centre for Cognitive Science, University of Edinburgh, September 1997. [ bib ]

K. Richmond, A. Smith, and E. Amitay. Detecting subject boundaries within text: A language-independent statistical approach. In Proc. The Second Conference on Empirical Methods in Natural Language Processing, pages 47-54, Brown University, Providence, USA, August 1997. [ bib | .ps | .pdf | Abstract ]

Kanad Keeni, Hiroshi Shimodaira, and Kenji Nakayama. On Distributed Representation of Output Layer for Recognizing Japanese Kana Characters Using Neural Networks. In Proceedings of the 4th International Conference on Document Analysis and Recognition, ICDAR'97, pages 600-603, July 1997. Ulm, Germany. [ bib ]

Tu Bao Ho, Nguyen Trong Dung, Hiroshi Shimodaira, and Masayuki Kimura. An Interactive-Graphic Environment for Discovering and Using Conceptual Knowledge. In 7th European-Japanese Conference on Information Modelling and Knowledge Bases, pages 327-343, May 1997. [ bib ]

Kanad Keeni and Hiroshi Shimodaira. On Representation of Output Layer for Recognizing Japanese Kana Characters Using Neural Networks. In Proc. the 17th International Conference on Computer Processing of Oriental Languages, pages 305-308, April 1997. Baptist University, Kowloon Tong, Hong Kong. [ bib ]

Briony J. Williams and Stephen Isard. A keyvowel approach to the synthesis of regional accents of English. In Eurospeech 97, Rhodes, Greece, 1997. [ bib | .ps | .pdf ]

Robert A. J. Clark. Language acquisition and implication for language change: A computational model. In Proceedings of the GALA 97 Conference on Language Acquisition, pages 322-326, 1997. [ bib | .ps | .pdf ]

J.M. Kessens and M. Wester. Improving recognition performance by modelling pronunciation variation. In Proc. CLS opening Academic Year '97 '98, pages 1-20, Nijmegen, 1997. [ bib | .pdf | Abstract ]

Janet Hitzeman. Semantic partition and the ambiguity of temporal adverbials. Journal of Natural Language Semantics, 5:87-100, 1997. [ bib | .ps | .pdf ]

J. Hennebert, C. Ris, H. Bourlard, S. Renals, and N. Morgan. Estimation of global posteriors and forward-backward training of hybrid HMM/ANN systems. In Proc. Eurospeech, pages 1951-1954, Rhodes, 1997. [ bib | .ps.gz | .pdf | Abstract ]

M. Huckvale, C. Benoit, C. Bowerman, A. Eriksson, M. Rosner, M. Tatham, and Briony J. Williams. Opportunities for computer-aided instruction in phonetics and speech communication provided by the internet. In Eurospeech 97, Rhodes, Greece, 1997. [ bib | .ps | .pdf ]

Jean Carletta, Amy Isard, Stephen Isard, Jacqueline C. Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. The reliability of a dialogue structure coding scheme. Computational Linguistics, 23(1):13-31, 1997. [ bib | .ps | .pdf ]

M. Wester, J.M. Kessens, C. Cucchiarini, and H. Strik. Modelling pronunciation variation: some preliminary results. In Proc. Dept. of Language & Speech, pages 127-137, Nijmegen, 1997. [ bib | .pdf | Abstract ]

Janet Hitzeman, Chris Mellish, and Jon Oberlander. Generation of museum web pages: The intelligent labelling explorer. Archives and Museum Informatics, 11:107-115, 1997. [ bib | .ps | .pdf ]

Alan W. Black and Paul A. Taylor. Assigning phrase breaks from part-of-speech sequences. In Eurospeech97, volume 2, pages 995-998, Rhodes, Greece, 1997. [ bib | .ps | .pdf ]

G. Williams and S. Renals. Confidence measures for hybrid HMM/ANN speech recognition. In Proc. Eurospeech, pages 1955-1958, Rhodes, 1997. [ bib | .ps.gz | .pdf | Abstract ]

Simon King. Final report for Verbmobil Teilprojekt 4.4. Technical Report ISSN 1434-8845, IKP, Universitaet Bonn, January 1997. Verbmobil-Report 195 available at http://verbmobil.dfki.de. [ bib | Abstract ]

M. Lincoln, S.J. Cox, and S. Ringland. A fast method of speaker normalisation using formant estimation. In 5th European Conference on Speech Communication and Technology, pages 2095-2098, Rhodes, 1997. [ bib | .pdf | Abstract ]

R. Sproat, Paul A. Taylor, M. Tanenblatt, and Amy Isard. A markup language for text-to-speech synthesis. In Eurospeech 97, 1997. [ bib | .ps | .pdf ]

Y. Gotoh and S. Renals. Document space models using latent semantic analysis. In Proc. Eurospeech, pages 1443-1446, Rhodes, 1997. [ bib | .ps.gz | .pdf | Abstract ]

Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, and Hiroshi Shimodaira. Accent Phrase Segmentation by F0 Clustering Using Superpositional Modeling, pages 343-360. Springer, January 1997. [ bib ]

Alan W. Black and Paul A. Taylor. Automatically clustering similar units for unit selection in speech synthesis. In Eurospeech97, volume 2, pages 601-604, Rhodes, Greece, 1997. [ bib | .ps | .pdf ]

C. Mayo, M. Aylett, and D. R. Ladd. Prosodic transcription of Glasgow English: an evaluation study of GlaToBI. In Intonation: Theory, Models and Applications, 1997. [ bib | .pdf ]

B. L. Karlsen, G. J. Brown, M. Cooke, P. Green, and S. Renals. Analysis of a simultaneous speaker sound corpus. In D. F. Rosenthal and H. G. Okuno, editors, Computational Auditory Scene Analysis, pages 321-334. Lawrence Erlbaum Associates, 1997. [ bib ]

Sukeyasu Kanno and Hiroshi Shimodaira. Voiced Sound Detection under Nonstationary and Heavy Noisy Environment Using the Prediction Error of Low-Frequency Spectrum. Trans. IEICE(D-II), J80-D-II(1):26-35, January 1997. (in Japanese). [ bib ]

Alan W. Black and Paul A. Taylor. The Festival Speech Synthesis System: System documentation. Technical Report HCRC/TR-83, Human Communication Research Centre, University of Edinburgh, Scotland, UK, 1997. Available at http://www.cstr.ed.ac.uk/projects/festival.html. [ bib ]

B. Williams. Computer-Aided Learning and Use of the Internet: Speech Sciences Education (section of chapter). 1997. [ bib ]

Beth Ann Hockey, Deborah Rossen-Knill, Beverly Spejewski, Matthew Stone, and Stephen Isard. Can you predict responses to yes/no questions? yes, no, and stuff. In Eurospeech '97, pages 2267-2270, 1997. [ bib ]

Jacqueline Kowtko. The function of intonation in spontaneous and read dialogue. In Proceedings of the XIIIth International Congress of Phonetic Sciences, volume 2, pages 286-289, Stockholm, Sweden, 1997. [ bib ]

Helen Wright and Paul A. Taylor. Modelling intonational structure using hidden Markov models. In ESCA workshop on Intonation: Theory Models and Applications, Athens, Greece, 1997. [ bib | .ps | .pdf ]

Kurt Dusterhoff and Alan W. Black. Generating f0 contours for speech synthesis using the tilt intonation theory. In Proc. ESCA Workshop on Intonation, pages 107-110, Athens, Greece, 1997. [ bib | .ps | .pdf ]

V. Strom, A. Elsner, G. Görz, W. Hess, W. Kasper, A. Klein, H.U. Krieger, J. Spilker, and H. Weber. On the use of prosody in a speech-to-speech translator. In Proc. European Conf. on Speech Communication and Technology, Rhodes, 1997. [ bib | .ps | .pdf | Abstract ]

Dan Jurafsky, A. Stolcke, E. Shriberg, R. Bates, P. Taylor, K. Ries, N. Coccaro, R. Martin, M. Meteer, and C. Van Ess-Dykema. Automatic detection of discourse structure for speech recognition and understanding. In 1997 IEEE Workshop on Speech Recognition and Understanding, Santa Barbara, 1997. [ bib | .ps | .pdf ]

Alan W. Black. Predicting the intonation of discourse segments from examples in dialogue speech. In Y. Sagisaka, N. Campbell, and N. Higuchi, editors, Computing Prosody, pages 117-128. Springer-Verlag, 1997. [ bib ]

J.M. Kessens, M. Wester, C. Cucchiarini, and H. Strik. Testing a method for modelling pronunciation variation. In Proceedings of the COST workshop, pages 37-40, Rhodos, 1997. [ bib | .pdf | Abstract ]

Paul A. Taylor, Simon King, Stephen Isard, Helen Wright, and Jacqueline Kowtko. Using intonation to constrain language models in speech recognition. In Proc. Eurospeech'97, Rhodes, 1997. [ bib | .pdf | Abstract ]

Paul A. Taylor and Amy Isard. SSML: A speech synthesis markup language. Speech Communication, 21:123-133, 1997. [ bib | .ps | .pdf ]

B. Williams. Spoken Language Corpus Representation (section of chapter). Longmans, 1997. [ bib ]

1996

Simon King. Users Manual for Verbmobil Teilprojekt 4.4. IKP, Universitaet Bonn, October 1996. [ bib | Abstract ]

Simon King. Inventory design for Verbmobil Teilprojekt 4.4. Technical report, IKP, Universität Bonn, October 1996. [ bib | Abstract ]

Kanad Keeni, Hiroshi Shimodaira, Tetsuro Nishino, and Yasuo Tan. Recognition of Devanagari Characters Using Neural Networks. IEICE, E79-D(5):523-528, May 1996. [ bib ]

Andrew Hunt and Alan W. Black. Unit selection in a concatenative speech synthesis system using a large speech database. In ICASSP-96, volume 1, pages 373-376, Atlanta, Georgia, 1996. [ bib | .ps | .pdf ]

G. Knowles, L. Taylor, and B. Williams. A corpus of formal British English speech. 1996. [ bib ]

K. Dusterhoff. Intone: A prototype intonation analysis system. Master's thesis, Georgetown University, 1996. [ bib ]

S. Renals. Phone deactivation pruning in large vocabulary continuous speech recognition. IEEE Signal Processing Letters, 3:4-6, 1996. [ bib | .ps.gz | Abstract ]

Briony J. Williams. The status of corpora as linguistic data. In G. Knowles, A. Wichmann, and P. Alderson, editors, Working with Speech. London: Longmans, 1996. [ bib ]

B. Pickering, Briony J. Williams, and G. Knowles. Analysis of transcriber differences in the SEC. In Working with Speech. 1996. [ bib ]

Paul A. Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, and Jacqueline Kowtko. Using prosodic information to constrain language models for spoken dialogue. In Proc. ICSLP `96, Philadelphia, 1996. [ bib | .ps | .pdf | Abstract ]

N. Campbell and Alan W. Black. CHATR: a multi-lingual speech re-sequencing synthesis system. In Institute of Electronic, Information and Communication Engineers, Tokyo, 1996. [ bib ]

John McKenna. Tone and initial/final recognition for Mandarin Chinese. Master's thesis, University of Edinburgh, 1996. [ bib | .ps | .pdf ]

Robert A.J. Clark. Internal and external factors affecting language change: A computational model. Master's thesis, University of Edinburgh, 1996. [ bib | .ps | .pdf ]

Sue Fitt. Spelling unfamiliar names. In Proc. International Congress of Onomastic Sciences 1996, 1996. [ bib | .ps | .pdf | Abstract ]

D. Kershaw, T. Robinson, and S. Renals. The 1995 Abbot LVCSR system for multiple unknown microphones. In Proc. ICSLP, pages 1325-1328, Philadelphia PA, 1996. [ bib ]

Briony J. Williams and P. Alderson. Synthesising British English intonation. In Working with Speech. 1996. [ bib ]

Alan W. Black and Andrew Hunt. Generating f0 contours from ToBI labels using linear regression. In ICSLP96, volume 3, pages 1385-1388, Philadelphia, PA., 1996. [ bib ]

Briony J. Williams. The formulation of an intonation transcription system for British English. In G. Knowles, A. Wichmann, and P. Alderson, editors, Working with Speech. London: Longmans, 1996. [ bib ]

T. Robinson, M. Hochberg, and S. Renals. The use of recurrent networks in continuous speech recognition. In C.-H. Lee, K. K. Paliwal, and F. K. Soong, editors, Automatic Speech and Speaker Recognition - Advanced Topics, pages 233-258. Kluwer Academic Publishers, 1996. [ bib | .ps.gz | Abstract ]

S. Renals and M. Hochberg. Efficient evaluation of the LVCSR search space using the NOWAY decoder. In Proc IEEE ICASSP, pages 149-152, Atlanta, 1996. [ bib | .ps.gz | Abstract ]

D. Kershaw, T. Robinson, and S. Renals. The 1995 Abbot hybrid connectionist-HMM large vocabulary recognition system. In Proc. ARPA Spoken Language Technology Conference, pages 93-99, 1996. [ bib ]

Jacqueline Kowtko. The Function of Intonation in Task-Oriented Dialogue. PhD thesis, 1996. [ bib | .ps | .pdf ]

N. Campbell and Alan W. Black. Prosody and the selection of source units for concatenative synthesis. In J. van Santen, R. Sproat, J. Olive, and J. Hirschberg, editors, Progress in Speech Synthesis, pages 279-282. Springer Verlag, 1996. [ bib ]

K. Dusterhoff. Using computational analysis to determine pitch accent. In Proceedings Computational Linguistics in Montreal, pages 1-4, 1996. [ bib ]

A. Conkie and Stephen D. Isard. Optimal coupling of diphones. In J. P. H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg, editors, Progress in Speech Synthesis. Springer, 1996. [ bib ]

V. Strom and C. Widera. What's in the “pure” prosody? In Proc. ICSLP, Philadelphia, 1996. [ bib | .ps | .pdf | Abstract ]

1995

Sue Fitt. The pronunciation of unfamiliar native and non-native town names. In Proc. Eurospeech 1995, Madrid, Spain, September 1995. [ bib | .ps | .pdf | Abstract ]

Hisao Koba, Hiroshi Shimodaira, and Masayuki Kimura. Intelligent Automatic Document Transcription System for Braille: To Improve Accessibility to Printed Matter for the Visually Impaired. In HIC International'95, July 1995. [ bib ]

and Hiroshi Shimodaira. HI Design Based on the Costs of Human Information-processing Model. In HIC international'95, July 1995. [ bib ]

Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, and Hiroshi Shimodaira. Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling. In Proc. ICASSP-95, PR08.6, pages 624-627, May 1995. [ bib | .pdf ]

Mark E. Forsyth. Semi-continuous hidden Markov models for speaker verification. PhD thesis, University of Edinburgh, 1995. [ bib ]

M. Hochberg, G. Cook, S. Renals, T. Robinson, and R. Schechtman. The 1994 Abbot hybrid connectionist-HMM large vocabulary recognition system. In Proc. ARPA Spoken Language Technology Workshop, pages 170-175, 1995. [ bib | .ps.gz ]

Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. The coding of dialogue structure in a corpus. In J.A. Andernach, S.P. van de Burgt, and G.F. van der Hoeven, editors, Proceedings of the Ninth Twente Workshop on Language Technology: Corpus-based Approaches to Dialogue Modelling. Universiteit Twente, Enschede, 1995. [ bib ]

Stephen Isard, Simon King, Paul A. Taylor, and Jacqueline Kowtko. Prosodic information in a speech recognition system intended for dialogue. In IEEE Workshop in speech recognition, Snowbird, Utah, 1995. [ bib | Abstract ]

Alan W. Black. Comparison of algorithms for predicting accent placement in English speech synthesis. In Proceedings of the Acoustics Society of Japan, pages 275-276, 1995. [ bib | .ps | .pdf ]

Briony J. Williams. Text-to-speech synthesis for Welsh and Welsh English. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]

Paul A. Taylor. Using neural networks to locate pitch accents. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]

T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals. WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition. In Proc IEEE ICASSP, pages 81-84, Detroit, 1995. [ bib ]

Paul A. Taylor and Amy Isard. SSML: A speech synthesis markup language. In 2nd Speak! Workshop: Speech Generation in Multimodal Information Systems and Practical Applications, Darmstadt, 1995. [ bib ]

Alan A. Wrench, M. S. Jackson, D. S. Soutar, A.G. Robertson, and J. Mackenzie Beck. Evaluation of a system for segmental speech quality assessment: Voiceless fricatives. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]

Alan W. Black and N. Campbell. Predicting the intonation of discourse segments from examples in dialogue speech. In ESCA workshop on spoken dialogue systems, pages 197-200, Denmark, 1995. [ bib | .ps | .pdf ]

Alan W. Black and N. Campbell. Optimising selection of units from speech databases for concatenative synthesis. In Eurospeech95, volume 1, pages 581-584, Madrid, Spain, 1995. [ bib | .ps | .pdf ]

Briony Williams. The segmentation and labelling of speech databases. Technical report, 1995. [ bib ]

Janet Hitzeman, Marc Moens, and Claire Grover. Algorithms for analysing the temporal structure of discourse. In Proceedings of the Sixth International Conference of the European Chapter of the Association for Computational Linguistics, Dublin, Ireland, 1995. [ bib | .ps | .pdf ]

Amy C. Isard. SSML: a markup language for speech synthesis. Master's thesis, University of Edinburgh, 1995. [ bib ]

S. Renals and M. Hochberg. Efficient search using posterior phone probability estimates. In Proc IEEE ICASSP, pages 596-599, Detroit, 1995. [ bib | .ps.gz | Abstract ]

J. Neto, L. Almeida, M. Hochberg, C. Martins, L. Nunes, S. Renals, and T. Robinson. Speaker adaptation for hybrid HMM-ANN continuous speech recognition system. In Proc. Eurospeech, pages 2171-2174, Madrid, 1995. [ bib | .ps.gz | Abstract ]

Alan A. Wrench. Analysis of fricatives using multiple centres of gravity. In Proc. Eurospeech '95, Madrid, 1995. [ bib ]

E. Sanders. Using probabilistic methods to detect phrase boundaries for speech synthesis. Master's thesis, University of Edinburgh, 1995. [ bib ]

W. Hess, A. Batliner, A. Kießling, R. Kompe, E. Nöth, A. Petzold, M. Reyelt, and V. Strom. Prosodic modules for speech recognition and understanding in VERBMOBIL. In Yoshinori Sagisaka, Nick Campbell, and Norio Higuchi, editors, Computing Prosody, Part IV, Chapter 23, pages 363-383. Springer-Verlag, New York, 1995. [ bib | .ps | .pdf ]

Paul A. Taylor. The rise/fall/connection model of intonation. Speech Communication, 15:169-186, 1995. [ bib | .ps | .pdf ]

Alan W. Black. Predicting the intonation of discourse segments from examples in dialogue speech. In ATR workshop on computational modeling of prosody for spontaneous speech processing, ATR, Japan, 1995. [ bib | .ps | .pdf ]

Eric Sanders and Paul A. Taylor. Using statistical models to predict phrase boundaries for speech synthesis. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]

M. Hochberg, S. Renals, T. Robinson, and G. Cook. Recent improvements to the Abbot large vocabulary CSR system. In Proc IEEE ICASSP, pages 69-72, Detroit, 1995. [ bib | .ps.gz | Abstract ]

Mark E. Forsyth. Discriminating observation probability (DOP) HMM for speaker verification. Speech Communication, 17:117-129, 1995. [ bib ]

V. Strom. Detection of accents, phrase boundaries and sentence modality in German with prosodic features. In Proc. European Conf. on Speech Communication and Technology, volume 3, pages 2039-2041, Madrid, 1995. [ bib | .ps | .pdf | Abstract ]

1994

Mitsuru Nakai and Hiroshi Shimodaira. Accent Phrase Segmentation by Finding N-best Sequences of Pitch Pattern Templates. In Proc. ICSLP94, 8.10, pages 347-350, September 1994. [ bib | .pdf ]

Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Prosodic Phrase Segmentation Based on Pitch-Pattern Clustering. Electronics and Communications in Japan, Part 3, 77(6):80-91, June 1994. (in Japanese). [ bib ]

Hiroshi Shimodaira and Mitsuru Nakai. Prosodic phrase segmentation by pitch pattern clustering. In Proc. ICASSP-94, 76.5, vol.II, pages 185-188, March 1994. [ bib | .pdf | Abstract ]

Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Prosodic phrase segmentation based on pitch-pattern clustering. Trans. IEICE (A), J77-A(2):206-214, February 1994. (in Japanese). [ bib ]

M. Hochberg, S. Renals, and T. Robinson. Abbot: The CUED hybrid connectionist/HMM large vocabulary recognition system. In Proc. ARPA Spoken Language Technology Workshop, pages 102-105, 1994. [ bib ]

N. Morgan, H. Bourlard, S. Renals, M. Cohen, and H. Franco. Hybrid neural network/hidden Markov model systems for continuous speech recognition. In I. Guyon and P. S. P. Wang, editors, Advances in Pattern Recognition Systems using Neural Networks Technologies, volume 7 of Series in Machine Perception and Artificial Intelligence. World Scientific Publications, 1994. [ bib ]

M. Hochberg, S. Renals, T. Robinson, and D. Kershaw. Large vocabulary continuous speech recognition using a hybrid connectionist/HMM system. In Proc. ICSLP, pages 1499-1502, Yokohama, 1994. [ bib ]

Paul A. Taylor and Alan W. Black. Synthesizing conversational intonation from a linguistically rich input. In Second ESCA/IEEE Workshop on Speech Synthesis, New York, 1994. [ bib | .ps | .pdf ]

Alan W. Black and Paul A. Taylor. A framework for generating prosody from high level linguistics descriptions. In Spring meeting, Acoustical Society of Japan, 1994. [ bib ]

Briony J. Williams. Diphone synthesis for Welsh. In Proceedings of the Institute of Acoustics, volume 16, pages 359-365, 1994. [ bib ]

Mark Forsyth and M. A. Jack. Discriminating semi-continuous HMM for speaker verification. In Proc. IEEE International Conference on Acoustics, Speech, Signal Processing, 1994. [ bib | .ps | .pdf ]

Alan W. Black and Paul A. Taylor. Assigning intonation elements and prosodic phrasing for English speech synthesis from high level linguistic input. In ICSLP94, volume 2, pages 715-718, Yokohama, Japan, 1994. [ bib | .ps | .pdf ]

S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco. Connectionist probability estimators in HMM speech recognition. IEEE Trans. on Speech and Audio Processing, 2:161-175, 1994. [ bib | .ps.gz | Abstract ]

S. Renals, M. Hochberg, and T. Robinson. Learning temporal dependencies in connectionist speech recognition. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 1051-1058. Morgan Kaufmann, 1994. [ bib | .ps.gz | .pdf ]

Alan W. Black and Paul A. Taylor. CHATR: A generic speech synthesis system. In COLING '94, volume 2, pages 983-986, Kyoto, Japan, 1994. [ bib | .ps | .pdf ]

Janet Hitzeman. A Reichenbachian account of the interaction of the present perfect with temporal adverbials. In Proceedings of the conference on Semantics and Linguistic Theory, Cornell Working Papers, volume 10, pages 107-126, Cornell, NY, USA, 1994. [ bib | .ps | .pdf ]

Briony J. Williams and S. Hiller. The question of randomness in English foot timing: a control experiment. Journal of Phonetics, 22:423-439, 1994. [ bib | .ps | .pdf ]

S. Renals and M. Hochberg. Using Gamma filters to model temporal dependencies in speech. In Proc. ICSLP, pages 1491-1494, Yokohama, 1994. [ bib | .ps.gz ]

Yoshinori Shiga, Yoshiyuki Hara, and Tsuneo Nitta. A novel segment-concatenation algorithm for a cepstrum-based synthesizer. In Proc. ICSLP, volume 4, pages 1783-1786, 1994. [ bib ]

Briony J. Williams. Diphone synthesis for the Welsh language. In Proceedings of the 1994 International Conference on Spoken Language Processing, Yokohama, Japan, 1994. [ bib ]

Briony J. Williams. Welsh letter-to-sound rules: Rewrite rules and two-level rules compared. Computer Speech and Language, 8:261-277, 1994. [ bib | .ps | .pdf ]

P. C. Bagshaw. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. PhD thesis, University of Edinburgh, 1994. [ bib ]

T. Robinson, M. Hochberg, and S. Renals. IPA: Improved phone modelling with recurrent neural networks. In Proc IEEE ICASSP, pages 37-40, Adelaide, 1994. [ bib ]

Mark Forsyth, P. C. Bagshaw, and M. A. Jack. Incorporating discriminating observation probabilities (DOP) into semi-continuous HMM for speaker verification. In Proc. ESCA workshop on Automatic Speaker Recognition, Identification and Verification, pages 19-22, Martigny, Switzerland, 1994. [ bib | .ps | .pdf ]

M. Hochberg, G. Cook, S. Renals, and T. Robinson. Connectionist model combination for large vocabulary speech recognition. In IEEE Proc. Neural Networks for Signal Processing, volume 4, pages 269-278, 1994. [ bib | .ps.gz ]

H. Niemann, J. Denzler, B. Kahles, R. Kompe, A. Kießling, E. Nöth, and V. Strom. Pitch determination considering laryngealization effects in spoken dialogs. In Proc. Int. Conf. on Neuronal Networks, volume 7, pages 4457-4461, Orlando, 1994. [ bib | .ps | .pdf | Abstract ]

1993

Mark Schmidt, Sue Fitt, Christina Scott, and Mervin Jack. Phonetic transcription standards for European names (ONOMASTICA). In Proc. Eurospeech 1993, September 1993. [ bib | .ps | .pdf | Abstract ]

Hiroshi Shimodaira and Mitsuru Nakai. Accent phrase segmentation using transition probabilities between pitch pattern templates. In Proc. EuroSpeech'93, pages 1767-1770, September 1993. [ bib | .ps.gz | Abstract ]

Briony J. Williams. Letter-to-sound rules for the Welsh language. In Proc. Eurospeech '93, Berlin, 1993. [ bib ]

Alan A. Wrench, M.S. Jackson, M. A. Jack, D. S. Soutar, A. G. Robertson, J. Mackenzie Beck, and J. Laver. A speech therapy workstation providing visual feedback of segmental quality. In Proc. Eurospeech '93, Berlin, volume 1, pages 219-222, 1993. [ bib ]

Paul A. Taylor. Automatic recognition of intonation from F0 contours using the rise/fall/connection model. In Proc. Eurospeech '93, Berlin, 1993. [ bib | .ps | .pdf ]

N. Morgan, H. Bourlard, S. Renals, M. Cohen, and H. Franco. Hybrid neural network/hidden Markov model systems for continuous speech recognition. Intl. J. Pattern Recog. and Artific. Intell., 7:899-916, 1993. [ bib ]

Mark Forsyth and M. A. Jack. Duration modelling and multiple codebooks in semi-continuous HMMs for speaker verification. In Proc. European Conference on Speech Communication and Technology, pages 319-322, 1993. [ bib ]

Alan A. Wrench, M. S. Jackson, M. A. Jack, D. S. Soutar, A. G. Robertson, J. Mackenzie Beck, and J. Laver. Speech therapy workstation for the assessment of segmental quality: Voiceless fricative. In ESCA Workshop on Speech and Language Technology for Disabled Persons, Stockholm, 1993. [ bib ]

Mark E. Forsyth, A. M. Sutherland, J. A. Elliott, and M. A. Jack. HMM speaker verification with sparse training data on telephone quality speech. Speech Communication, 1993. [ bib ]

A. J. Robinson, L. Almeida, J.-M. Boite, H. Bourlard, F. Fallside, M. Hochberg, D. Kershaw, P. Kohn, Y. Konig, N. Morgan, J. P. Neto, S. Renals, M. Saerens, and C. Wooters. A neural network based, speaker independent, large vocabulary, continuous speech recognition system: the Wernicke project. In Proc. Eurospeech, pages 1941-1944, Berlin, 1993. [ bib ]

S. Renals and D. MacKay. Bayesian regularisation methods in a hybrid MLP-HMM system. In Proc. Eurospeech, pages 1719-1722, Berlin, 1993. [ bib | .ps.gz ]

P. C. Bagshaw, S. M. Hiller, and M. A. Jack. Enhanced pitch tracking and the processing of f0 contours for computer aided intonation teaching. In Proc. Eurospeech '93, Berlin, volume 2, pages 1003-1006, 1993. [ bib | .ps | .pdf ]

Paul A. Taylor. Synthesizing intonation using the RFC model. In Proc. ESCA Workshop on Prosody, Lund, Sweden, 1993. [ bib | .ps | .pdf ]

P. C. Bagshaw. An investigation of acoustic events related to sentential stress and pitch accents in English. Speech Communication, 13(3-4):333-342, 1993. [ bib ]

1992

Hiroshi Shimodaira and Mitsuru Nakai. Robust pitch detection by narrow band spectrum analysis. In Proc. ICSLP-92, pages 1597-1600, October 1992. [ bib | .pdf | Abstract ]

P. C. Bagshaw. Criteria for labelling prosodic aspects of English speech. In Proc. 4th. Australian International Conference on Speech Science and Technology, Brisbane, Australia, 1992. [ bib | .ps | .pdf ]

James L. Hieronymus and Briony J. Williams. A comparison of the prosody in read speech and directed monologue in British English. In Proceedings of the ESCA Workshop on the Phonetics and Phonology of Speaking Styles, Barcelona, Spain, 1992. [ bib ]

H. Bourlard, N. Morgan, and S. Renals. Neural nets and hidden Markov models: Review and generalizations. Speech Communication, 11:237-246, 1992. [ bib ]

S. Renals, N. Morgan, M. Cohen, and H. Franco. Connectionist probability estimation in the Decipher speech recognition system. In Proc. IEEE ICASSP, pages 601-604, San Francisco, 1992. [ bib | .ps.gz ]

Alan A. Wrench, L. Laver, M. A. Jack, A. G. Robertson, D. S. Soutar, and J. Mackenzie Beck. Objective speech quality assessment in patients with intra-oral cancers: Voiceless fricative. In International Conference on Spoken Language Processing, volume 2, pages 1071-1074, Banff, Canada, 1992. [ bib ]

S. Renals, H. Bourlard, N. Morgan, H. Franco, and M. Cohen. Connectionist optimisation of tied mixture hidden Markov models. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems, volume 4, pages 167-174. Morgan-Kaufmann, 1992. [ bib ]

Briony J. Williams. The design of a speech database for Welsh diphone extraction. In Proc. Institute of Acoustics, volume 14, 1992. [ bib ]

Paul A. Taylor. A phonetic model of English intonation. PhD thesis, University of Edinburgh, 1992. [ bib | .ps | .pdf ]

Paul A. Taylor and S. D. Isard. A new model of intonation for use with speech recognition and synthesis. In International Conference on Spoken Language Processing, Banff, Canada, 1992. [ bib | .ps | .pdf ]

Alan W. Black. A Situation Theoretic Approach to Computational Semantics. PhD thesis, University of Edinburgh, 1992. [ bib | .ps | .pdf ]

S. Renals, N. Morgan, M. Cohen, H. Franco, and H. Bourlard. Improving statistical speech recognition. In Proc. IJCNN, volume 2, pages 301-307, Baltimore MD, 1992. [ bib | .ps.gz ]

P. C. Bagshaw and Briony J. Williams. Criteria for labelling prosodic aspects of English speech. In Proc. International Conference on Spoken Language Processing, volume 2, pages 859-862, Banff, Canada, 1992. [ bib | .ps | .pdf ]

H. Bourlard, N. Morgan, C. Wooters, and S. Renals. CDNN: A context-dependent neural network for continuous speech recognition. In Proc. IEEE ICASSP, pages 349-352, San Francisco, 1992. [ bib ]

Briony J. Williams. Welsh letter-to-sound rules for text-to-speech synthesis. In Proc. Institute of Acoustics, volume 14, 1992. [ bib ]

Mark E. Forsyth, A. M. Sutherland, J. A. Elliott, and M. A. Jack. HMM speaker verification with sparse training data on telephone quality speech. In Proceedings of the Fourth Australian International Conference on Speech Science and Technology, pages 67-72, Brisbane, Australia, 1992. [ bib ]

1991

Paul A. Taylor, I. A. Nairn, A. M. Sutherland, and M. A. Jack. A real time speech synthesis system. In IEEE symposium, 1991. [ bib ]

P. C. Bagshaw. Analysis of samples of wideband signals taken at irregular, sub-Nyquist intervals. Electronics Letters, 27(14):1228-1230, 1991. [ bib ]

S. Renals, D. McKelvie, and F. McInnes. A comparative study of continuous speech recognition using neural networks and hidden Markov models. In Proc. IEEE ICASSP, pages 369-372, Toronto, 1991. [ bib ]

Paul A. Taylor, I. A. Nairn, A. M. Sutherland, and M. A. Jack. An interactive synthetic speech generation system. In IEEE Colloquium on Systems and Applications of Man-Machine Interaction Using Speech I/O, London, 1991. [ bib ]

W. N. Campbell and Stephen D. Isard. Segmental durations in a syllable frame. Journal of Phonetics, 19:37-47, 1991. [ bib ]

James L. Hieronymus and Briony J. Williams. An investigation of the relation between perceived pitch accent and automatically-located accent in British English. In Proceedings of the Second European Conference on Speech Communication and Technology, Genova, Italy, 1991. [ bib ]

S. Renals, N. Morgan, and H. Bourlard. Probability estimation by feed-forward networks in continuous speech recognition. In IEEE Proc. Neural Networks for Signal Processing, pages 309-318, Princeton NJ, 1991. [ bib | .ps.gz ]

Paul A. Taylor, I. A. Nairn, A. M. Sutherland, and M. A. Jack. A real time speech synthesis system. In Proc. Eurospeech '91, Genova Italy, 1991. [ bib ]

Briony J. Williams and Franziska Maier. A spelling corrector for use in text-to-speech synthesis for English. In Proc. Eurospeech '91, Genova Italy, 1991. [ bib ]

Paul A. Taylor and Stephen D. Isard. Automatic diphone segmentation. In Proc. Eurospeech '91, Genova, Italy, 1991. [ bib ]

Alan W. Black, J. van de Plassche, and Briony J. Williams. Analysis of unknown words through morphological decomposition. In Proceedings of 5th Conference of the European Chapter of the Association for Computational Linguistics, pages 101-106, Berlin, Germany, 1991. [ bib | .ps | .pdf ]

1990

S. Renals. Chaos in neural networks. In L. B. Almeida and C. J. Wellekens, editors, Neural Networks, number 412 in Lecture Notes in Computer Science, pages 90-99. Springer-Verlag, 1990. [ bib ]

Briony J. Williams and D. McKelvie. A statistical analysis of segmental durations using automatically-segmented data. In Proceedings of the Institute of Acoustics, volume 12, pages 95-102, 1990. [ bib ]

S. Renals and R. Rohwer. A study of network dynamics. J. Stat. Phys., 58:825-847, 1990. [ bib ]

Paul A. Taylor and Stephen D. Isard. Automatic diphone segmentation using hidden Markov models. In SST-90, Third International Australian Conference in Speech Science and Technology, Melbourne, Australia, 1990. [ bib ]

W. N. Campbell, Stephen D. Isard, A. I. C. Monaghan, and J. Verhoven. Duration, pitch and diphones in the CSTR TTS system. In ICSLP '90, 1990. [ bib ]

1989

Briony Williams. Speech technology: A snapshot for the non-specialist. Newsletter of the Voice Research Society, 3(2), 1989. [ bib ]

S. Renals and R. Rohwer. Phoneme classification experiments using radial basis functions. In Proc. IJCNN, pages 461-468, Washington DC, 1989. [ bib ]

Alan W. Black. Finite state machines from feature grammars. In Proceedings of the International Workshop on Parsing Technologies, pages 277-285, Pittsburgh, 1989. [ bib | .ps | .pdf ]

Briony J. Williams and H. Thompson. Modelling phonological processes in continuous speech recognition. In Proceedings of the European Conference on Speech Communication and Technology, Paris, France, 1989. [ bib ]

Briony J. Williams, S. M. Hiller, F. McInnes, and J. Dalby. A knowledge-based nasal classifier for use in continuous speech recognition. In Proceedings of the European Conference on Speech Communication and Technology, Paris, France, 1989. [ bib ]

S. Renals and R. Rohwer. Neural networks for speech pattern classification. In IEE Conference Publication 313, 1st IEE Conference on Artificial Neural Networks, pages 292-296, London, 1989. [ bib ]

S. Renals and R. Rohwer. Learning phoneme recognition using neural networks. In Proc. IEEE ICASSP, pages 413-416, Glasgow, 1989. [ bib ]

Briony Williams. Stress in modern Welsh. PhD thesis, 1989. [ bib ]

S. Renals and J. Dalby. Analysis of a neural network model for speech recognition. In Proc. Eurospeech, volume 1, pages 333-336, Paris, 1989. [ bib ]

1988

Briony Williams. Review of W.A. Ainsworth, 'Speech Recognition by Machine', 1988. [ bib ]

Briony Williams. Review of A. Cruttenden, 'Intonation', 1988. [ bib ]

R. Rohwer and S. Renals. Training recurrent networks. In L. Personnaz and G. Dreyfus, editors, Neural networks from models to applications (Proc. nEuro '88), pages 207-216, Paris, 1988. I.D.S.E.T. [ bib ]

M. Terry, S. Renals, R. Rohwer, and J. Harrington. A connectionist approach to speech recognition using peripheral auditory modelling. In Proc. IEEE ICASSP, pages 699-702, New York, 1988. [ bib ]

S. Renals, R. Rohwer, and M. Terry. A comparison of speech recognition front ends using a connectionist classifier. In Proc. FASE Speech '88, pages 1381-1388, Edinburgh, 1988. [ bib ]

Briony Williams, Steve Hiller, and Jonathan Dalby. Experimental results on Gaussian classification of nasals. Technical report, 1988. [ bib ]

R. Rohwer, S. Renals, and M. Terry. Unstable connectionist networks in speech recognition. In Proc. IEEE ICASSP, pages 426-428, New York, 1988. [ bib ]

S. Renals. Radial basis functions network for speech pattern classification. Electronics Letters, 25:437-439, 1988. [ bib ]

Stephen D. Isard and Mark Pearson. A repertoire of British English contours for speech synthesis. In SPEECH '88, 7th FASE Symposium, London, 1988. [ bib ]

1987

Briony Williams. English word stress in a text-to-speech synthesis system. Technical report, 1987. [ bib ]

Briony J. Williams. Word stress assignment in a text-to-speech synthesis system for British English. Computer Speech and Language, 2:235-272, 1987. [ bib ]

Briony J. Williams and P. R. Alderson. Applying the tonetic stress mark system to the synthesis of British English. In Proceedings of the XIth International Congress of Phonetic Sciences, Tallinn, Estonia, 1987. [ bib ]

1986

Briony J. Williams. An acoustic study of some features of Welsh prosody. In C. Johns-Lewis, editor, Intonation in Discourse. London: Croom Helm, 1986. [ bib ]

Briony Williams and Peter Alderson. Synthesising British English intonation using a nuclear tone model. Technical report, 1986. [ bib ]

S. G. C. Lawrence, Briony J. Williams, and G. Kaye. The automatic phonetic transcription of English. In P. A. Luelsdorff, editor, Orthography and Phonology. London: Croom Helm, 1986. [ bib ]

Briony J. Williams. Synthesising British English intonation using tonetic stress marks. In Proceedings of the Institute of Acoustics, volume 8, 1986. [ bib ]

Stephen D. Isard and D. A. Miller. Diphone synthesis techniques. In IEEE Conference Publication no 258, pages 77-82, 1986. [ bib ]

1985

Briony J. Williams. Pitch and duration in Welsh stress perception: the implications for intonation. Journal of Phonetics, 13(4):381-406, 1985. [ bib ]

1984

Briony J. Williams. Lexical stress assignment in English: a metrical approach. In Proceedings of the Institute of Acoustics, volume 6, pages 261-265, 1984. [ bib ]

1983

Briony J. Williams. An approach to the Welsh vowel system. Bulletin of the Board of Celtic Studies, 30:239-252, 1983. [ bib ]