The Centre for Speech Technology Research, The University of Edinburgh

Publications by Steve Isard

[1] Peter Bell, Myroslava Dzikovska, and Amy Isard. Designing a spoken language interface for a tutorial dialogue system. In Proc. Interspeech, Portland, Oregon, USA, September 2012. [ bib | .pdf ]
We describe our work in building a spoken language interface for a tutorial dialogue system. Our goal is to allow natural, unrestricted student interaction with the computer tutor, which has been shown to improve the student's learning gain, but presents challenges for speech recognition and spoken language understanding. We discuss the choice of system components and present the results of development experiments in both acoustic and language modelling for speech recognition in this domain.

[2] Myroslava Dzikovska, Amy Isard, Peter Bell, Johanna Moore, Natalie Steinhauser, and Gwendolyn Campbell. Beetle II: an adaptable tutorial dialogue system. In Proceedings of the SIGDIAL 2011 Conference, demo session, pages 338-340, Portland, Oregon, June 2011. Association for Computational Linguistics. [ bib | http ]
We present Beetle II, a tutorial dialogue system which accepts unrestricted language input and supports experimentation with different tutorial planning and dialogue strategies. Our first system evaluation compared two tutorial policies and demonstrated that the system can be used to study the impact of different approaches to tutoring. The system is also designed to allow experimentation with a variety of natural language techniques, and discourse and dialogue strategies.

[3] Myroslava Dzikovska, Amy Isard, Peter Bell, Johanna D. Moore, Natalie B. Steinhauser, Gwendolyn E. Campbell, Leanne S. Taylor, Simon Caine, and Charlie Scott. Adaptive intelligent tutorial dialogue in the Beetle II system. In Artificial Intelligence in Education - 15th International Conference (AIED 2011), interactive event, volume 6738 of Lecture Notes in Computer Science, page 621, Auckland, New Zealand, 2011. Springer. [ bib | DOI ]
[4] Helen Wright-Hastie, Massimo Poesio, and Stephen Isard. Automatically predicting dialogue structure using prosodic features. Speech Communication, 36(1-2):63-79, 2002. [ bib ]
[5] Sue Fitt and Steve Isard. Synthesis of regional English using a keyword lexicon. In Proc. Eurospeech 1999, volume 2, pages 823-826, Budapest, September 1999. [ bib | .ps | .pdf ]
We discuss the use of an accent-independent keyword lexicon to synthesise speakers with different regional accents. The paper describes the system architecture and the transcription system used in the lexicon, and then focuses on the construction of word-lists for recording speakers. We illustrate by mentioning some of the features of Scottish and Irish English, which we are currently synthesising, and describe how these are captured by keyword synthesis.

[6] H. Wright, Massimo Poesio, and Stephen Isard. Using high level dialogue information for dialogue act recognition using prosodic features. In Proceedings of an ESCA Tutorial and Research Workshop on Dialogue and Prosody, pages 139-143, Eindhoven, The Netherlands, 1999. [ bib | .ps | .pdf ]
[7] John McKenna and Stephen Isard. Tailoring Kalman filtering towards speaker characterisation. In Proc. Eurospeech '99, volume 6, pages 2793-2796, Budapest, 1999. [ bib | .ps | .pdf ]
[8] Simon King, Todd Stephenson, Stephen Isard, Paul Taylor, and Alex Strachan. Speech recognition via phonetically featured syllables. In Proc. ICSLP '98, pages 1031-1034, Sydney, Australia, December 1998. [ bib | .ps | .pdf ]
We describe a speech recogniser which uses a speech production-motivated phonetic-feature description of speech. We argue that this is a natural way to describe the speech signal and offers an efficient intermediate parameterisation for use in speech recognition. We also propose to model this description at the syllable rather than phone level. The ultimate goal of this work is to generate syllable models whose parameters explicitly describe the trajectories of the phonetic features of the syllable. We hope to move away from Hidden Markov Models (HMMs) of context-dependent phone units. As a step towards this, we present a preliminary system which consists of two parts: recognition of the phonetic features from the speech signal using a neural network; and decoding of the feature-based description into phonemes using HMMs.

[9] Sue Fitt and Steve Isard. Representing the environments for phonological processes in an accent-independent lexicon for synthesis of English. In Proc. ICSLP 1998, volume 3, pages 847-850, Sydney, Australia, December 1998. [ bib | .ps | .pdf ]
This paper reports on work developing an accent-independent lexicon for use in synthesising speech in English. Lexica which use phonemic transcriptions are only suitable for one accent, and developing a lexicon for a new accent is a long and laborious process. Potential solutions to this problem include the use of conversion rules to generate lexica of regional pronunciations from standard accents and encoding of regional variation by means of keywords. The latter proposal forms the basis of the current work. However, even if we use a keyword system for lexical transcription there are a number of remaining theoretical and methodological problems if we are to synthesise and recognise accents to a high degree of accuracy; these problems are discussed in the following paper.

[10] Paul A. Taylor, S. King, S. D. Isard, and H. Wright. Intonation and dialogue context as constraints for speech recognition. Language and Speech, 41(3):493-512, 1998. [ bib | .ps | .pdf ]
[11] Laurence Molloy and Stephen Isard. Suprasegmental duration modelling with elastic constraints in automatic speech recognition. In ICSLP, volume 7, pages 2975-2978, Sydney, Australia, 1998. [ bib | .ps | .pdf ]
[12] Briony J. Williams and Stephen Isard. A keyvowel approach to the synthesis of regional accents of English. In Eurospeech 97, Rhodes, Greece, 1997. [ bib | .ps | .pdf ]
[13] Jean Carletta, Amy Isard, Stephen Isard, Jacqueline C. Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. The reliability of a dialogue structure coding scheme. Computational Linguistics, 23(1):13-31, 1997. [ bib | .ps | .pdf ]
[14] Beth Ann Hockey, Deborah Rossen-Knill, Beverly Spejewski, Matthew Stone, and Stephen Isard. Can you predict responses to yes/no questions? yes, no, and stuff. In Eurospeech '97, pages 2267-2270, 1997. [ bib ]
[15] Paul A. Taylor, Simon King, Stephen Isard, Helen Wright, and Jacqueline Kowtko. Using intonation to constrain language models in speech recognition. In Proc. Eurospeech'97, Rhodes, 1997. [ bib | .pdf ]
This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of conversational games. Different move types under this analysis conform to different language models. Different move types are also characterised by different intonational tunes. Our overall recognition strategy is first to predict from intonation the type of game move that a test utterance represents, and then to use a bigram language model for that type of move during recognition.

[16] Paul A. Taylor, Hiroshi Shimodaira, Stephen Isard, Simon King, and Jacqueline Kowtko. Using prosodic information to constrain language models for spoken dialogue. In Proc. ICSLP '96, Philadelphia, 1996. [ bib | .ps | .pdf ]
We present work intended to improve speech recognition performance for computer dialogue by taking into account the way that dialogue context and intonational tune interact to limit the possibilities for what an utterance might be. We report here on the extra constraint achieved in a bigram language model expressed in terms of entropy by using separate submodels for different sorts of dialogue acts and trying to predict which submodel to apply by analysis of the intonation of the sentence being recognised.

[17] A. Conkie and Stephen D. Isard. Optimal coupling of diphones. In J. P. H. van Santen, R. W. Sproat, J. P. Olive, and J. Hirschberg, editors, Progress in Speech Synthesis. Springer, 1996. [ bib ]
[18] Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. The coding of dialogue structure in a corpus. In J.A. Andernach, S.P. van de Burgt, and G.F. van der Hoeven, editors, Proceedings of the Ninth Twente Workshop on Language Technology: Corpus-based Approaches to Dialogue Modelling. Universiteit Twente, Enschede, 1995. [ bib ]
[19] Stephen Isard, Simon King, Paul A. Taylor, and Jacqueline Kowtko. Prosodic information in a speech recognition system intended for dialogue. In IEEE Workshop in speech recognition, Snowbird, Utah, 1995. [ bib ]
We report on an automatic speech recognition system intended for use in dialogue, whose original aspect is its use of prosodic information for two different purposes. The first is to improve the word level accuracy of the system. The second is to constrain the language model applied to a given utterance by taking into account the way that dialogue context and intonational tune interact to limit the possibilities for what an utterance might be.

[20] Paul A. Taylor and S. D. Isard. A new model of intonation for use with speech recognition and synthesis. In International Conference on Spoken Language Processing, Banff, Canada, 1992. [ bib | .ps | .pdf ]
[21] W. N. Campbell and Stephen D. Isard. Segmental durations in a syllable frame. Journal of Phonetics, 19:37-47, 1991. [ bib ]
[22] Paul A. Taylor and Stephen D. Isard. Automatic diphone segmentation. In Proc. Eurospeech '91, Genova, Italy, 1991. [ bib ]
[23] Paul A. Taylor and Stephen D. Isard. Automatic diphone segmentation using hidden Markov models. In SST-90, Third International Australian Conference in Speech Science and Technology, Melbourne, Australia, 1990. [ bib ]
[24] W. N. Campbell, Stephen D. Isard, A. I. C. Monaghan, and J. Verhoven. Duration, pitch and diphones in the CSTR TTS system. In ICSLP '90, 1990. [ bib ]
[25] Stephen D. Isard and Mark Pearson. A repertoire of British English contours for speech synthesis. In SPEECH '88, 7th FASE Symposium, London, 1988. [ bib ]
[26] Stephen D. Isard and D. A. Miller. Diphone synthesis techniques. In IEEE Conference Publication no 258, pages 77-82, 1986. [ bib ]