Sue Fitt. The pronunciation of unfamiliar native and non-native town names. In Proc. Eurospeech 1995, Madrid, Spain, September 1995. [ bib | .ps | .pdf ]
This paper will discuss pronunciations of unfamiliar names, both British and foreign, by native speakers of English. Most studies of people's pronunciations of unfamiliar words or pseudowords are based on English word-patterns, rather than a cross-language selection, while algorithms for determining the pronunciation of names from a variety of languages do not necessarily tell us how real people behave in such a situation. This paper shows that subjects may use different systems or sub-systems of rules to pronounce unknown names which they perceive to be non-native. If we wish to model human behaviour in novel word pronunciation, we need to take into account the fact that, while native speakers are not experts in all foreign languages, neither are they linguistically naive.
Hisao Koba, Hiroshi Shimodaira, and Masayuki Kimura. Intelligent Automatic Document Transcription System for Braille: To Improve Accessibility to Printed Matter for the Visually Impaired. In HCI International '95, July 1995. [ bib ]
and Hiroshi Shimodaira. HI Design Based on the Costs of Human Information-processing Model. In HCI International '95, July 1995. [ bib ]
Mitsuru Nakai, Harald Singer, Yoshinori Sagisaka, and Hiroshi Shimodaira. Automatic Prosodic Segmentation by F0 Clustering Using Superpositional Modeling. In Proc. ICASSP-95, PR08.6, pages 624-627, May 1995. [ bib | .pdf ]
Mark E. Forsyth. Semi-continuous hidden Markov models for speaker verification. PhD thesis, University of Edinburgh, 1995. [ bib ]
M. Hochberg, G. Cook, S. Renals, T. Robinson, and R. Schechtman. The 1994 Abbot hybrid connectionist-HMM large vocabulary recognition system. In Proc. ARPA Spoken Language Technology Workshop, pages 170-175, 1995. [ bib | .ps.gz ]
Jean Carletta, Amy Isard, Stephen Isard, Jacqueline Kowtko, Gwyneth Doherty-Sneddon, and Anne H. Anderson. The coding of dialogue structure in a corpus. In J.A. Andernach, S.P. van de Burgt, and G.F. van der Hoeven, editors, Proceedings of the Ninth Twente Workshop on Language Technology: Corpus-based Approaches to Dialogue Modelling. Universiteit Twente, Enschede, 1995. [ bib ]
Stephen Isard, Simon King, Paul A. Taylor, and Jacqueline Kowtko. Prosodic information in a speech recognition system intended for dialogue. In IEEE Workshop on Speech Recognition, Snowbird, Utah, 1995. [ bib ]
We report on an automatic speech recognition system intended for use in dialogue, whose original aspect is its use of prosodic information for two different purposes. The first is to improve the word level accuracy of the system. The second is to constrain the language model applied to a given utterance by taking into account the way that dialogue context and intonational tune interact to limit the possibilities for what an utterance might be.
Alan W. Black. Comparison of algorithms for predicting accent placement in English speech synthesis. In Proceedings of the Acoustical Society of Japan, pages 275-276, 1995. [ bib | .ps | .pdf ]
Briony J. Williams. Text-to-speech synthesis for Welsh and Welsh English. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]
Paul A. Taylor. Using neural networks to locate pitch accents. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]
T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals. WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition. In Proc IEEE ICASSP, pages 81-84, Detroit, 1995. [ bib ]
Paul A. Taylor and Amy Isard. SSML: A speech synthesis markup language. In 2nd Speak! Workshop: Speech Generation in Multimodal Information Systems and Practical Applications, Darmstadt, 1995. [ bib ]
Alan A. Wrench, M. S. Jackson, D. S. Soutar, A.G. Robertson, and J. Mackenzie Beck. Evaluation of a system for segmental speech quality assessment: Voiceless fricatives. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]
Alan W. Black and N. Campbell. Predicting the intonation of discourse segments from examples in dialogue speech. In ESCA workshop on spoken dialogue systems, pages 197-200, Denmark, 1995. [ bib | .ps | .pdf ]
Alan W. Black and N. Campbell. Optimising selection of units from speech databases for concatenative synthesis. In Proc. Eurospeech '95, volume 1, pages 581-584, Madrid, Spain, 1995. [ bib | .ps | .pdf ]
Briony Williams. The segmentation and labelling of speech databases. Technical report, 1995. [ bib ]
Janet Hitzeman, Marc Moens, and Claire Grover. Algorithms for analysing the temporal structure of discourse. In Proceedings of the Seventh Conference of the European Chapter of the Association for Computational Linguistics, Dublin, Ireland, 1995. [ bib | .ps | .pdf ]
Amy C. Isard. SSML: a markup language for speech synthesis. Master's thesis, University of Edinburgh, 1995. [ bib ]
S. Renals and M. Hochberg. Efficient search using posterior phone probability estimates. In Proc IEEE ICASSP, pages 596-599, Detroit, 1995. [ bib | .ps.gz ]
In this paper we present a novel, efficient search strategy for large vocabulary continuous speech recognition (LVCSR). The search algorithm, based on stack decoding, uses posterior phone probability estimates to substantially increase its efficiency with minimal effect on accuracy. In particular, the search space is dramatically reduced by phone deactivation pruning, in which phones with a small local posterior probability are deactivated. This approach is particularly well suited to hybrid connectionist/hidden Markov model systems because posterior phone probabilities are computed directly by the acoustic model. On large vocabulary tasks, using a trigram language model, this increased the search speed by an order of magnitude, with 2% or less relative search error. Results from a hybrid system are presented using the Wall Street Journal LVCSR database for a 20,000 word task with a backed-off trigram language model. For this task, our single-pass decoder ran at around 15 times real time on an HP735 workstation. At the cost of 7% relative search error, decoding can be sped up to approximately real time.
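The phone deactivation pruning idea in the abstract above can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' decoder: the threshold value, array shapes, and function name are all assumptions, and the real system applies the resulting mask inside a stack decoder rather than in isolation.

```python
import numpy as np

def phone_deactivation_pruning(posteriors, threshold=0.05):
    """Return a boolean mask of active phones per frame.

    posteriors: (num_frames, num_phones) array of posterior phone
    probability estimates, e.g. the softmax outputs of a connectionist
    acoustic model. Phones whose local posterior falls below the
    threshold are deactivated, so the decoder never expands them.
    """
    return posteriors >= threshold

# Toy example: 3 frames, 4 phones.
post = np.array([
    [0.900, 0.05, 0.04, 0.010],
    [0.020, 0.85, 0.10, 0.030],
    [0.001, 0.30, 0.60, 0.099],
])
active = phone_deactivation_pruning(post)
# e.g. frame 0 deactivates phones 2 and 3, shrinking the search space.
```

Because deactivation is a per-frame threshold test on quantities the hybrid acoustic model already outputs, the pruning itself adds essentially no cost, which is why the speed-up comes almost for free.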
J. Neto, L. Almeida, M. Hochberg, C. Martins, L. Nunes, S. Renals, and T. Robinson. Speaker adaptation for hybrid HMM-ANN continuous speech recognition system. In Proc. Eurospeech, pages 2171-2174, Madrid, 1995. [ bib | .ps.gz ]
It is well known that recognition performance degrades significantly when moving from a speaker-dependent to a speaker-independent system. Traditional hidden Markov model (HMM) systems have successfully applied speaker-adaptation approaches to reduce this degradation. In this paper we present and evaluate some techniques for speaker-adaptation of a hybrid HMM-artificial neural network (ANN) continuous speech recognition system. These techniques are applied to a well trained, speaker-independent, hybrid HMM-ANN system and the recognizer parameters are adapted to a new speaker through off-line procedures. The techniques are evaluated on the DARPA RM corpus using varying amounts of adaptation material and different ANN architectures. The results show that speaker-adaptation within the hybrid framework can substantially improve system performance.
Alan A. Wrench. Analysis of fricatives using multiple centres of gravity. In Proc. Eurospeech '95, Madrid, 1995. [ bib ]
E. Sanders. Using probabilistic methods to detect phrase boundaries for speech synthesis. Master's thesis, University of Edinburgh, 1995. [ bib ]
W. Hess, A. Batliner, A. Kießling, R. Kompe, E. Nöth, A. Petzold, M. Reyelt, and V. Strom. Prosodic modules for speech recognition and understanding in VERBMOBIL. In Yoshinori Sagisaka, Nick Campbell, and Norio Higuchi, editors, Computing Prosody, Part IV, Chapter 23, pages 363-383. Springer-Verlag, New York, 1995. [ bib | .ps | .pdf ]
Paul A. Taylor. The rise/fall/connection model of intonation. Speech Communication, 15:169-186, 1995. [ bib | .ps | .pdf ]
Alan W. Black. Predicting the intonation of discourse segments from examples in dialogue speech. In ATR workshop on computational modeling of prosody for spontaneous speech processing, ATR, Japan, 1995. [ bib | .ps | .pdf ]
Eric Sanders and Paul A. Taylor. Using statistical models to predict phrase boundaries for speech synthesis. In Proc. Eurospeech '95, Madrid, 1995. [ bib | .ps | .pdf ]
M. Hochberg, S. Renals, T. Robinson, and G. Cook. Recent improvements to the Abbot large vocabulary CSR system. In Proc IEEE ICASSP, pages 69-72, Detroit, 1995. [ bib | .ps.gz ]
ABBOT is the hybrid connectionist-hidden Markov model (HMM) large-vocabulary continuous speech recognition (CSR) system developed at Cambridge University. This system uses a recurrent network to estimate the acoustic observation probabilities within an HMM framework. A major advantage of this approach is that good performance is achieved using context-independent acoustic models and requiring many fewer parameters than comparable HMM systems. This paper presents substantial performance improvements gained from new approaches to connectionist model combination and phone-duration modeling. Additional capability has also been achieved by extending the decoder to handle larger vocabulary tasks (20,000 words and greater) with a trigram language model. This paper describes the recent modifications to the system and experimental results are reported for various test and development sets from the November 1992, 1993, and 1994 ARPA evaluations of spoken language systems.
Mark E. Forsyth. Discriminating observation probability (DOP) HMM for speaker verification. Speech Communication, 17:117-129, 1995. [ bib ]
V. Strom. Detection of accents, phrase boundaries and sentence modality in German with prosodic features. In Proc. European Conf. on Speech Communication and Technology, volume 3, pages 2039-2041, Madrid, 1995. [ bib | .ps | .pdf ]
In this paper, detectors for accents, phrase boundaries, and sentence modality are described which derive prosodic features solely from the speech signal and its fundamental frequency. They are intended to support other modules of a speech understanding system at an early stage of analysis, or in cases where no word hypotheses are available. A new method for interpolating and decomposing the fundamental frequency is suggested. The detectors' underlying Gaussian distribution classifiers were trained and tested on approximately 50 minutes of spontaneous speech, yielding recognition rates of 78 percent for accents, 81 percent for phrase boundaries, and 85 percent for sentence modality.
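The detectors in the abstract above rest on Gaussian distribution classifiers. A minimal sketch of such a classifier, assuming one full-covariance Gaussian per class, might look like the following; the class and variable names and the demo data are illustrative, not taken from the paper, and the F0-based feature extraction is not shown.

```python
import numpy as np

class GaussianClassifier:
    """One full-covariance Gaussian per class; classify by the highest
    class-conditional log-likelihood plus log prior. A minimal stand-in
    for the Gaussian distribution classifiers the abstract mentions."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.params_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Small diagonal term keeps the covariance invertible.
            cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            self.params_[c] = (mean, np.linalg.inv(cov),
                               np.log(np.linalg.det(cov)),
                               np.log(len(Xc) / len(X)))
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            mean, inv_cov, logdet, logprior = self.params_[c]
            d = X - mean
            # Squared Mahalanobis distance of each row to the class mean.
            maha = np.einsum('ij,jk,ik->i', d, inv_cov, d)
            scores.append(-0.5 * (maha + logdet) + logprior)
        return self.classes_[np.argmax(scores, axis=0)]

# Toy two-class demo standing in for boundary vs. non-boundary features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)),
               rng.normal(5.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
pred = GaussianClassifier().fit(X, y).predict(X)
```

In this framing, each detection task (accent, phrase boundary, sentence modality) would be a separate classifier of this kind trained on its own labelled prosodic feature vectors.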