Mitsuru Nakai and Hiroshi Shimodaira. Accent Phrase Segmentation by Finding N-best Sequences of Pitch Pattern Templates. In Proc. ICSLP94, 8.10, pages 347-350, September 1994. [ bib | .pdf ]

Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Prosodic Phrase Segmentation Based on Pitch-Pattern Clustering. Electronics and Communications in Japan, Part 3, 77(6):80-91, June 1994. (in Japanese). [ bib ]

Hiroshi Shimodaira and Mitsuru Nakai. Prosodic phrase segmentation by pitch pattern clustering. In Proc. ICASSP-94, 76.5, vol.II, pages 185-188, March 1994. [ bib | .pdf ]

This paper proposes a novel method for detecting the optimal sequence of prosodic phrases in continuous speech, based on a data-driven approach. The pitch pattern of the input speech is divided into the prosodic segments that minimize the overall distortion with respect to pitch pattern templates of accent phrases, using the One Pass search algorithm. The pitch pattern templates are designed by clustering a large number of training samples of accent phrases. On the ATR continuous speech database, uttered by 10 speakers, the rate of correct segmentation reached a maximum of 91.7% when training and test data came from speakers of the same sex, and 88.6% when they came from speakers of the opposite sex.
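The idea of choosing segment boundaries that minimize total distortion against a set of templates can be illustrated with a small sketch. This is not the authors' implementation: it replaces the One Pass search with a plain segment-level dynamic program, uses linear resampling in place of time warping, and takes mean squared error as the distortion. The function names (`resample`, `segment_contour`), the parameters `min_len` and `max_len`, and the toy templates are all hypothetical.

```python
import numpy as np


def resample(segment, length):
    """Linearly resample a 1-D segment to a fixed number of points."""
    src = np.linspace(0.0, 1.0, num=len(segment))
    dst = np.linspace(0.0, 1.0, num=length)
    return np.interp(dst, src, segment)


def segment_contour(f0, templates, min_len=2, max_len=None):
    """Split f0 into segments minimizing total distortion to templates.

    Assumption: each template is a fixed-length vector, and a candidate
    segment is compared to it after linear resampling (a simplification
    standing in for the One Pass search named in the abstract).
    Returns (list of (start, end, template_index), total_cost).
    """
    f0 = np.asarray(f0, dtype=float)
    templates = np.asarray(templates, dtype=float)
    n = len(f0)
    max_len = n if max_len is None else max_len
    tlen = templates.shape[1]

    best = np.full(n + 1, np.inf)  # best[e]: minimal cost of covering f0[:e]
    best[0] = 0.0
    back = [None] * (n + 1)        # back[e]: (start, template) of last segment

    for end in range(min_len, n + 1):
        for start in range(max(0, end - max_len), end - min_len + 1):
            if not np.isfinite(best[start]):
                continue
            seg = resample(f0[start:end], tlen)
            # Per-template distortion, weighted by segment duration.
            dists = np.mean((templates - seg) ** 2, axis=1) * (end - start)
            k = int(np.argmin(dists))
            cost = best[start] + dists[k]
            if cost < best[end]:
                best[end] = cost
                back[end] = (start, k)

    if not np.isfinite(best[n]):
        raise ValueError("no valid segmentation for the given length limits")

    # Trace the chosen boundaries back from the end of the contour.
    segments = []
    end = n
    while end > 0:
        start, k = back[end]
        segments.append((start, end, k))
        end = start
    return segments[::-1], float(best[n])
```

With one rising and one falling template, a contour that rises and then falls should be split at its turning point. In the paper the templates come from clustering training accent phrases; here they would simply be supplied by hand.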

Mitsuru Nakai, Hiroshi Shimodaira, and Shigeki Sagayama. Prosodic phrase segmentation based on pitch-pattern clustering. Trans. IEICE (A), J77-A(2):206-214, February 1994. (in Japanese). [ bib ]

M. Hochberg, S. Renals, and T. Robinson. Abbot: The CUED hybrid connectionist/HMM large vocabulary recognition system. In Proc. ARPA Spoken Language Technology Workshop, pages 102-105, 1994. [ bib ]

N. Morgan, H. Bourlard, S. Renals, M. Cohen, and H. Franco. Hybrid neural network/hidden Markov model systems for continuous speech recognition. In I. Guyon and P. S. P. Wang, editors, Advances in Pattern Recognition Systems using Neural Networks Technologies, volume 7 of Series in Machine Perception and Artificial Intelligence. World Scientific Publications, 1994. [ bib ]

M. Hochberg, S. Renals, T. Robinson, and D. Kershaw. Large vocabulary continuous speech recognition using a hybrid connectionist/HMM system. In Proc. ICSLP, pages 1499-1502, Yokohama, 1994. [ bib ]

Paul A. Taylor and Alan W. Black. Synthesizing conversational intonation from a linguistically rich input. In Second ESCA/IEEE Workshop on Speech Synthesis, New York, 1994. [ bib | .ps | .pdf ]

Alan W. Black and Paul A. Taylor. A framework for generating prosody from high level linguistics descriptions. In Spring meeting, Acoustical society of Japan, 1994. [ bib ]

Briony J. Williams. Diphone synthesis for Welsh. In Proceedings of the Institute of Acoustics, volume 16, pages 359-365, 1994. [ bib ]

Mark Forsyth and M. A. Jack. Discriminating semi-continuous HMM for speaker verification. In Proc. IEEE International Conference on Acoustics, Speech, Signal Processing, 1994. [ bib | .ps | .pdf ]

Alan W. Black and Paul A. Taylor. Assigning intonation elements and prosodic phrasing for English speech synthesis from high level linguistic input. In ICSLP94, volume 2, pages 715-718, Yokohama, Japan, 1994. [ bib | .ps | .pdf ]

S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco. Connectionist probability estimators in HMM speech recognition. IEEE Trans. on Speech and Audio Processing, 2:161-175, 1994. [ bib | .ps.gz ]

We are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. We review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. Issues necessary to the construction of a connectionist HMM recognition system are discussed, including choice of connectionist probability estimator. We describe the performance of such a system, using a multi-layer perceptron probability estimator, evaluated on the speaker-independent DARPA Resource Management database. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system.
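One common way to use a network's outputs inside an HMM decoder, in hybrid systems of this kind, is to divide each estimated state posterior by the state's prior, giving a scaled likelihood. The abstract does not spell this step out, so the sketch below is an illustration under that assumption; the function name `scaled_log_likelihoods` and the `floor` parameter are hypothetical.

```python
import numpy as np


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert per-frame state posteriors to scaled log likelihoods.

    Assumption: posteriors[t, q] approximates P(q | x_t) from a network,
    and priors[q] is the relative frequency of state q in training data.
    By Bayes' rule, P(q | x_t) / P(q) = p(x_t | q) / p(x_t), which differs
    from the likelihood only by a factor that is constant across states.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    priors = np.asarray(priors, dtype=float)
    # Floor both terms to avoid log(0) for states the network rules out.
    return np.log(np.maximum(posteriors, floor)) - np.log(np.maximum(priors, floor))
```

Working in the log domain keeps the values usable directly as emission scores in a Viterbi search.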

S. Renals, M. Hochberg, and T. Robinson. Learning temporal dependencies in connectionist speech recognition. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 1051-1058. Morgan Kaufmann, 1994. [ bib | .ps.gz | .pdf ]

Alan W. Black and Paul A. Taylor. CHATR: A generic speech synthesis system. In COLING '94, volume 2, pages 983-986, Kyoto, Japan, 1994. [ bib | .ps | .pdf ]

Janet Hitzeman. A Reichenbachian account of the interaction of the present perfect with temporal adverbials. In Proceedings of the conference on Semantics and Linguistic Theory, Cornell Working Papers, volume 10, pages 107-126, Cornell, NY, USA, 1994. [ bib | .ps | .pdf ]

Briony J. Williams and S. Hiller. The question of randomness in English foot timing: a control experiment. Journal of Phonetics, 22:423-439, 1994. [ bib | .ps | .pdf ]

S. Renals and M. Hochberg. Using Gamma filters to model temporal dependencies in speech. In Proc. ICSLP, pages 1491-1494, Yokohama, 1994. [ bib | .ps.gz ]

Yoshinori Shiga, Yoshiyuki Hara, and Tsuneo Nitta. A novel segment-concatenation algorithm for a cepstrum-based synthesizer. In Proc. ICSLP, volume 4, pages 1783-1786, 1994. [ bib ]

Briony J. Williams. Diphone synthesis for the Welsh language. In Proceedings of the 1994 International Conference on Spoken Language Processing, Yokohama, Japan, 1994. [ bib ]

Briony J. Williams. Welsh letter-to-sound rules: Rewrite rules and two-level rules compared. Computer Speech and Language, 8:261-277, 1994. [ bib | .ps | .pdf ]

P. C. Bagshaw. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. PhD thesis, University of Edinburgh, 1994. [ bib ]

T. Robinson, M. Hochberg, and S. Renals. IPA: Improved phone modelling with recurrent neural networks. In Proc IEEE ICASSP, pages 37-40, Adelaide, 1994. [ bib ]

Mark Forsyth, P. C. Bagshaw, and M. A. Jack. Incorporating discriminating observation probabilities (DOP) into semi-continuous HMM for speaker verification. In Proc. ESCA workshop on Automatic Speaker Recognition, Identification and Verification, pages 19-22, Martigny, Switzerland, 1994. [ bib | .ps | .pdf ]

M. Hochberg, G. Cook, S. Renals, and T. Robinson. Connectionist model combination for large vocabulary speech recognition. In IEEE Proc. Neural Networks for Signal Processing, volume 4, pages 269-278, 1994. [ bib | .ps.gz ]

H. Niemann, J. Denzler, B. Kahles, R. Kompe, A. Kießling, E. Nöth, and V. Strom. Pitch determination considering laryngealization effects in spoken dialogs. In Proc. Int. Conf. on Neuronal Networks, volume 7, pages 4457-4461, Orlando, 1994. [ bib | .ps | .pdf ]

A frequent phenomenon in spoken dialogs of the information-seeking type is the short elliptic utterance whose mood (declarative or interrogative) can only be distinguished by intonation. The main acoustic evidence is conveyed by the fundamental frequency, or F0, contour. Many algorithms for F0 determination have been reported in the literature. A common problem is posed by irregularities of speech known as laryngealizations. This article describes an approach based on neuronal network techniques for the improved determination of fundamental frequency. First, an improved version of our neuronal network algorithm for reconstruction of the voice source signal (glottis signal) is presented. Second, the reconstructed voice source signal is used as input to another neuronal network distinguishing the three classes 'voiceless', 'voiced-non-laryngealized', and 'voiced-laryngealized'. Third, the results are used to improve an existing F0 algorithm. Results of this approach are presented and discussed in the context of the application in a spoken dialog system.