The Centre for Speech Technology Research, The University of Edinburgh

PWorkshop Archives: Summer Term 2003

15 Apr 2003

Mariko Sugahara

The Right vs. Left Asymmetry of Post-FOCUS Prosodic Phrase Boundaries in Tokyo Japanese

This paper examines the presence or absence of prosodic boundaries in the post-FOCUS part of an utterance in Tokyo Japanese and provides a phonological account of it within the framework of Optimality Theory.



22 Apr 2003

Christine Haunz

Grammatical and Non-Grammatical Factors in Loanword Adaptation

This talk aims to include in the study of loanwords not only phonological differences between the borrowing and donor languages, but also factors which may not depend solely on these differences, e.g. similarity, frequency and gradient grammaticality. The influence of these factors on the performance of English speakers in a shadowing task of Russian words with English-illegal initial clusters was tested. The frequency of potential adapted onsets in the English lexicon does not correlate with the strategy of adaptation. Judgments about the grammaticality of words containing illegal initial clusters and the similarity between pairs of words partially containing illegal onsets were obtained from English native speakers. Similarity of a target to an adaptation was shown to be a predictor of its rate of use. The perceived grammaticality of a target cluster influenced performance in two ways: high-grammaticality target clusters were modified less often, and low-grammaticality clusters were mostly associated with vowel epenthesis.



29 Apr 2003

Bob Ladd and Caroline Ekelund

Downstep, Emphasis, and overall pitch range


06 May 2003

Yiya Chen

Prosody and systematic variation in the F0 realization of lexical tones in Standard Chinese


14 May 2003

Laura Redi (Harvard, MIT)

Categories of intonational representation: Some effects of alignment and pitch range


20 May 2003

Abigail Cohn (Cornell University)

Superheavy Monosyllables in American English: The Role of the Mora

Words with diphthong or tense vowel nuclei and post-vocalic liquids, such as flour and eel, are an area of considerable interest in American English due to their variability. Native speakers are not in agreement as to their syllable count and this can differ between dialects. Drawing on a variety of phonological and morphological evidence, we conclude that these words are monosyllabic, but superheavy. We argue that such superheavy syllables are best represented as being trimoraic, due to a requirement that liquids in the rime bear a mora. Results of an acoustic study lend support to our analysis of these words as superheavy monosyllables represented moraically. The universal markedness of trimoraic syllables makes them vulnerable to resolution, which is manifested in different ways in different dialects of American English.



21 May 2003

Julian Bradfield (Computer Science)

Concurrency and Phonology

This talk presents some very preliminary ideas on the use of concurrency, and in particular the rich computer science notion of concurrency, in phonetics and phonology. After a brief description of the CS concept, I'll consider, as time permits, its application to formal models of phonology, to relating different phonological theories, to click languages and to speech recognition. I hope for your feedback!

A more detailed abstract can be found here.



03 Jun 2003

Antti-Veikko Rosti (Cambridge University)

Switching linear dynamical systems for speech recognition

Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor, in particular the assumption that successive speech frames are conditionally independent given the state that generated them. To overcome this, segment models have been proposed. These model whole segments of frames rather than individual ones. One form is the stochastic segment model (SSM), which uses a standard linear dynamical system to model the sequence of observations within a segment. Here the dynamics are modelled by a first-order Gauss-Markov process in some low-dimensional state space. The feature vector is a noise-corrupted linear transformation of the state vector. Though the training and recognition algorithms are more complex than those for HMMs, it is feasible to use standard techniques for inference with SSMs.
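
As a rough illustration of the linear dynamical system described above, the following sketch generates the frames of one segment. It is a minimal example under stated assumptions, not the speaker's implementation: the function name simulate_lds, the matrices A and C, the noise levels and the dimensions are all hypothetical, and the noise is taken to be isotropic Gaussian for simplicity.

```python
import random

def simulate_lds(A, C, q_std, r_std, x0, n_frames, seed=0):
    """Generate n_frames feature vectors from a linear dynamical system.

    State (first-order Gauss-Markov):  x[t+1] = A x[t] + w[t]
    Observation (feature vector):      y[t]   = C x[t] + v[t]
    w and v are zero-mean Gaussian noise with standard deviations
    q_std and r_std (isotropic noise is an assumption of this sketch).
    """
    rng = random.Random(seed)

    def matvec(M, v):
        # Plain-Python matrix-vector product.
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    x = list(x0)
    frames = []
    for _ in range(n_frames):
        # Observation: noise-corrupted linear transformation of the state.
        frames.append([yi + rng.gauss(0.0, r_std) for yi in matvec(C, x)])
        # State update: linear dynamics plus process noise.
        x = [xi + rng.gauss(0.0, q_std) for xi in matvec(A, x)]
    return frames

# Hypothetical example: 2-dimensional state, 3-dimensional feature vector.
A = [[0.9, 0.1], [0.0, 0.8]]
C = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
segment = simulate_lds(A, C, q_std=0.1, r_std=0.05, x0=[1.0, -1.0], n_frames=10)
```

Because each frame depends on the hidden state carried over from the previous frame, successive frames within a segment are correlated, in contrast to the conditional independence assumed by an HMM.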

For the SSM, segments are assumed to be independent. Intuitively, this is not always valid due to co-articulation between the modelling units. Switching linear dynamical systems (SLDS) have therefore been proposed. In an SLDS, the posterior distribution of the state vector is propagated between segments. Unfortunately, exact inference in an SLDS is not tractable due to the exponential growth of the number of components over time. In this talk, approximate methods for inference in SLDSs will be presented. First, there are approximate methods based on a heuristic Viterbi-like algorithm. Alternatively, variational learning may be used. Finally, approaches based on Markov chain Monte Carlo methods can be used, including a training scheme based on stochastic expectation maximisation (SEM). For the SEM scheme, convergence and implementation issues for use with SLDS will be discussed in detail.
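
The exponential growth mentioned above can be made concrete with a small counting sketch. It assumes, for illustration only, that each segment is generated by one of n_modes linear dynamical systems; the exact state posterior then needs one Gaussian component per possible sequence of modes. The function name and the numbers are hypothetical.

```python
from itertools import product

def mode_sequences(n_modes, n_segments):
    """Enumerate every possible assignment of a mode to each segment."""
    return list(product(range(n_modes), repeat=n_segments))

# With 3 modes the number of sequences, and hence of mixture components
# in the exact posterior, is 3 ** n_segments.
counts = {t: len(mode_sequences(3, t)) for t in (1, 2, 3, 4)}
```

Approximate schemes avoid this enumeration, for example by keeping only the best-scoring hypotheses or by sampling mode sequences.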



01 Jul 2003

Nikola Ikonomov (Bulgarian Academy of Sciences)

Preservation and Digital Restoration of Audio Archives

Problems related to the storage and handling of audio archives on magnetic tape were examined. A corresponding set of measures and restoration utilities was defined, developed and implemented. The results offer an efficient way to overcome problems related to the safekeeping and restoration of sound recordings and could be successfully applied in institutions (linguistic and speech research, dialectological research, folklore, history-related recordings, etc.) with similar audio archives.



14 Jul 2003

Alan W Black, Tanja Schultz, and Robert Frederking (CMU)

Towards Communicating with Dolphins

After working in the area of rapid development of speech-to-speech translation systems for human languages with limited resources, we were recently contacted about applying our techniques to communication with dolphins. Although full translation is of course not feasible, there are a number of ways speech technology can help in dolphin research.

Working with the Wild Dolphin Project, who have almost 20 years of experience with a pod of spotted dolphins 40 miles off the Bahamas, we are using their existing recordings for this work, and currently designing new equipment to allow collection of more data.

After a general description of dolphin acoustics, this talk will describe some areas where speech recognition technology can be used to better classify dolphin recordings, and present initial results on a simple dolphin ID system based on signature whistles. Also we will describe a framework for an experiment we intend to run later this summer, to investigate how dolphins may relate to synthesized noises in their acoustic domain, and how they may mimic them.



31 Jul 2003

ICPhS Practice Talks

Corine Astesano and Ellen Bard
Structural and rhythmic influences on the occurrence of the Initial Accent in French

Rob Clark
Modelling pitch accents for concept-to-speech synthesis

Christine Haunz
Factors in loanword adaptation

Bob Ladd
Phonological conditioning of F0 target alignment

Richard Mullooly
An electromagnetic articulography study of rhotic consonants in English

Mits Ota, Bob Ladd and Madoka Tsuchiya
Effects of foot structure on mora duration in Japanese?

Susana Cortes Pomacondor
Transfer in L2 sound production

Alice Turk
Introduction to the symposium on the word, foot, and syllable in speech production and perception: Possible roles for the syllable in speech production

Posters

Mika Ito
Breathiness and Politeness: An experimental study of male speakers of Japanese

Cassie Mayo and Alice Turk
Is the development of cue weighting strategies in children's speech perception context-dependent?

Sherry Ou
An Optimality-theoretic approach to word stress: evidence from Chinese-English Interlanguage

James M Scobbie and Alan A Wrench
An articulatory investigation of word final /l/ and /l/-sandhi in three dialects of English

Sarah Creer and Maria Wolters
Stress patterns of German cardinal numbers



12 Aug 2003

Nobuaki Minematsu (University of Tokyo; Royal Institute of Technology, Stockholm)

Phonetic Tree Analysis

Two new techniques are proposed to characterize accented pronunciation. The first technique, Phonetic Tree Analysis, extracts the phonetic tree structure embedded in the utterances of a student. Results of analyzing Japanese English clearly and visually present well-known Japanese habits in speaking English. The second technique automatically estimates segmental intelligibility, based not upon acoustic matching with native speakers' utterances but upon matching between two structures: the phonetic structure extracted from the student's pronunciation and the lexical structure of the target language's vocabulary. The estimation is done using a word perception model, the Cohort Model, and the estimated cohort size is interpreted as the degree of segmental unintelligibility. Experimental results show good agreement between the estimated intelligibility and the segmental proficiency rated by teachers. Further, some possible applications based upon the two proposed techniques are also shown.
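
To illustrate the notion of cohort size used above, the following sketch counts how many lexicon entries remain compatible with the input as each phone arrives. It is a generic toy illustration of the Cohort Model idea, not the estimation procedure of the talk: the function cohort_sizes, the four-word lexicon and its phone labels are all hypothetical.

```python
def cohort_sizes(lexicon, phones):
    """Return the cohort size after each successive phone of the input.

    lexicon maps each word to its phone sequence. The cohort is the set of
    words whose pronunciation starts with the phones heard so far.
    """
    sizes = []
    cohort = list(lexicon.items())
    for i, phone in enumerate(phones):
        # Keep only words that are long enough and match at position i.
        cohort = [(w, p) for w, p in cohort if len(p) > i and p[i] == phone]
        sizes.append(len(cohort))
    return sizes

# Hypothetical toy lexicon with made-up phone labels.
lexicon = {
    "light": ["l", "ai", "t"],
    "like": ["l", "ai", "k"],
    "right": ["r", "ai", "t"],
    "lie": ["l", "ai"],
}
sizes = cohort_sizes(lexicon, ["l", "ai", "t"])  # [3, 3, 1]
```

A larger cohort means more competing word candidates for the listener, which is the sense in which the abstract reads cohort size as a degree of unintelligibility.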



26 Aug 2003

Laurence White and Alice Turk

Polysyllabic shortening revisited: word length and the attenuation of accentual lengthening



<owner-pworkshop@ling.ed.ac.uk>