The Centre for Speech Technology Research, The University of Edinburgh

PWorkshop Archives: Summer Term 2001

17 Apr 2001

Adriana Marusso

Vowel Reduction & Rhythm: A Case Study in Brazilian Portuguese and British English

Vowel reduction is a common phenomenon in stress-timed languages such as Brazilian Portuguese (BP) and British English (BE). However, there are substantial differences in the way vowel reduction takes place in these two languages, at both the phonological and phonetic levels. Experimental research on the quality and duration of reduced vowels in both languages therefore helps to show some of the similarities and differences between them. Furthermore, understanding their different patterns of duration may help to explain differences in their rhythmic patterns.



24 Apr 2001

Dr Alexandra Vella (University of Malta)

A Preliminary Investigation of Intonational Variation across Dialects of Maltese

This talk reports preliminary work on the realisation of the nuclear statement tune in two dialects of Maltese. Working on the intonation of Maltese within an AM framework, Vella (1995) concluded that the nuclear statement tune shows a tendency for early alignment of H* with the stressed syllable. Intuitively, early alignment of H* seems to be more marked in the two dialects under investigation. This investigation is therefore aimed mainly at examining the implementation of the nuclear statement tune in the two dialects of Maltese. It is hoped that descriptive work of this sort can serve as the basis for further analysis of the prosodic characteristics of both Standard Maltese and its dialects.



01 May 2001

Monica Tamariz

The intra-word distribution of information in Spanish

This research focuses on the optimal use of representational space by words in speech and in the brain (the mental lexicon). We use the concept of entropy from information theory to plot the information profiles of the words in a system, and apply this method to different Spanish word systems: a citation-form vs. a fast-speech transcription; a dictionary lexicon vs. the speech lexicon; and the word tokens in a speech corpus vs. the word types. We analyze the system taking first phonemes and then features as the basic units to obtain the information profiles. Finally we discuss the implications for the mental lexicon.
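As a rough illustration of the method (a minimal sketch, not the author's actual implementation), the information profile of a word system can be computed as the Shannon entropy of the symbol distribution at each position, assuming words are represented as phoneme strings:

```python
from collections import Counter
from math import log2

def positional_entropy(words):
    """Entropy (in bits) of the symbol distribution at each word
    position, computed over the words long enough to reach it."""
    max_len = max(len(w) for w in words)
    profile = []
    for i in range(max_len):
        symbols = [w[i] for w in words if len(w) > i]
        counts = Counter(symbols)
        total = sum(counts.values())
        h = -sum((c / total) * log2(c / total) for c in counts.values())
        profile.append(h)
    return profile

# Toy "lexicon" of phoneme strings (hypothetical, for illustration only)
print(positional_entropy(["pata", "pato", "gato", "gata", "mesa"]))
```

A position where many different phonemes occur with similar frequency carries high entropy (high information); a position where one phoneme dominates carries low entropy. Plotting the profile across positions shows how information is distributed within the word.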



08 May 2001

Dr Joyce McDonough (University of Rochester)

Tone and Intonation in Navajo

In this talk I will present data from yes/no and focus constructions in Navajo as part of a preliminary investigation into the interaction of tone and intonation. Navajo has the dense tonal specification of a tone language, though its complex morphological and morpho-syntactic structure precludes easy typological tonal classification. The Navajo verb is characterized by final prominence: the word-final stem is marked by acoustic properties commonly associated with stress (longer, louder segments and pitch-range expansion), resulting in a striking end-prominent profile [McDonough, Anthro. Ling. 41.4, 503-539 (1999)]. Preliminary studies indicated tonal sensitivity to the morphological domains in the word, pitch-range expansion in the final syllable, no apparent boundary tones, and no declination/downstep of H's across the domains of the word. Research has shown that tone languages may use intonational and/or stress-related strategies (boundary tones, pitch-range expansion) to mark focus and yes/no questions. To investigate the acoustic properties of these constructions in Navajo, 12 native speakers were recorded reciting short statements followed by either yes/no questions (3 types) or focus constructions. Both types of construction in Navajo are marked by pro- and enclitic-like particles on or surrounding the (non-utterance-final) NPs. Results indicate no differences in the F0 contour between statements and the contrasting constructions, though the interpretation of the results depends on the analysis of the structures. This fact about intonation is arguably related to the NPs' non-argument status in the grammar, possibly indicating a consequential interaction of morphology, morpho-syntax and intonation in languages of this type.



28 May 2001

Dr Colleen Fitzgerald (SUNY at Buffalo)

Representations of Rhythm and Clash in Tohono O'odham

Previous descriptions of Tohono O'odham (formerly Papago), a Uto-Aztecan language spoken in Arizona, show that words avoid stress clash in phonological words and in clitic groups (Fitzgerald 1997a, b, c). More recent work on another dialect of O'odham, the Western dialect, reveals different findings. The Western dialect allows extensive stress clash, while the more Central/Eastern dialect (the subject of the previous studies) does not. This paper examines the extent to which clash is permitted in the Western dialect. Interestingly, the Western dialect permits clashing stresses on morphemes that are unstressed in the Central dialect. The Optimality Theoretic analysis offered here can account for the dialect variation merely by reranking constraints.

However, the data from the Western dialect raise the issue of what Optimality Theory can tell us about language, especially if OT continues as a representationally impoverished theory. This issue relates to the type of stress clash permitted in this dialect. There is a growing cross-linguistic literature on how languages treat clashing stresses, either in words or in phrases. Clash-based analyses, at least before Optimality Theory, often relied on grid theory to represent the relationship between stressed and unstressed syllables. In Optimality Theory, rhythmic constraints are invoked, but there is often no discussion of rhythmic representations. In this paper, I use data from the Western dialect of Tohono O'odham to inform the representation of rhythm within OT. The analysis is of interest beyond OT because of the key generalization in this dialect: primary stresses are never allowed to stand in a clash relationship, while subsidiary stresses can clash. A version of OT without a theory of how to construct constraints and without a theory of representations fails to capture the novelty of this generalization.



05 Jun 2001

Dr Hubert Truckenbrodt (Rutgers University)

On register levels and structure



12 Jun 2001

Vepa Jithendra & John Niekrasz

Unit Selection Synthesis: Better Concatenation Costs Using LSFs and MCA    (Vepa Jithendra)

This talk focuses on the use of different acoustic features to compute concatenation (join) costs in unit-selection based concatenative speech synthesis. We compare two methods for join cost computation:

  1. Line Spectral Frequencies (LSFs), which are derived from an all-pole model of speech.
  2. Formant frequencies and bandwidths obtained from Multiple Centroid Analysis (MCA) of the speech power spectrum.

We present our results in the form of pairs of speech files synthesised using each of the above features to compute the join costs. The talk concludes with some pointers to future work.
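For illustration only (the exact cost function used in this work is not specified in the abstract), a join cost between two candidate units is typically some distance between the acoustic feature vectors at the join point, e.g. the last LSF frame of the left unit and the first LSF frame of the right unit. A minimal sketch, assuming a (possibly weighted) Euclidean distance:

```python
import numpy as np

def lsf_join_cost(lsf_left, lsf_right, weights=None):
    """Concatenation cost between the final frame of the left unit
    and the initial frame of the right unit, as a weighted Euclidean
    distance in LSF space.  A smaller cost suggests a smoother join."""
    left = np.asarray(lsf_left, dtype=float)
    right = np.asarray(lsf_right, dtype=float)
    if weights is None:
        weights = np.ones_like(left)  # unweighted by default
    return float(np.sqrt(np.sum(weights * (left - right) ** 2)))

# Hypothetical 4th-order LSF vectors (normalised frequencies, 0..1)
print(lsf_join_cost([0.10, 0.30, 0.50, 0.70], [0.12, 0.28, 0.55, 0.70]))
```

The same skeleton applies if the features are MCA formant frequencies and bandwidths instead of LSFs; only the feature extraction and any perceptual weighting change.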

Applying Speech Technology to Singing with the Singing TIMIT corpus    (John Niekrasz)

It is often forgotten in voice research that speech represents only a part of the wide range of sounds produced by the human voice. My research as a PhD student at CSTR aims to broaden this traditional vision of voice research to include another major set of vocal sounds: singing.

Many of the important initial discoveries made in voice science, such as the source-filter model, are now the foundation for research being done today in all fields of speech science. However, with the ever-expanding power of computers, the increasing accessibility of speech corpora, and the existence of marketable applications, speech technology research today is dominated by massive data-driven algorithms which aim to extract detailed information from these large collections of recorded speech. While this approach is clearly useful, particularly for research toward the improvement of large-vocabulary speech recognition systems, the incomplete representation of all possible vocal sounds, including linguistic sounds, in the data precludes this research from fully modeling the vocal mechanism in completely abstract ways.

With the above in mind, I have set out to create the Singing TIMIT corpus, a set of labeled English singing intended for data-driven voice research. Closely modeled on the existing, widely used TIMIT corpus, Singing TIMIT contains the same phonetically compact set of sentences as TIMIT with a similar labeling scheme. It also strives to control and broaden the scope of other interesting variables such as fundamental frequency, syllable duration, and loudness, all of which are explicitly controlled to a degree in singing. The expansion of some of these important variables beyond their normal ranges in speech, the ability to control them through the magic of written music, and the inclusion of the same linguistic richness contained in the original TIMIT corpus could potentially lead to data-driven analyses which capture more meaningful, abstract features of the voice. I hope to initiate such research in the subsequent years of my PhD, and then to explore research specific to singing technology, such as singing transcription and synthesis, as an obvious next step once such a corpus is available.



[back to PWorkshop Archives]

<owner-pworkshop@ling.ed.ac.uk>