The Centre for Speech Technology Research, The University of Edinburgh

PWorkshop Archives: Semester 2, 2004–2005

01 Feb 2005

Andrew Wedel (University of Arizona)

Effective contrast and alternation

Functionally, contrast can be thought of as a property that describes the extent to which a lexical entry can be quickly and accurately accessed over another in a given context. A large psycholinguistic literature has shown that the speed and accuracy of lexical access depend on a number of factors, three of which will be of interest here. First, access of a lexical entry is more efficient when the lexical entry shows high phonemic contrast, where phonemic contrast is defined in terms of the actual number of other phonemically similar entries in the lexicon (Luce and Pisoni 1998). For example, in English, 'cat' is a word with many such similar lexical neighbors, e.g., 'hat', 'pat', 'fat', 'cut', 'kit', 'cot', 'cap', 'can', etc. 'Orange', on the other hand, is a lexical entry with no highly similar lexical neighbors. Within this model, then, high phonemic contrast is synonymous with having few near lexical neighbors.
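The neighbor-counting idea above can be sketched in a few lines. This is only an illustration, not the measure used in the talk: it assumes that two entries count as neighbors when they differ by a single phoneme substitution, insertion or deletion, and the toy lexicon and its rough phoneme labels are hypothetical.

```python
def is_neighbor(a, b):
    """True if phoneme tuples a and b differ by exactly one
    substitution, insertion or deletion (assumed similarity criterion)."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    short, long_ = (a, b) if len(a) < len(b) else (b, a)
    # Lengths differ by one: deleting one phoneme from the longer
    # form must yield the shorter form.
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(word, lexicon):
    """Number of entries in the lexicon that are neighbors of word."""
    target = lexicon[word]
    return sum(
        is_neighbor(target, form)
        for other, form in lexicon.items()
        if other != word
    )


# Hypothetical toy lexicon with rough, ARPAbet-like phoneme labels.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "hat": ("h", "ae", "t"),
    "pat": ("p", "ae", "t"),
    "fat": ("f", "ae", "t"),
    "cut": ("k", "ah", "t"),
    "kit": ("k", "ih", "t"),
    "cot": ("k", "aa", "t"),
    "cap": ("k", "ae", "p"),
    "can": ("k", "ae", "n"),
    "orange": ("ao", "r", "ih", "n", "jh"),
}

if __name__ == "__main__":
    for w in ("cat", "orange"):
        print(w, neighborhood_density(w, LEXICON))
```

In this toy lexicon 'cat' has many neighbors and 'orange' has none, so on the definition above 'orange' would be the entry with higher phonemic contrast.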

Second, high frequency of a lexical entry relative to its lexical neighbors strongly enhances the efficiency with which it can be accessed. For example, the highly frequent word 'cat' is accessed more readily than the less frequent word 'cot', even though they exhibit a similar degree of phonemic contrast. Third, lexical access of a morpheme has been shown to be more efficient if it does not exhibit alternation, that is, if it surfaces in the same form in all contexts (Tsapkini et al. 1996). We use the term effective contrast to refer to the net effect of the composite influences on lexical access efficiency. Here, I'll argue that the hypothesis that lexicons evolve under pressure to optimize effective contrast, rather than actual or potential phonemic contrast, allows us to better understand two otherwise apparently disparate contrast patterns: (i) the root-affix contrast asymmetry, and (ii) the suppression of alternation in small roots.

(i) Typological evidence suggests that affixes tend to alternate more often than roots, and also tend to be drawn from a more restricted inventory (McCarthy and Prince 1995 and references therein). Therefore, from the point of view of lexical access, affixes are at a disadvantage relative to roots both through their greater tendency to alternate, and through restricted access to elements of phonemic contrast. Within OT, both of these phenomena have been accounted for by positing a universal 'meta-constraint' that Root-Faith must outrank Affix-Faith. However, under a model in which systems evolve to optimize effective contrast, the fact that affixes are much more frequent than roots (Segalowitz and Lane 2000) suggests a more deeply explanatory account. Under this model, we expect the very high frequency of affixes to mitigate the negative effects on lexical access of alternation or low phonemic contrast. For roots, with their lower average frequency, uniform surface forms and higher phonemic contrast should be relatively more important. If effective contrast is optimized within the lexicon, the difference in frequency between roots and affixes should lead to a pattern in which roots tend to show greater phonemic contrast and less alternation than affixes.

(ii) A survey of the distribution of final devoicing, vowel reduction, and palatalization alternations across nouns and adjectives in Turkish, Catalan and Czech shows that in these languages, alternation is significantly less common in roots of fewer than 4 phonemes. Unlike the case of roots versus affixes, this asymmetry cannot be explained on the basis of frequency, because smaller morphemes tend to be more frequent than larger morphemes, which would predict the opposite pattern. However, the number of close lexical neighbors is strongly and inversely correlated with root size. On average, then, small roots are less phonemically contrastive than larger roots. Given the negative impact of low phonemic contrast on effective contrast, alternation may be relatively more costly in small roots than in large ones.



04 Feb 2005

Andrew Wedel (University of Arizona)

Self-organization and the origin of higher-order phonological patterns

Generative models of phonology account for output patterns through a complex grammar algorithm applied over a passive lexicon. However, many complex patterns in the natural world can be successfully explained as the gradual accumulation of structure through repeated local interactions (Nicolis & Prigogine 1977).

Language change often proceeds by analogy, where the more similar two forms are, the more likely they are to become more similar in some other respect (e.g., Bybee 1985). A range of models account for these observations based on the premises that (1) even predictably derivable output forms can be represented in the lexicon (Butterworth 1983, Tenpenny 1995, Baayen, Dijkstra, and Schreuder 1997), and (2) all forms in the mental lexicon are associated in a web of connections, where the strength of each connection depends in part on similarity (cf. Chandler, in press). Under these models, differential connection strengths between lexical items can feed language change, for example, by differentially biasing production or perception errors. Here, I show that when similarity-dependent connections between lexical entries probabilistically influence lexical output form, patterns described by the Optimality Theoretic (Prince and Smolensky, 1993) principles of (i) constraint dominance, and (ii) strict constraint dominance rapidly arise, driven by competition between leveling pressures within the lexicon and differentiating pressures from lexicon-external performance biases. Importantly, this model predicts that even if performance biases are additive (as seems desirable if markedness is grounded in the physical properties of articulation, perception, and processing), they may yet be manifested in lexical patterns as if they were not. Further, the model predicts occasional entrenchment of fortuitous, 'unnatural' patterns through analogical extension (cf. Garrett and Blevins, in press).



08 Feb 2005

Mariko Sugahara & Alice Turk

Sublexical Constituent Duration Adjustments

One of the universal aspects of speech is that speakers manipulate acoustic parameters to mark linguistic boundaries. For example, it is well documented that word-final segments and syllables are longer when immediately followed by higher-order linguistic boundaries, such as sentence/utterance and phrase boundaries, than when followed by no such boundaries (Cooper & Cooper; Wightman et al. 1992 among many others). Recent work by Turk and colleagues (Turk & Shattuck-Hufnagel 2000; Turk & Sawusch 1997; Turk & White 1999) and Beckman & Edwards (1990) has further shown that segmental duration adjustment is also affected by the presence or absence of word-level constituent boundaries in two-word sequences. Those "near-boundary" duration adjustments are then taken as good evidence for the prosodic constituents proposed at and above the word level. It is, however, not yet clear whether such near-boundary duration adjustment is found at sub-lexical (within-word) levels, especially at within-word feet and at within-word prosodic words. The main goal of this study is to examine whether there is any duration-based evidence for those sub-lexical constituents in English. We obtained results supporting the within-word prosodic word boundary, while no supporting evidence was found for the within-word foot.



22 Feb 2005

Bob Ladd

Effects of syllable structure on F0/segmental alignment: new data from RP and Scottish English

Since the discovery in the mid 1990s of "segmental anchoring" - regularities in the way F0 turning points are aligned with identifiable landmarks in the segmental string - a number of recent studies have tried to discover the nature of the phonological structures and phonological domains that affect alignment. Ladd, Mennen & Schepman 2000 (JASA) reported a difference in the alignment of rising prenuclear accents in Dutch that is caused by phonological vowel length even in the absence of any difference of phonetic vowel duration. On this basis they argued that the true cause of the difference is syllable structure (which is affected by phonological vowel length) rather than vowel length per se. However, subsequent studies (including a recent paper on Dutch *nuclear* accents by Schepman, Lickley & Ladd) make the syllable structure account harder to maintain. New data from English, to be reported here, also argue against a simple autosegmental analysis in which tones are associated with syllable boundaries. However, they may be consistent with a gestural model in which accentual pitch movements are in some way coordinated with syllables and/or words.



08 Mar 2005

Marianne Pouplier

Articulatory phonology: A guided tour

The tour will start with a review of the basic ideas behind a gestural approach to phonetics and phonology, and then proceed to discuss some of the new developments in the area. The latter include the pi-gesture model of prosodic boundary effects (Byrd & Saltzman 2003), new approaches to gestural structure (split gesture hypothesis; Nam 2004; Mooshammer 1998) and recent work that shows how templatic processes in Moroccan Colloquial Arabic (Gafos 2002) and transparency in Hungarian vowel harmony (Benus 2005) can be understood from a gestural perspective.



15 Mar 2005

Mits Ota

Frequency effects on the prosodic size and shape of 1- to 2-year-olds' word production

Infant perception studies suggest that the capacity to track phonological pattern frequency is present as early as 8 months (Jusczyk, Luce, & Charles-Luce, 1994; Saffran, Aslin, & Newport, 1996). However, it is still not clear how this precocious ability eventually leads to the type of probabilistic phonological knowledge we find in adult speakers. One way to investigate this question is to look at frequency effects manifested in the phonology of slightly older children. To this end, this study has examined different types of frequency effects on the truncation rates in words produced by 1- to 2-year-old Japanese-speaking children. The analysis so far suggests that the statistical information most relevant to the prosodic size and shape of their word production is the token frequencies of similar prosodic structures in the input. Given the bulk of evidence indicating that adult speakers' knowledge of phonological well-formedness is closely tied to structural *type* frequency (e.g., Vitevitch, Luce, Charles-Luce & Kemmerer, 1997; Hay, Pierrehumbert & Beckman, 2003), this finding suggests an interesting difference in statistical induction between young children and adults.
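The token/type distinction drawn above can be made concrete with a short sketch. Everything in it is hypothetical: the input sample, the word labels and the prosodic-shape labels are invented for illustration and are not the study's data or coding scheme.

```python
from collections import Counter


def token_frequency(tokens):
    """Count every occurrence of each prosodic shape in the input."""
    return Counter(shape for _word, shape in tokens)


def type_frequency(tokens):
    """Count each distinct word once per prosodic shape."""
    return Counter(shape for _word, shape in set(tokens))


# Hypothetical child-directed input: (word, prosodic shape) pairs.
INPUT = [
    ("word_a", "2-syllable"),
    ("word_a", "2-syllable"),
    ("word_a", "2-syllable"),
    ("word_b", "2-syllable"),
    ("word_c", "3-syllable"),
    ("word_d", "3-syllable"),
    ("word_e", "3-syllable"),
]

if __name__ == "__main__":
    print("token:", dict(token_frequency(INPUT)))
    print("type: ", dict(type_frequency(INPUT)))
```

In this invented sample the two-syllable shape is the more frequent one by tokens but the less frequent one by types, which is the kind of divergence that lets the two statistics be told apart.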



29 Mar 2005

Maria Wolters (QMUC and HCRC) and Robin Lickley (QMUC)

Disfluencies and Dysfluencies in Parkinson's Disease — A Pilot Study


05 Apr 2005

Tim Mills

Putting cameras up people's noses

In normal speech, the vocal folds are abducted (drawn apart) from their voiced position for voiceless sounds. In my current study, I ask whether there are glottal "devoicing" (abduction) gestures in whispered speech. To answer this question, I have recorded minimal pairs (such as peer/beer, fear/veer) using a nasal endoscope. This talk reviews the methodology of acquiring such data and presents a scheme for measurement and analysis of the audiovisual data acquired.



19 Apr 2005

Linguistics Circle Talk



10 May 2005

Hiroshi Shimodaira

Eyesfree Handwriting Interface for Wearable Computing

As mobile and wearable computing devices have become popular, a number of text input interfaces have been developed as substitutes for the keyboard. If we assume devices much smaller than PDAs, i.e. cellular phones or a small touch-pad attached to the body for wearable computing, a handwriting interface would be the most natural interface and the one most widely accepted by users, while a button interface is preferred by only a limited number of users.

To take full advantage of a handwriting interface, we assume that users are allowed to write characters continuously without watching them. Under this condition, characters are written one after another without pauses on a small writing area, with the result that the written characters are heavily distorted and overlap one another. Conventional handwriting recognition engines, however, cannot handle this kind of handwriting. To tackle the problem, we employ state-of-the-art automatic speech recognition technologies, and develop special handwriting input devices for different styles of wearable computing.

In the talk, I will show a prototype system and several handwriting input devices, one of which is for handwriting in the air. In addition, I will introduce a handwriting interface for visually impaired people, which is one of the applications of the eyesfree handwriting interface.



17 May 2005

Bruce Birch (University of Melbourne)

A Tour of Iwaidja Intonation

This talk is based on an acoustic analysis of monologue narratives in Iwaidja, a highly endangered Australian language. The most common intonation contours are presented to illustrate a provisional tonal inventory influenced by Gussenhoven's ToDI transcription of Dutch. Other intonational features such as the variable placement of pitch accents in multipedal words, and F0 variation in the context of the reduced pitch range found in plateau phrases are also discussed.



23 May 2005

Geoff Morrison (University of Alberta)

Logistic regression modelling of cross-language perception data

Researchers in cross-language speech perception are often interested in delineating the boundaries between phoneme categories in each of the languages under investigation. Researchers may wish to compare vowel spaces to predict how monolingual speakers of one language will classify the sounds of the other language. They may also wish to compare the perceptual boundaries of L2 learners with the boundaries of native speakers of the L2 and monolingual speakers of the L1. Other issues involve the relative weighting of multiple acoustic cues, and the crispness or fuzziness of the boundaries. A typical experimental design involves listeners identifying the phonemes they hear in a synthetic continuum. This results in proportional data: for each stimulus, a proportion of the responses are category x, a proportion are category y, etc. Logistic regression is a statistical technique for modelling this type of data (see Pampel, 2000; Menard, 2002; Hosmer & Lemeshow, 2000) and has been successfully applied to L1-English perception data (e.g., Nearey 1990, 1997; Benki, 2001). The resulting logistic regression coefficients indicate the weighting of each acoustic cue and the crispness of the boundary. They can be used to generate graphical representations of the perceptual space, and to calculate the location of categorical boundaries. They can also be used as dependent variables in statistical tests comparing listener groups. This presentation demonstrates some of the benefits of logistic-regression-based graphical representation and statistical tests for L1-Spanish and L2-English perception data. A graphical representation of L1-Spanish vowel perception will be presented based on data from Alvarez Gonzalez (1980). Escudero & Boersma (2004) analysed L1-English and L1-Spanish L2-English listeners' perception of an English /Sip/ /SIp/ continuum varying in spectral properties and duration. A comparison will be made between their graphical representations and those produced by logistic regression modelling. A comparison will also be made between statistical tests based on their relatively crude reliance measures (also used in Flege, Bohn, & Jang, 1997), and tests based on logistic regression coefficients. A recent study by the presenter also analysed L1-English and L1-Spanish L2-English listeners' perception of a continuum varying in spectral properties and duration, but with a more complex response set: /bit/ /bid/ /bIt/ /bId/ /bEt/ /bEd/. A summary of the logistic regression analysis of this data will be presented.
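As a rough illustration of how logistic regression coefficients yield a boundary location and a crispness estimate, the sketch below fits a two-category, one-cue model to proportional identification data. It is not the presenter's analysis: the seven-step continuum, the response counts and the function name fit_logistic are hypothetical, and the fit uses a plain Newton-Raphson loop rather than any particular statistics package.

```python
import math


def fit_logistic(x, n, k, iterations=25):
    """Fit P(category y | x) = 1 / (1 + exp(-(b0 + b1 * x))) by
    Newton-Raphson on binomial data: n[i] responses at stimulus x[i],
    of which k[i] were 'category y'. Returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ni, ki in zip(x, n, k):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = ki - ni * p          # observed minus expected count
            w = ni * p * (1.0 - p)       # binomial variance weight
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system for the parameter update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 7 continuum steps, 10 responses per step,
# counts of 'category y' responses at each step.
STEPS = [1, 2, 3, 4, 5, 6, 7]
N_RESPONSES = [10] * 7
N_CATEGORY_Y = [0, 1, 3, 5, 7, 9, 10]

b0, b1 = fit_logistic(STEPS, N_RESPONSES, N_CATEGORY_Y)
boundary = -b0 / b1   # stimulus value where P(category y) = 0.5
crispness = b1        # steeper slope = crisper boundary

if __name__ == "__main__":
    print(f"boundary at step {boundary:.2f}, slope {crispness:.2f}")
```

With two acoustic cues the same logic would give one coefficient per cue, and the ratio of the coefficients would be one way to express relative cue weighting; that extension is not shown here.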



24 May 2005

Manchester Phonology practice talks

Bob Ladd

Vowel length in Scottish English: new data from the alignment of accent peaks

Koen Sebregts and James M Scobbie

From facts to phonology: an empirical study of rhotic allophony



31 May 2005

Joe Pater (University of Massachusetts, Amherst) and Andries Coetzee (University of Michigan)

Lexically specific constraints: gradience, learnability and perception

Lexically specific constraints are indexed versions of constraints that apply only when a morpheme that bears that index is evaluated by the grammar. They have been used in Optimality Theory to deal with exceptions (e.g. Pater 2000), and have also been applied to the lexical strata of Japanese and other languages (Fukuzawa 1999, Ito and Mester 2000). In this paper, we propose a further application of lexically specific constraints: to the analysis of gradient phonotactics (cf. Frisch et al. 2004). Markedness constraints are ranked according to the degree to which they are obeyed across the words of the language, with lexically specific constraints interspersed between them. We then show that rankings of this type can be learned with a relatively minor elaboration of the Biased Constraint Demotion Algorithm (Prince and Tesar 2004). Finally, we provide experimental evidence from speech perception, lexical decision tasks and acceptability judgments that language users are aware of such lexical patterns.



07 Jun 2005

Laurence White & Sven Mattys (University of Bristol)

Calibrating rhythm: cross-dialectal and cross-linguistic studies

The rhythm of so-called "stress-timed" languages such as English and Dutch has long been contrasted with that of "syllable-timed" languages like Spanish and French, but quantifying this distinction has proved difficult. "Stress-timed" languages have stressed vowels that are substantially longer than (typically reduced) unstressed vowels, whereas, in "syllable-timed" languages, vowel duration varies less between stressed and unstressed syllables. Additionally, "stress-timed" languages allow greater complexity in syllable onsets and codas. Ramus, Nespor and Mehler (1999) and Grabe and Low (2002) have proposed rhythm metrics which exploit these patterns to quantify rhythmic distinctions between languages.
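For readers unfamiliar with these metrics, here is a minimal sketch of two of them as they are commonly defined: the standard deviation of consonantal interval durations (the ΔC of Ramus, Nespor and Mehler) and the normalised pairwise variability index (the nPVI of Grabe and Low). The interval durations are hypothetical, and the use of the population rather than the sample standard deviation is an assumption.

```python
from statistics import pstdev


def delta(intervals):
    """Standard deviation of interval durations (e.g. delta-C when
    given consonantal intervals). Population SD is assumed here."""
    return pstdev(intervals)


def npvi(intervals):
    """Normalised pairwise variability index: the mean absolute
    difference between successive intervals, each difference divided
    by the mean of the pair, multiplied by 100."""
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(intervals, intervals[1:])
    ]
    return 100.0 * sum(terms) / len(terms)


# Hypothetical vocalic interval durations in milliseconds.
ALTERNATING = [50, 150, 50, 150]    # long/short alternation
EVEN = [100, 100, 100, 100]         # no durational contrast

if __name__ == "__main__":
    print("nPVI alternating:", npvi(ALTERNATING))
    print("nPVI even:       ", npvi(EVEN))
```

A sequence with strong alternation between long and short vowels scores high on the nPVI, while a sequence of equal intervals scores zero, which is the contrast the "stress-timed" versus "syllable-timed" distinction is meant to capture.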

Subjective impressions suggest that rhythm contributes to the perception of a speaker's accent as being non-native. We examined English spoken as a second language (L2) by native Spanish and Dutch speakers, and Spanish and Dutch spoken as L2s by native English speakers, and compared these L2s on a range of rhythm measures with Spanish, English and Dutch spoken as a first language (L1). We found that the influence of L1 was manifest primarily in measures of vocalic durational variation. The overall balance of vocalic and intervocalic measures also differed between L1s and L2s, but not necessarily in the direction of L2 speakers' first language. Metrics relating to variation in consonantal intervals showed strong language dependence, regardless of speakers' linguistic origins.

Differences in rhythmicity may also be observed between dialects of a given language, arising either from segmental or suprasegmental variation. Dialects such as Bristolian manifest less vowel reduction and less contrast between the length of tense and lax vowels; dialects with pitch-peak delay, such as Welsh Valleys and Orcadian, may show levelling of the duration contrast between stressed and post-stressed syllables. We recorded speakers of several British dialects (Bristolian, Welsh Valleys, Orcadian and Shetland, the last of which lacks pitch-peak delay, in contrast with Orcadian), together with standard Southern British English. We found that certain vocalic metrics successfully captured the impressionistic observation of more "syllable-timed" rhythm in, for example, Orcadian or Welsh Valleys English. In particular, the standard deviation of vocalic intervals, normalised for speech rate, provided a discriminant gradient index of dialect rhythmicity, as did a measure of the relative proportions of vocalic and consonantal intervals. These results suggest that rhythmic variation is continuous rather than categorical.
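The two indices singled out above can be sketched as follows. The abstract does not say how the normalisation for speech rate was done; dividing the standard deviation by the mean interval duration is an assumption made here, as are the function names and the toy durations.

```python
from statistics import mean, pstdev


def rate_normalised_sd(intervals):
    """Standard deviation of interval durations divided by their mean,
    times 100 (assumed form of rate normalisation)."""
    return 100.0 * pstdev(intervals) / mean(intervals)


def percent_vocalic(vocalic, consonantal):
    """Share of total duration taken up by vocalic intervals, in percent."""
    total_v = sum(vocalic)
    return 100.0 * total_v / (total_v + sum(consonantal))


# Hypothetical interval durations in milliseconds.
VOCALIC_SLOW = [100, 300]       # same pattern at half the speech rate
VOCALIC_FAST = [50, 150]
CONSONANTAL = [80, 120]

if __name__ == "__main__":
    print(rate_normalised_sd(VOCALIC_SLOW), rate_normalised_sd(VOCALIC_FAST))
    print(percent_vocalic(VOCALIC_FAST, CONSONANTAL))
```

Under this normalisation the same durational pattern produced at two different speech rates receives the same score, which is the property a rate-normalised index needs if dialects spoken at different tempos are to be compared.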



<owner-pworkshop@ling.ed.ac.uk>