The Centre for Speech Technology Research, The University of Edinburgh

23 May 2005

Geoff Morrison (University of Alberta)


Logistic regression modelling of cross-language perception data

Researchers in cross-language speech perception are often interested in delineating the boundaries between phoneme categories in each of the languages under investigation. They may wish to compare vowel spaces to predict how monolingual speakers of one language will classify the sounds of the other language. They may also wish to compare the perceptual boundaries of L2 learners with the boundaries of native speakers of the L2 and monolingual speakers of the L1. Other issues involve the relative weighting of multiple acoustic cues, and the crispness or fuzziness of the boundaries. A typical experimental design involves listeners identifying the phonemes they hear in a synthetic continuum. This results in proportional data: for each stimulus, a proportion of the responses are category x, a proportion are category y, etc. Logistic regression is a statistical technique for modelling this type of data (see Pampel, 2000; Menard, 2002; Hosmer & Lemeshow, 2000) and has been successfully applied to L1-English perception data (e.g., Nearey, 1990, 1997; Benki, 2001). The resulting logistic regression coefficients indicate the weighting of each acoustic cue and the crispness of the boundary. They can be used to generate graphical representations of the perceptual space, and to calculate the location of categorical boundaries. They can also be used as dependent variables in statistical tests comparing listener groups.
This presentation demonstrates some of the benefits of logistic-regression-based graphical representations and statistical tests for L1-Spanish and L2-English perception data. A graphical representation of L1-Spanish vowel perception will be presented based on data from Alvarez Gonzalez (1980). Escudero & Boersma (2004) analysed L1-English and L1-Spanish L2-English listeners' perception of an English /Sip/ /SIp/ continuum varying in spectral properties and duration. A comparison will be made between their graphical representations and those produced by logistic regression modelling. A comparison will also be made between statistical tests based on their relatively crude reliance measures (also used in Flege, Bohn, & Jang, 1997), and tests based on logistic regression coefficients. A recent study by the presenter also analysed L1-English and L1-Spanish L2-English listeners' perception of a continuum varying in spectral properties and duration, but with a more complex response set: /bit/ /bid/ /bIt/ /bId/ /bEt/ /bEd/. A summary of the logistic regression analysis of this data will be presented.
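
As a rough illustration of how logistic regression coefficients relate to cue weighting and boundary location, the following sketch fits a two-category model to proportional identification data. It is not the presenter's analysis: the stimulus grid, the number of trials, the "true" coefficients and all function names are invented for illustration, the response proportions are generated from the model itself rather than taken from any study, and only a binary response set is handled (the six-way response set mentioned above would need a multinomial model).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_logistic(cues, proportions, n_trials, iterations=25):
    """Newton-Raphson fit of P(category x) = sigmoid(b0 + b1*cue1 + b2*cue2)."""
    rows = [[1.0] + list(c) for c in cues]  # prepend the intercept term
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, p, n in zip(rows, proportions, n_trials):
            mu = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
            for i in range(k):
                grad[i] += n * (p - mu) * x[i]
                for j in range(k):
                    hess[i][j] += n * mu * (1.0 - mu) * x[i] * x[j]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical 7 x 7 continuum: spectral and duration steps centred on zero.
cues = [(s, d) for s in range(-3, 4) for d in range(-3, 4)]

# Invented coefficients (intercept, spectral, duration) used to simulate
# the proportion of "category x" responses for each stimulus.
TRUE_BETA = [0.3, 1.2, 0.4]
proportions = [sigmoid(TRUE_BETA[0] + TRUE_BETA[1] * s + TRUE_BETA[2] * d)
               for s, d in cues]
n_trials = [10] * len(cues)  # assumed number of responses per stimulus

beta = fit_logistic(cues, proportions, n_trials)
b0, b_spectral, b_duration = beta

# Relative cue weighting: duration coefficient relative to spectral coefficient.
duration_to_spectral_ratio = b_duration / b_spectral

# The 50% boundary is where b0 + b_spectral*s + b_duration*d = 0.
# Solving for s gives the spectral boundary at a given duration step d.
def spectral_boundary(d):
    return -(b0 + b_duration * d) / b_spectral

boundary_at_zero_duration = spectral_boundary(0)
print(beta, duration_to_spectral_ratio, boundary_at_zero_duration)
```

In this sketch, larger coefficient magnitudes correspond to a crisper (steeper) boundary, and the ratio of the two cue coefficients gives one simple index of relative cue weighting. Per-listener coefficients fitted this way could then serve as dependent variables in group comparisons.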

