Publications by Cassie Mayo
M. Cooke, C. Mayo, C. Valentini-Botinhao, Y. Stylianou, B. Sauert, and Y. Tang.
Evaluating the intelligibility benefit of speech modifications in
known noise conditions.
Speech Communication, 55:572-585, 2013.
The use of live and recorded speech is widespread in applications where correct message reception is important. Furthermore, the deployment of synthetic speech in such applications is growing. Modifications to natural and synthetic speech have therefore been proposed which aim at improving intelligibility in noise. The current study compares the benefits of speech modification algorithms in a large-scale speech intelligibility evaluation and quantifies the equivalent intensity change, defined as the amount in decibels that unmodified speech would need to be adjusted by in order to achieve the same intelligibility as modified speech. Listeners identified keywords in phonetically-balanced sentences representing ten different types of speech: plain and Lombard speech, five types of modified speech, and three forms of synthetic speech. Sentences were masked by either a stationary or a competing speech masker. Modification methods varied in the manner and degree to which they exploited estimates of the masking noise. The best-performing modifications led to equivalent intensity changes of around 5 dB in moderate and high noise levels for the stationary masker, and 3-4 dB in the presence of competing speech. These gains exceed those produced by Lombard speech. Synthetic speech in noise was always less intelligible than plain natural speech, but modified synthetic speech reduced this deficit by a significant amount.
C. Mayo, V. Aubanel, and M. Cooke. Effect of prosodic changes on speech intelligibility. In Proc. Interspeech, Portland, OR, USA, 2012.
M. Koutsogiannaki, M. Pettinato, C. Mayo, V. Kandia, and Y. Stylianou. Can modified casual speech reach the intelligibility of clear speech? In Proc. Interspeech, Portland, OR, USA, 2012.
V. Aubanel, M. Cooke, E. Foster, M. L. Garcia-Lecumberri, and C. Mayo. Effects of the availability of visual information and presence of competing conversations on speech production. In Proc. Interspeech, Portland, OR, USA, 2012.
C. Mayo, R. A. J. Clark, and S. King.
Listeners' weighting of acoustic cues to synthetic speech
naturalness: A multidimensional scaling analysis.
Speech Communication, 53(3):311-326, 2011.
The quality of current commercial speech synthesis systems is now so high that system improvements are being made at subtle sub- and supra-segmental levels. Human perceptual evaluation of such subtle improvements requires a highly sophisticated level of perceptual attention to specific acoustic characteristics or cues. However, it is not well understood what acoustic cues listeners attend to by default when asked to evaluate synthetic speech. It may, therefore, be potentially quite difficult to design an evaluation method that allows listeners to concentrate on only one dimension of the signal, while ignoring others that are perceptually more important to them. The aim of the current study was to determine which acoustic characteristics of unit-selection synthetic speech are most salient to listeners when evaluating the naturalness of such speech. This study made use of multidimensional scaling techniques to analyse listeners' pairwise comparisons of synthetic speech sentences. Results indicate that listeners place a great deal of perceptual importance on the presence of artifacts and discontinuities in the speech, somewhat less importance on aspects of segmental quality, and very little importance on stress/intonation appropriateness. These relative differences in importance will impact on listeners' ability to attend to these different acoustic characteristics of synthetic speech, and should therefore be taken into account when designing appropriate methods of synthetic speech evaluation.
Keywords: Speech synthesis; Evaluation; Speech perception; Acoustic cue weighting; Multidimensional scaling
Vasilis Karaiskos, Simon King, Robert A. J. Clark, and Catherine Mayo.
The Blizzard Challenge 2008.
In Proc. Blizzard Challenge Workshop, Brisbane, Australia, 2008.
The Blizzard Challenge 2008 was the fourth annual Blizzard Challenge. This year, participants were asked to build two voices from a UK English corpus and one voice from a Mandarin Chinese corpus. This is the first time that a language other than English has been included and also the first time that a large UK English corpus has been available. In addition, the English corpus contained somewhat more expressive speech than that found in corpora used in previous Blizzard Challenges. To assist participants with limited resources or limited experience in UK-accented English or Mandarin, unaligned labels were provided for both corpora and for the test sentences. Participants could use the provided labels or create their own. An accent-specific pronunciation dictionary was also available for the English speaker. A set of test sentences was released to participants, who were given a limited time in which to synthesise them and submit the synthetic speech. An online listening test was conducted, to evaluate naturalness, intelligibility and degree of similarity to the original speaker.
F. Gibbon and C. Mayo. Adults' perception of conflicting acoustic cues associated with EPG-defined undifferentiated gestures. In 4th International EPG Symposium, Edinburgh, UK, 2008.
Robert A. J. Clark, Monika Podsiadlo, Mark Fraser, Catherine Mayo, and Simon King.
Statistical analysis of the Blizzard Challenge 2007 listening test results.
In Proc. Blizzard 2007 (in Proc. Sixth ISCA Workshop on Speech
Synthesis), Bonn, Germany, August 2007.
Blizzard 2007 is the third Blizzard Challenge, in which participants build voices from a common dataset. A large listening test is conducted which allows comparison of systems in terms of naturalness and intelligibility. New sections were added to the listening test for 2007 to test the perceived similarity of the speaker's identity between natural and synthetic speech. In this paper, we present the results of the listening test and the subsequent statistical analysis.
C. Mayo, R. A. J. Clark, and S. King. Multidimensional scaling of listener responses to synthetic speech. In Proc. Interspeech 2005, Lisbon, Portugal, September 2005.
C. Mayo and A. Turk. The influence of spectral distinctiveness on acoustic cue weighting in children's and adults' speech perception. Journal of the Acoustical Society of America, 118:1730-1741, 2005.
C. Mayo and A. Turk. No available theories currently explain all adult-child cue weighting differences. In Proc. ISCA Workshop on Plasticity in Speech Perception, London, UK, 2005.
C. Mayo and A. Turk. The development of perceptual cue weighting within and across monosyllabic words. In LabPhon 9, University of Illinois at Urbana-Champaign, 2004.
C. Mayo and A. Turk. Adult-child differences in acoustic cue weighting are influenced by segmental context: Children are not always perceptually biased towards transitions. Journal of the Acoustical Society of America, 115:3184-3194, 2004.
C. Mayo and A. Turk. Is the development of cue weighting strategies in children's speech perception context-dependent? In XVth International Congress of Phonetic Sciences, Barcelona, 2003.
C. Mayo, J. Scobbie, N. Hewlett, and D. Waters. The influence of phonemic awareness development on acoustic cue weighting in children's speech perception. Journal of Speech, Language and Hearing Research, 46:1184-1196, 2003.
C. Mayo, A. Turk, and J. Watson. Development of cue weighting strategies in children's speech perception. In Proceedings of TIPS: Temporal Integration in the Perception of Speech, Aix-en-Provence, 2002.
C. Mayo, A. Turk, and J. Watson. Flexibility of acoustic cue weighting in children's speech perception. Journal of the Acoustical Society of America, 109:2313, 2001.
C. Mayo. The relationship between phonemic awareness and cue weighting in speech perception: longitudinal and cross-sectional child studies. PhD thesis, Queen Margaret University College, 2000.
C. Mayo. Perceptual weighting and phonemic awareness in pre-reading and early-reading children. In XIVth International Congress of Phonetic Sciences, San Francisco, 1999.
C. Mayo. The development of phonemic awareness and perceptual weighting in relation to early and later literacy acquisition. In 20th Annual Child Phonology Conference, Bangor, Wales, 1999.
C. Mayo. The developmental relationship between perceptual weighting and phonemic awareness. In LabPhon 6, University of York, UK, 1998.
C. Mayo. A longitudinal study of perceptual weighting and phonemic awareness. In Chicago Linguistics Society 34, 1998.
C. Mayo, M. Aylett, and D. R. Ladd. Prosodic transcription of Glasgow English: an evaluation study of GlaToBI. In Intonation: Theory, Models and Applications, 1997.