The Centre for Speech Technology Research, The University of Edinburgh

Publications by Vipperla Ravi Chander

[1] Ravichander Vipperla, Maria Wolters, and Steve Renals. Spoken dialogue interfaces for older people. In Kenneth J. Turner, editor, Advances in Home Care Technologies. IOS Press, 2012. [ bib | .pdf ]
Although speech is a highly natural mode of communication, building robust and usable speech-based interfaces is still a challenge, even if the target user group is restricted to younger users. When designing for older users, there are added complications due to cognitive, physiological, and anatomical ageing. Users may also find it difficult to adapt to the interaction style required by the speech interface. In this chapter, we summarise the work on spoken dialogue interfaces that was carried out during the MATCH project. After a brief overview of relevant aspects of ageing and previous work on spoken dialogue interfaces for older people, we summarise our work on managing spoken interactions (dialogue management), understanding older people's speech (speech recognition), and generating spoken messages that older people can understand (speech synthesis). We conclude with suggestions for design guidelines that have emerged from our work and suggest directions for future research.

[2] Ravi Chander Vipperla, Steve Renals, and Joe Frankel. Augmentation of adaptation data. In Proc. Interspeech, pages 530-533, Makuhari, Japan, September 2010. [ bib | .pdf ]
Linear regression based speaker adaptation approaches can improve Automatic Speech Recognition (ASR) accuracy significantly for a target speaker. However, when the available adaptation data is limited to a few seconds, the accuracy of speaker-adapted models is often worse than that of speaker-independent models. In this paper, we propose an approach to select a set of reference speakers acoustically close to the target speaker, whose data can be used to augment the adaptation data. To determine the acoustic similarity of two speakers, we propose a distance metric based on transforming sample points in the acoustic space with the regression matrices of the two speakers. We show the validity of this approach through a speaker identification task. ASR results on the SCOTUS and AMI corpora, with limited adaptation data of 10 to 15 seconds augmented by data from selected reference speakers, show a significant improvement in Word Error Rate over speaker-independent and speaker-adapted models.
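The speaker-distance idea in this abstract can be sketched roughly as follows: map the same acoustic sample points through each speaker's affine regression transform and measure how far apart the mapped points land. This is a hypothetical illustration of the idea, not the authors' implementation; the function name, matrix shapes, and the mean-Euclidean summary are assumptions.

```python
import numpy as np

def speaker_distance(W_a, W_b, samples):
    """Distance between two speakers via their MLLR-style transforms.

    W_a, W_b: (d, d+1) affine transforms [A | b] for each speaker.
    samples:  (n, d) sample points in the acoustic space.
    Returns the mean Euclidean distance between the two mappings
    (one plausible way to summarise the mismatch).
    """
    # Append a constant 1 to each sample so the bias column applies.
    ext = np.hstack([samples, np.ones((samples.shape[0], 1))])
    mapped_a = ext @ W_a.T  # (n, d) points as seen through speaker A
    mapped_b = ext @ W_b.T  # (n, d) points as seen through speaker B
    return float(np.mean(np.linalg.norm(mapped_a - mapped_b, axis=1)))
```

Reference speakers for augmentation would then be those with the smallest distance to the target speaker's transform; identical transforms give distance zero.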

[3] Ravi Chander Vipperla, Steve Renals, and Joe Frankel. Ageing voices: The effect of changes in voice parameters on ASR performance. EURASIP Journal on Audio, Speech, and Music Processing, 2010. [ bib | DOI | http | .pdf ]
With ageing, human voices undergo several changes, typically characterized by increased hoarseness and changes in articulation patterns. In this study, we examined the effect of ageing on Automatic Speech Recognition (ASR) and found that the Word Error Rate (WER) on older voices is about 9% absolute higher than on adult voices. Subsequently, we compared several voice source parameters, including fundamental frequency, jitter, shimmer, harmonicity, and cepstral peak prominence, for adult and older males. Several of these parameters show a statistically significant difference between the two groups. However, artificially increasing jitter and shimmer measures does not affect ASR accuracy significantly. Artificially lowering the fundamental frequency degrades ASR performance marginally, but this drop can be overcome to some extent using Vocal Tract Length Normalisation (VTLN). Overall, we observe that the changes in the voice source parameters do not have a significant impact on ASR performance. Comparison of the likelihood scores of all the phonemes for the two age groups shows a systematic mismatch in the acoustic space of the two age groups. Comparison of the phoneme recognition rates shows that mid vowels, nasals, and phonemes whose articulation depends on the ability to create constrictions with the tongue tip are more affected by ageing than other phonemes.
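Two of the voice source parameters named in this abstract, jitter and shimmer, have standard local definitions: the mean absolute difference between consecutive glottal cycles, relative to the mean, computed over pitch periods (jitter) or over cycle peak amplitudes (shimmer). The sketch below uses those common definitions; the paper may use a different variant, and the function names are my own.

```python
import numpy as np

def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

def local_shimmer(amplitudes):
    """Local shimmer: the same ratio computed over the peak
    amplitudes of consecutive glottal cycles."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes))
```

A perfectly periodic voice gives zero for both measures; increased hoarseness shows up as larger cycle-to-cycle variation and hence larger values.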

[4] Maria Wolters, Ravichander Vipperla, and Steve Renals. Age recognition for spoken dialogue systems: Do we need it? In Proc. Interspeech, September 2009. [ bib | .pdf ]
When deciding whether to adapt relevant aspects of the system to the particular needs of older users, spoken dialogue systems often rely on automatic detection of chronological age. In this paper, we show that vocal ageing as measured by acoustic features is an unreliable indicator of the need for adaptation. Simple lexical features greatly improve the prediction of both relevant aspects of cognition and interaction style. Lexical features also boost age group prediction. We suggest that adaptation should be based on observed behaviour, not on chronological age, unless it is not feasible to build classifiers for relevant adaptation decisions.

[5] Ravi Chander Vipperla, Maria Wolters, Kallirroi Georgila, and Steve Renals. Speech input from older users in smart environments: Challenges and perspectives. In Proc. HCI International: Universal Access in Human-Computer Interaction. Intelligent and Ubiquitous Interaction Environments, number 5615 in Lecture Notes in Computer Science. Springer, 2009. [ bib | DOI | http | .pdf ]
Although older people are an important user group for smart environments, there has been relatively little work on adapting natural language interfaces to their requirements. In this paper, we focus on a particularly thorny problem: processing speech input from older users. Our experiments on the MATCH corpus show clearly that we need age-specific adaptation in order to recognize older users' speech reliably. Language models need to cover typical interaction patterns of older people, and acoustic models need to accommodate older voices. Further research is needed into intelligent adaptation techniques that will allow existing large, robust systems to be adapted with relatively small amounts of in-domain, age-appropriate data. In addition, older users need to be supported with adequate strategies for handling speech recognition errors.

[6] Ravichander Vipperla, Steve Renals, and Joe Frankel. Longitudinal study of ASR performance on ageing voices. In Proc. Interspeech, Brisbane, 2008. [ bib | .pdf ]
This paper presents the results of a longitudinal study of ASR performance on ageing voices. Experiments were conducted on audio recordings of the proceedings of the Supreme Court of the United States (SCOTUS). Results show that Automatic Speech Recognition (ASR) Word Error Rates (WERs) for elderly voices are significantly higher than those for adult voices, and that the word error rate increases gradually with the age of the elderly speakers. Maximum likelihood linear regression (MLLR) based speaker adaptation improves the WER on ageing voices, though performance remains considerably worse than for adult voices. Speaker adaptation does, however, reduce the rate at which WER increases with age during old age.
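The MLLR adaptation mentioned above estimates a global affine transform W = [A | b] that maps the model means toward a target speaker's acoustics. Under the toy simplifications of identity covariances and a hard frame-to-Gaussian alignment, the maximum-likelihood estimate reduces to ordinary least squares, as in the sketch below; the real algorithm uses Gaussian posteriors and per-covariance weighting, and all names here are illustrative.

```python
import numpy as np

def estimate_mllr_transform(obs, means):
    """Toy MLLR-style transform estimate.

    obs:   (n, d) observed frames from the target speaker.
    means: (n, d) model means aligned to those frames.
    Returns W of shape (d, d+1) such that obs ~ W @ [mean; 1],
    fitted by least squares (a simplification of the ML estimate
    that assumes identity covariances and hard alignments).
    """
    n, d = means.shape
    ext = np.hstack([means, np.ones((n, 1))])  # extended means [mu; 1]
    # Minimise ||obs - ext @ W.T||^2 over W.
    W_T, *_ = np.linalg.lstsq(ext, obs, rcond=None)
    return W_T.T
```

Adapted means are then `W @ [mu; 1]`; with only a few seconds of adaptation data the estimate of W is poor, which is the motivation for the data-augmentation approach in entry [2].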