14 Nov 2000

Dr. Tae-Yeoub Jang

Phonetics of 'Segmental F0' and its application to Korean ASR

The main goal of the study is to improve the performance of Korean automatic speech recognition by exploiting the fundamental frequency (F0) of vowels, which is affected by the identity of the preceding consonant. The hypothesis is that if the vowel F0 is given, the consonant can be identified more accurately. The effect, which I will call the 'segmental F0 effect', has been confirmed by many phonetic studies across various languages. Most frequently, the F0 value of a vowel has been suggested as a cue to the voiced/voiceless distinction of the preceding consonant. In Korean, segmental F0 can be useful for differentiating the three manners (lax, tense, and aspirated) of stop and affricate articulation. Earlier phonetic studies have found that F0 at vowel onset is higher after strong stops (e.g., tense and aspirated sounds) and lower after lax stops. It has also been suggested that this effect is more salient in Korean than in European languages such as English and French.

If the segmental F0 effect is to be helpful for speech recognition, it has to be detectable outside the carefully controlled data used in phonetic studies. I show that automatic measurements over a large amount of data can also capture the effect. Other issues related to segmental perturbation that have not been dealt with in earlier studies are also investigated. The segmental F0 effect is integrated into speech recognition by using demisyllables as the basic recognition units. Because some demisyllables comprise both an onset consonant and the front part of the nucleus, they can readily carry characteristics of the consonant-vowel relation, such as segmental F0, on their own. Moreover, I find that a demisyllable-based HMM recogniser performs better than a baseline HMM recogniser with phone-like units even before F0 is included. Thus, the use of demisyllables in Korean speech recognition has an independent motivation. I show that including F0 in the demisyllable recogniser improves the results further.

