Modelling global information with Latent Semantic Analysis

Detecting unknown words in spontaneous speech

Josep Casarramona

This presentation addresses the problem of out-of-vocabulary (OOV) words. A speech recognizer should be able to detect words that it has probably recognized incorrectly. A special case is words that are not in its vocabulary. When an OOV word is spoken, the recognizer behaves as usual: it searches its vocabulary for the word that best matches what it has heard, so the resulting hypothesis is necessarily an in-vocabulary word. The recognition process may nevertheless yield indicators of this misrecognition. In this work, a number of such features are collected and fed into a classifier, which produces a confidence measure for each word. We extend the confidence estimation framework so that each word is assigned a probability of belonging to each of three classes: COR, INC and OOV. The results are promising, and they show that the alignment, the classifier and the structure of the classification all strongly affect performance.
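
The three-class confidence estimate described above can be illustrated with a minimal sketch. The abstract does not say which classifier or which features are used, so everything specific here is an assumption made for illustration: a linear softmax model stands in for the classifier, the two features are hypothetical standardized scores, and the weights are invented rather than trained. The class names COR, INC and OOV come from the abstract; reading them as correct, incorrect and out-of-vocabulary is likewise an assumption.

```python
import math

# Class labels taken from the abstract (presumably correct, incorrect,
# and out-of-vocabulary; the expansion is an assumption).
CLASSES = ("COR", "INC", "OOV")


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_word(features, weights, biases):
    """Return a probability for each class given one word's feature vector.

    Stand-in linear softmax classifier, not the method used in the talk.
    """
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return dict(zip(CLASSES, softmax(scores)))


# Hypothetical parameters: one weight row per class, one column per feature.
# The two features are assumed to be standardized recognizer scores
# (for example an acoustic score and a language-model score).
WEIGHTS = [
    [2.0, 1.0],    # COR: favoured by high scores
    [-0.5, 0.5],   # INC
    [-2.0, -1.5],  # OOV: favoured by low scores
]
BIASES = [0.0, 0.0, 0.0]

posteriors = classify_word([0.9, 0.8], WEIGHTS, BIASES)
label = max(posteriors, key=posteriors.get)  # most probable class
```

In a real system the weights would be estimated from aligned training data, and the per-class probability would serve as the confidence measure attached to each hypothesized word.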