Evaluating a speech recogniser yields a phoneme confusion matrix, which tabulates each spoken phoneme against the phonemes it is recognised as. This matrix can be used to measure recogniser performance and to highlight problem areas. The initial aim of my project is to use the phoneme confusion matrix to devise a metric that expresses how acoustically similar a new word is to the existing entries in a grammar, i.e., a measure of confusability.
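As an illustration only, one way such a metric could be built is a weighted edit distance over phoneme sequences, where substituting one phoneme for another costs less the more often the two are confused. This is a minimal sketch and not the metric the project ultimately defines: the confusion counts, the ARPAbet-style phoneme symbols, the symmetrised substitution cost and the function names are all assumptions made for the example.

```python
from typing import Dict, Sequence

# Hypothetical confusion counts: CONFUSION[spoken][recognised] = count.
# A real matrix would come from scoring a recogniser on labelled speech.
CONFUSION: Dict[str, Dict[str, int]] = {
    "m": {"m": 80, "n": 20},
    "n": {"n": 90, "m": 10},
    "s": {"s": 100},
    "ih": {"ih": 100},
    "th": {"th": 100},
}


def confusion_prob(spoken: str, recognised: str) -> float:
    """Estimate P(recognised | spoken) from the row of the confusion matrix."""
    row = CONFUSION.get(spoken, {})
    total = sum(row.values())
    if total == 0:
        # Phoneme unseen in the matrix: assume it is only ever itself.
        return 1.0 if spoken == recognised else 0.0
    return row.get(recognised, 0) / total


def substitution_cost(a: str, b: str) -> float:
    """Cost in [0, 1]; frequently confused phoneme pairs are cheap to swap."""
    if a == b:
        return 0.0
    # Assumption: average both directions so the cost is symmetric.
    return 1.0 - 0.5 * (confusion_prob(a, b) + confusion_prob(b, a))


def phoneme_distance(x: Sequence[str], y: Sequence[str], indel: float = 1.0) -> float:
    """Weighted edit distance between two phoneme sequences.

    A lower value means the two words are more confusable.
    """
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * indel]
        for j in range(1, len(y) + 1):
            curr.append(min(
                prev[j] + indel,                                    # delete x[i-1]
                curr[j - 1] + indel,                                # insert y[j-1]
                prev[j - 1] + substitution_cost(x[i - 1], y[j - 1]),  # substitute
            ))
        prev = curr
    return prev[-1]


# Two made-up pronunciations differing only in the m/n pair.
d = phoneme_distance(["s", "m", "ih", "th"], ["s", "n", "ih", "th"])
```

With these invented counts, the m/n substitution costs a little less than a substitution between phonemes that are never confused, so words differing only in that pair score as more confusable than words differing in an unrelated pair.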
Drawing on an understanding of common substitution errors and of how words are misrecognised, it is proposed that a large vocabulary can be partitioned into subsets of less common words, each identified by a common word. Word frequency statistics for a language can be used to identify the common word set, and the devised metric is then used to map phonetically similar uncommon words to entries in that set. A speech recogniser can then be configured with only the "common" vocabulary lexicon, which gives speed and performance improvements over a larger lexicon, and it is hoped that uncommon words spoken by a user can still be identified through the mappings. For example, if the recogniser outputs a list of 5 possible candidates for a spoken utterance, the mapping can be used to look up the uncommon words associated with each candidate and add them to the recognition output.
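The mapping and look-up steps described above can be sketched as follows. This is a hedged illustration under stated assumptions: the words, their pronunciations and the function names are hypothetical, and a plain (unweighted) edit distance over phonemes stands in for the confusability metric, which the project has yet to define.

```python
from typing import Callable, Dict, List, Sequence

Pron = Sequence[str]


def edit_distance(x: Pron, y: Pron) -> float:
    """Plain Levenshtein distance over phonemes (stand-in for the real metric)."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        curr = [i]
        for j in range(1, len(y) + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + sub))
        prev = curr
    return float(prev[-1])


def build_mapping(
    common: Dict[str, Pron],
    uncommon: Dict[str, Pron],
    distance: Callable[[Pron, Pron], float] = edit_distance,
) -> Dict[str, List[str]]:
    """Attach each uncommon word to its phonetically nearest common word."""
    mapping: Dict[str, List[str]] = {word: [] for word in common}
    for word, pron in uncommon.items():
        nearest = min(common, key=lambda c: distance(pron, common[c]))
        mapping[nearest].append(word)
    return mapping


def expand_candidates(candidates: List[str], mapping: Dict[str, List[str]]) -> List[str]:
    """Add the uncommon words mapped to each recogniser candidate, keeping order."""
    expanded: List[str] = []
    for cand in candidates:
        for word in [cand] + mapping.get(cand, []):
            if word not in expanded:
                expanded.append(word)
    return expanded


# Hypothetical lexicons with made-up ARPAbet-style pronunciations.
common = {"smith": ["s", "m", "ih", "th"], "jones": ["jh", "ow", "n", "z"]}
uncommon = {"smyth": ["s", "m", "ay", "th"], "johns": ["jh", "aa", "n", "z"]}

mapping = build_mapping(common, uncommon)
result = expand_candidates(["smith"], mapping)
```

Here the recogniser is assumed to know only the common lexicon; a candidate such as "smith" is expanded after recognition to include the uncommon word mapped to it. How the expanded list would then be ranked or disambiguated is left open.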
The objective of my project is to reduce vocabulary size without degrading recogniser performance, by producing subsets of surnames. The goal is to define the rules used to construct these subsets and to investigate automating the subsetting process.