Modelling global information with Latent Semantic Analysis

LVCSR of meeting data

Martin Karafiat

This poster focuses on building a large vocabulary continuous speech recognition (LVCSR) system for meeting data. Standard feature extraction and HMM training methods are used, but the decoding algorithm is adjusted to minimize computation time without loss of accuracy. For this purpose, best-first decoding with word-internal triphones is combined with time-synchronous decoding with cross-word triphones. The poster also discusses the importance of selecting and properly balancing the language model training data. All experiments are performed on the ICSI meetings database.