The BARKS project is exploring the use of switching linear dynamic models for automatic speech recognition.
BARKS stands for 'Better Recognition through Kalman Switching'.
This project concerns the application of linear dynamic models (LDMs) to automatic speech recognition (ASR). The LDM is a generative model which gives a time-varying multivariate Gaussian distribution over the observations. Underlying dynamics are modelled by the state, which evolves according to a first-order auto-regressive (AR) process. The potential benefits of using such a model for ASR, compared with hidden Markov models (HMMs), include:
- first-order dynamics of the state give a model of inter-frame correlations.
- spatial correlations can be modelled fully or approximated via projection of a lower-dimensional state.
- passing state information across phone boundaries relaxes the assumption of segmental independence.
- continuous underlying representation reflects known properties of speech production.
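As a rough illustration of the generative model described above, the sketch below samples from a toy LDM with a scalar state projected onto a multi-dimensional observation. The function name `sample_ldm`, its parameters (`F`, `H`, `q_std`, `r_std`, `x0`) and the choice of a one-dimensional state are assumptions made for this example only; they are not taken from the project's own code.

```python
import random


def sample_ldm(n_frames, F, H, q_std, r_std, x0, rng):
    """Sample states and observations from a toy scalar-state LDM.

    State (first-order AR):  x_t = F * x_{t-1} + w_t,  w_t ~ N(0, q_std^2)
    Observation:             y_t = H * x_t + v_t,      v_t ~ N(0, r_std^2) per dimension

    H is a list of projection weights (hypothetical values), so the
    one-dimensional state is projected onto len(H) observation dimensions.
    """
    states, observations = [], []
    x = x0
    for _ in range(n_frames):
        # Evolve the hidden state with first-order AR dynamics.
        x = F * x + rng.gauss(0.0, q_std)
        # Project the state into observation space and add observation noise.
        y = [h * x + rng.gauss(0.0, r_std) for h in H]
        states.append(x)
        observations.append(y)
    return states, observations


if __name__ == "__main__":
    rng = random.Random(0)
    xs, ys = sample_ldm(n_frames=5, F=0.9, H=[1.0, -0.5], q_std=0.1, r_std=0.05, x0=1.0, rng=rng)
    for x, y in zip(xs, ys):
        print(round(x, 3), [round(v, 3) for v in y])
```

Because every frame is generated from the same slowly evolving state, successive observations are correlated in time, and the observation dimensions are correlated with one another through the shared projection.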
Previous work has demonstrated a benefit from the addition of a hidden dynamic state. The current project extends this by developing a switching system which will allow:
- multimodal output distributions without introducing problems of computational intractability.
- approximation of non-linear dynamics whilst retaining the linear-Gaussian properties which make filtering simple.
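To make the last point concrete, here is a minimal sketch of why a switching model keeps filtering simple: once the regime at each frame is fixed, each step is an ordinary Kalman predict/update with that regime's parameters. The scalar state and observation, the names `kalman_step` and `filter_with_regimes`, and the assumption that the regime sequence is already known are all simplifications for illustration; how the project itself infers the regimes is not described here.

```python
def kalman_step(mean, var, y, F, Q, H, R):
    """One scalar Kalman predict/update step.

    F, Q: state transition and state noise variance.
    H, R: observation projection and observation noise variance.
    """
    # Predict the state forward one frame.
    pred_mean = F * mean
    pred_var = F * var * F + Q
    # Update with the new observation.
    innov = y - H * pred_mean
    innov_var = H * pred_var * H + R
    gain = pred_var * H / innov_var
    new_mean = pred_mean + gain * innov
    new_var = (1.0 - gain * H) * pred_var
    return new_mean, new_var


def filter_with_regimes(ys, regimes, params, mean0, var0):
    """Filter observations given a known regime label per frame (an assumption).

    params maps each regime label to its own (F, Q, H, R) tuple, so the
    dynamics are piecewise linear while each step stays linear-Gaussian.
    """
    mean, var = mean0, var0
    posteriors = []
    for y, regime in zip(ys, regimes):
        F, Q, H, R = params[regime]
        mean, var = kalman_step(mean, var, y, F, Q, H, R)
        posteriors.append((mean, var))
    return posteriors


if __name__ == "__main__":
    # Two hypothetical regimes with different dynamics.
    params = {"slow": (0.95, 0.01, 1.0, 0.1), "fast": (0.5, 0.05, 1.0, 0.1)}
    ys = [1.0, 0.9, 0.4, 0.2]
    regimes = ["slow", "slow", "fast", "fast"]
    for mean, var in filter_with_regimes(ys, regimes, params, mean0=0.0, var0=1.0):
        print(round(mean, 4), round(var, 4))
```

The state estimate is carried across the change of regime rather than being reset, which is the same mechanism that lets state information pass across phone boundaries.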
Funded by the Engineering and Physical Sciences Research Council (EPSRC), grant GR/S21281/01.