WWW pages of 3rd European Master School on Language and Speech

Dynamic Bayesian Network-Based Speech Recognition with Energy as Auxiliary Variable

Jaume Escofet Carmona
(Universitat Politècnica de Catalunya)

Energy is a fundamental property of the speech signal and carries a lot of significant information that should be exploited in applications such as automatic speech recognition (ASR).

Traditionally, energy has been used in ASR systems by appending it to the standard feature vectors, which leads to significant degradation in recognition performance. In my work, I use short-term energy as an auxiliary variable, i.e., the emission probability distributions are conditioned on this feature. I will also present the benefits of using it during training but marginalizing it out during recognition.
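
As a rough illustration of the two modes described above (conditioning on the auxiliary variable versus marginalizing it out), here is a minimal Python sketch. It is a toy under stated assumptions and not necessarily the model used in this work: the energy is assumed to be quantized into two discrete levels, the acoustic feature is a single scalar with one Gaussian per energy level, and all names and parameter values (AUX_PRIOR, EMISSION, emission_given_aux, emission_marginalized) are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy model for a single HMM state q. The energy is assumed to be
# quantized into two levels (0 = low, 1 = high); all numbers are made up.
AUX_PRIOR = {0: 0.3, 1: 0.7}                 # P(a | q)
EMISSION = {0: (-1.0, 1.0), 1: (2.0, 0.5)}   # a -> (mean, variance) of p(x | q, a)


def emission_given_aux(x, a):
    """p(x | q, a): emission density conditioned on the observed energy level a."""
    mean, var = EMISSION[a]
    return gaussian_pdf(x, mean, var)


def emission_marginalized(x):
    """p(x | q) = sum_a P(a | q) * p(x | q, a): the energy level is summed out."""
    return sum(AUX_PRIOR[a] * emission_given_aux(x, a) for a in AUX_PRIOR)
```

In this sketch, emission_given_aux corresponds to using the energy as an observed auxiliary variable, while emission_marginalized corresponds to treating it as hidden at recognition time, which reduces the emission density to a mixture weighted by P(a | q).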

Since this is not straightforward with standard hidden Markov models (HMMs), this work has been performed in the framework of dynamic Bayesian networks (DBNs), which provide more flexibility in defining the topology of the emission probability distributions and in specifying whether variables should be marginalized out or not.

I have performed several experiments based on this approach. Results indicate that the recognition performance can be improved by using energy as an auxiliary variable.