The Centre for Speech Technology Research, The University of Edinburgh

Zhang Le

Project Summary

Acoustic Modelling with Dynamic Bayesian Networks

Project Details

Acoustic modelling draws on some forty years of ingenious experiment and elaborate theory. Yet, to date, state-of-the-art automatic speech recognition (ASR) systems are still built on Hidden Markov Models (HMMs), a relatively simple model that has been in use for nearly three decades.

The success of HMMs can be attributed to their simple structure and efficient parameter estimation algorithms, despite the fact that their basic mathematical assumptions evidently violate much of what phonetics tells us about speech. However, the limited expressive power of conventional HMMs makes it difficult to characterize the dynamic spectral properties of human speech.
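
For reference, the assumptions in question are commonly written as a factorization of the joint distribution over hidden states and observations. The sketch below uses standard textbook notation rather than anything specific to this project; the symbols q_t (hidden state at frame t), o_t (acoustic observation at frame t) and T (number of frames) are introduced here purely for illustration.

```latex
% Notation is an assumption for illustration:
%   q_t = hidden state at frame t, o_t = acoustic observation at frame t, T = number of frames
P(q_{1:T}, o_{1:T}) = P(q_1)\, P(o_1 \mid q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1})\, P(o_t \mid q_t)
```

Each state depends only on its predecessor, and each observation depends only on the current state. These are the two independence assumptions that sit uneasily with the strong correlation between neighbouring speech frames.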

Recent developments in probabilistic graphical models in general, and Dynamic Bayesian Networks (DBNs) in particular, have intensified interest in applying DBNs to statistical speech modelling problems. The flexible graphical representation of DBNs, together with general-purpose inference algorithms (GEM and the like), provides a principled way of modelling complex dynamic events in acoustic models.

We hope this research will deepen our understanding of the statistical speech processing enterprise, which inevitably brings with it some insight into the nature of the human speech production process.
