Espresso

Project Summary

Novel acoustic models for ASR

Project Details

The ESPRESSO Project:
Novel methods for automatic speech recognition

The Espresso project is investigating new models for automatic speech recognition. Our motivation comes from:

the fact that the performance of conventional HMM-based recognisers has reached a plateau; and
a desire for models that account for the underlying processes of speech production in a more elegant and effective way than context-dependent phone models.

Our approach contrasts with that taken in the development of tied-parameter context-dependent phone HMMs. Rather than trying to model every phone in every context, and then resorting to parameter sharing to make parameter estimation possible, we are looking for models which capture context-dependency without an explosion in the number of free parameters.

Espresso I

The first phase of the project investigated the use of phonetically featured syllables for speech recognition. We concentrated on the automatic detection of phonetic or phonological features from the speech signal. For more information, including publications and grant reports, go to the Espresso I page.

Espresso II

In the second phase of the project, we examined new acoustic models for speech recognition which account for the continuous, asynchronous nature of the speech signal. These models - linear dynamic models (LDMs), also known as Kalman filters - have a continuous hidden state which reflects the fact that the human speech production mechanism is itself continuous and not finite-state.
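To make the idea of a continuous hidden state concrete, here is a minimal sketch of a scalar LDM together with the standard Kalman filter recursion for tracking its state. It is an illustration only, not the project's implementation: the function names (simulate_ldm, kalman_filter) and all parameter values (f, h, q, r) are assumptions chosen for the example, and a real acoustic model would use vector-valued states and observations.

```python
import random


def simulate_ldm(n_frames, f=0.95, h=1.0, q=0.01, r=0.1, x0=0.0, seed=0):
    """Generate hidden states and observations from a scalar LDM.

    State evolution:  x[t] = f * x[t-1] + w,  w ~ N(0, q)
    Observation:      y[t] = h * x[t]   + v,  v ~ N(0, r)
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    states, observations = [], []
    for _ in range(n_frames):
        x = f * x + rng.gauss(0.0, q ** 0.5)   # continuous hidden state
        y = h * x + rng.gauss(0.0, r ** 0.5)   # noisy observation of it
        states.append(x)
        observations.append(y)
    return states, observations


def kalman_filter(observations, f=0.95, h=1.0, q=0.01, r=0.1,
                  mean0=0.0, var0=1.0):
    """Return the filtered state means and variances for each frame."""
    mean, var = mean0, var0
    means, variances = [], []
    for y in observations:
        # Predict: push the previous estimate through the state dynamics.
        mean_pred = f * mean
        var_pred = f * f * var + q
        # Update: correct the prediction using the new observation.
        gain = var_pred * h / (h * h * var_pred + r)
        mean = mean_pred + gain * (y - h * mean_pred)
        var = (1.0 - gain * h) * var_pred
        means.append(mean)
        variances.append(var)
    return means, variances


if __name__ == "__main__":
    states, observations = simulate_ldm(20)
    means, variances = kalman_filter(observations)
    for x, y, m in zip(states, observations, means):
        print(f"state={x:+.3f}  observed={y:+.3f}  filtered={m:+.3f}")
```

Unlike an HMM, where the hidden variable jumps between a finite set of discrete states, the hidden variable here evolves smoothly from frame to frame, and the filter maintains a Gaussian belief (mean and variance) over it.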

Espresso III

In the latest phase of this project, we will be extending the power of the LDMs from the previous phase by looking at ways of automatically learning the unit inventory (the previous phase used phone units, which are probably not optimal for this type of model).

Contact Simon King for more details.