The ESPRESSO Project: Students

Espresso I

Alex Strachan

For the final-year project of his first degree in Artificial Intelligence and Linguistics, Alex used HMMs for feature detection and compared their performance on SPE and multivalued feature systems.

Stefanie Aalburg

Stefanie's MSc project examined surface form variation at the syllable level. One of the aims of the Espresso project is to account for systematic variations within the syllable models themselves, allowing pronunciation dictionaries to be written in terms of lexical syllables. Statistics about the types of surface form variation will be a valuable resource. Stefanie's dissertation for the MSc in Speech and Language Processing was:

"Syllable-Based Multiple-Pronunciation Recognition with discrete Hidden Markov Models", MSc dissertation, Sept. 1998.

Todd Stephenson

Todd's first project used neural networks to perform the same tasks as Alex's HMMs. He then used the NN output for phone recognition using HMMs. His dissertation extended this to syllable modelling. State-tied triphone models were used for phone recognition, and a similar system was used for syllable recognition; both employed tying driven by decision trees. He also investigated various HMM topologies for syllable models. Todd is now studying for a PhD at the Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP) with Hervé Bourlard (email Todd.Stephenson@idiap.ch). During his MSc in Cognitive Science, Todd produced:

"Artificial neural networks in recognition of phonetic features of speech" April 1998.
"Speech recognition of phones using feature streams" May 1998.
Dissertation: "Speech recognition using phonetically featured syllables" Sept. 1998.

Angela Michelfelder

Angela's MSc project investigated whether there is a "natural" segmentation of articulatory data: do the articulatory trajectories themselves suggest (bottom-up) a set of segments, and how does that set relate to a top-down definition, such as the syllable? Are syllable boundaries apparent in the articulatory data? What about phone boundaries?

Simon Ahern

Simon's MSc project investigated Government Phonology, which uses a system of around 8 primes to describe segments.

MSc dissertation: "A Government Phonology Approach to Automatic Speech Recognition", Sept. 1999.

Espresso II

Joe Frankel

Joe's PhD project is investigating various aspects of linear dynamical models (LDMs) for ASR, including:

phone classification
phone recognition
LDMs with parameters that switch for different temporal regions of each unit (i.e. a finite state switching process with a left-to-right linear topology)
LDMs with a state variable that is continuous across model boundaries
computational efficiency: caching and pre-computation; efficient search (A*); convergence of the Kalman gain K; etc.

Dissertation due late 2002.
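
As a rough illustration of the LDM topics listed above, the sketch below scores an observation sequence under a scalar linear dynamical model using a Kalman filter, and classifies a token by picking the model with the highest log-likelihood. It is a minimal sketch under stated assumptions, not the project's implementation: the state and observations are one-dimensional, and the names `kalman_loglik`, `classify`, `a`, `h`, `q`, `r`, `x0` and `p0` are all hypothetical.

```python
import math


def kalman_loglik(ys, a, h, q, r, x0=0.0, p0=1.0):
    """Score a sequence under a scalar LDM (illustrative assumption):

        x_t = a * x_{t-1} + w_t,   w_t ~ N(0, q)
        y_t = h * x_t     + v_t,   v_t ~ N(0, r)

    Returns (log-likelihood, list of Kalman gains, one per frame).
    """
    x, p = x0, p0
    loglik = 0.0
    gains = []
    for y in ys:
        # Predict the state and its variance one step ahead.
        x_pred = a * x
        p_pred = a * p * a + q
        # Innovation (prediction error) and its variance.
        e = y - h * x_pred
        s = h * p_pred * h + r
        loglik += -0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        # Correct the prediction using the Kalman gain.
        k = p_pred * h / s
        x = x_pred + k * e
        p = (1.0 - k * h) * p_pred
        gains.append(k)
    return loglik, gains


def classify(ys, models):
    """Return the label of the model giving the highest log-likelihood."""
    return max(models, key=lambda label: kalman_loglik(ys, **models[label])[0])


if __name__ == "__main__":
    # A smoothly decaying toy "token" and two hypothetical phone models.
    ys = [0.9 ** t for t in range(50)]
    models = {
        "A": dict(a=0.9, h=1.0, q=0.1, r=0.5),
        "B": dict(a=-0.9, h=1.0, q=0.1, r=0.5),
    }
    print(classify(ys, models))
    _, gains = kalman_loglik(ys, **models["A"])
    print(gains[0], gains[-1])
```

In this model the gain sequence depends only on the parameters and the initial variance, not on the observations, and it settles to a steady value after a few frames. That is one reason caching and pre-computation can pay off: the gains can be computed once per model and reused for every token.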

Fiona Couper

Fiona's MSc project was a pilot for Espresso III. It investigated ways of enhancing the power of LDMs by introducing multiple sets of model parameters, controlled by a simple finite state switching process with two parallel states (in contrast to the sequential states used by Joe above). We ultimately hope to use topologies that combine both features. The main focus of Fiona's project was how to train what is effectively a "mixture of LDMs", and she investigated three methods:

splitting the training data into two parts according to token likelihood given a single model
splitting the training data into two parts according to token duration
initialising two sets of model parameters by perturbing the parameters of a single model (cf. HTK-style "mixing-up" for training mixture-of-Gaussian distributions)

Dissertation due Sept 2002.
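
The three initialisation methods listed above can be sketched as follows. This is an illustrative sketch only: the source does not say where the data were split or how the parameters were perturbed, so the median threshold, the relative offset `eps`, and the names `split_by_score`, `split_by_duration`, `split_by_likelihood` and `perturb` are all assumptions made here.

```python
from statistics import median


def split_by_score(tokens, score):
    """Split tokens into (low, high) groups around the median score.

    The median threshold is an assumption; tokens scoring exactly the
    median go into the low group.
    """
    scores = [score(tok) for tok in tokens]
    threshold = median(scores)
    low = [tok for tok, s in zip(tokens, scores) if s <= threshold]
    high = [tok for tok, s in zip(tokens, scores) if s > threshold]
    return low, high


def split_by_duration(tokens):
    """Method 2: split on token duration, taken here as the frame count."""
    return split_by_score(tokens, len)


def split_by_likelihood(tokens, loglik):
    """Method 1: split on the likelihood of each token under a single model.

    `loglik` is any caller-supplied function mapping a token to a score,
    e.g. a log-likelihood under a single trained LDM.
    """
    return split_by_score(tokens, loglik)


def perturb(params, eps=0.2):
    """Method 3: derive two parameter sets from one by a relative offset.

    Scaling every value by (1 - eps) and (1 + eps) is a simple stand-in
    chosen for illustration, only loosely analogous to mixing-up.
    """
    minus = {name: value * (1.0 - eps) for name, value in params.items()}
    plus = {name: value * (1.0 + eps) for name, value in params.items()}
    return minus, plus


if __name__ == "__main__":
    tokens = [[0.1], [0.1, 0.2], [0.1, 0.2, 0.3], [0.1, 0.2, 0.3, 0.4]]
    short, long_ = split_by_duration(tokens)
    print(len(short), len(long_))
    print(perturb({"a": 0.9, "q": 0.1}))
```

With either split, each half of the data would be used to train one parameter set before the two sets are combined; with perturbation, both sets start from the same single model and training is left to pull them apart.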

Espresso III

Fiona Couper

Fiona's PhD will start early 2003 - watch this space!