The ESPRESSO Project: Students
Espresso I
Alex Strachan
For the final-year project of his first degree in Artificial Intelligence
and Linguistics, Alex used HMMs for feature detection and compared their
performance on both SPE and multivalued feature systems.
Stefanie Aalburg
Stefanie's MSc project examined surface form variation at the syllable
level. One of the aims of the Espresso project is to account
for systematic variations within the syllable models themselves,
allowing pronunciation dictionaries to be written in terms of lexical
syllables. Statistics about the types of surface form variation will
be a valuable resource. Stefanie's dissertation for the MSc in Speech and
Language Processing was:
- "Syllable-Based Multiple-Pronunciation Recognition with discrete
Hidden Markov Models", MSc dissertation, Sept. 1998.
Todd Stephenson
Todd's first project used neural networks to perform the same tasks
as Alex's HMMs. He then used the NN output for phone recognition
using HMMs. His dissertation extended this to syllable modelling.
State-tied triphone models were used for phone recognition, and a
similar system for syllable recognition, both employing tying
driven by decision trees. He investigated various HMM topologies
for syllable models. Todd is now studying for a PhD at the Institut
Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP) with
Hervé Bourlard (email
Todd.Stephenson@idiap.ch).
During his MSc. in Cognitive Science Todd produced:
Angela Michelfelder
Angela's MSc project investigated whether there is a "natural"
segmentation of articulatory data - do the articulatory trajectories
themselves suggest (bottom-up) a set of segments, and what
relationship is there with a top-down definition, such as the
syllable? Are syllable boundaries apparent in the articulatory data?
What about phone boundaries?
Simon Ahern
Simon's MSc project investigated Government Phonology, which uses a
system of around 8
primes to describe segments.
Espresso II
Joe's PhD project is investigating various aspects of linear dynamical
models (LDMs) for ASR, including
- phone classification
- phone recognition
- LDMs with parameters that switch for different temporal regions of each unit (i.e. a finite state switching process with a left-to-right linear topology)
- LDMs with a state variable that is continuous across model boundaries
- computational efficiency: caching and pre-computation; efficient search (A*); convergence of the Kalman gain K; etc.
Dissertation due late 2002.
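
As a rough illustration of why convergence of the Kalman gain matters for caching and pre-computation, the sketch below iterates the gain recursion for a scalar LDM until K stops changing. The model form, the parameter names (a, h, q, r) and the values are assumptions made for illustration only, not the parameterisation used in the project. Once K has settled, it no longer depends on the observations and could in principle be computed once and reused.

```python
def kalman_gain_sequence(a, h, q, r, p0=1.0, tol=1e-10, max_steps=1000):
    """Iterate the Kalman gain recursion of a scalar LDM until K converges.

    Assumed model (illustrative only):
        x_t = a * x_{t-1} + w_t,   w_t ~ N(0, q)
        y_t = h * x_t     + v_t,   v_t ~ N(0, r)
    Returns the list of gains K_1, K_2, ... up to convergence.
    """
    gains = []
    p = p0  # posterior variance of the state
    for _ in range(max_steps):
        p_pred = a * a * p + q                 # predicted state variance
        k = p_pred * h / (h * h * p_pred + r)  # Kalman gain
        p = (1.0 - k * h) * p_pred             # updated state variance
        converged = bool(gains) and abs(k - gains[-1]) < tol
        gains.append(k)
        if converged:
            break
    return gains


# Hypothetical parameter values, chosen only to show the behaviour.
gains = kalman_gain_sequence(a=0.9, h=1.0, q=0.1, r=0.5)
print(len(gains), gains[-1])
```

The gain depends only on the model parameters and the initial variance, never on the observed data, which is what makes caching it per model attractive.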
Fiona's MSc project was a pilot for Espresso III and investigated ways
of enhancing the power of LDMs by introducing multiple sets of model
parameters, which are controlled by a simple finite state switching
process with two parallel states (in contrast to the sequential states
used by Joe above). We ultimately hope to use topologies that combine
both features. The main focus of Fiona's project was how to train what
is effectively a "mixture of LDMs" and she investigated three methods:
- splitting the training data into two parts according to token likelihood given a single model
- splitting the training data into two parts according to token duration
- initialising two sets of model parameters by perturbing the parameters of a single model (cf. HTK-style "mixing-up" for training mixture-of-Gaussian distributions)
Dissertation due Sept 2002.
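
The three initialisation strategies listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the dissertation: the helper names (split_by_score, perturb), the median split point, the flat parameter dictionary and the relative perturbation size eps are all hypothetical.

```python
import statistics


def split_by_score(tokens, score):
    """Split tokens into two groups around the median of score(token).

    `score` can be any callable, e.g. len for token duration, or a
    function returning the log-likelihood of a token under a single model.
    """
    scores = [score(t) for t in tokens]
    threshold = statistics.median(scores)
    low = [t for t, s in zip(tokens, scores) if s <= threshold]
    high = [t for t, s in zip(tokens, scores) if s > threshold]
    return low, high


def perturb(params, eps=0.1):
    """Return two copies of a parameter dict, nudged in opposite directions.

    The relative perturbation eps is an assumption, not HTK's actual rule.
    """
    plus = {name: value * (1.0 + eps) for name, value in params.items()}
    minus = {name: value * (1.0 - eps) for name, value in params.items()}
    return plus, minus


# Hypothetical tokens: each token is a list of observation frames.
tokens = [[0.1] * n for n in (3, 8, 5, 12, 4, 9)]
short, long_ = split_by_score(tokens, score=len)  # split by duration

# Hypothetical single-model parameters used as the starting point.
single_model = {"a": 0.9, "h": 1.0, "q": 0.1, "r": 0.5}
model_1, model_2 = perturb(single_model)
```

Splitting by likelihood would use the same split_by_score call with a scoring function that evaluates each token under the single trained model. Each resulting group (or each perturbed parameter set) would then seed one component of the two-model mixture before further training.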
Espresso III
Fiona's PhD will start early 2003 - watch this space!