Modelling Speech Dynamics with Trajectory-HMMs

This page contains information on a PhD project completed at Edinburgh, with downloadable papers, source code, and sample programs.

Thesis

Thesis: Modelling Speech Dynamics with Trajectory-HMMs (pdf).

References

Le Zhang and Steve Renals. Acoustic-Articulatory Modelling with the Trajectory HMM. IEEE Signal Processing Letters, 15:245-248, 2008. pdf 
Le Zhang and Steve Renals. Phone Recognition Analysis for Trajectory HMM. In Proc. Interspeech 2006, Pittsburgh, USA, September 2006. pdf 

Source Code

The source code (written in C) for training, decoding and scoring Trajectory-HMMs can be obtained from trajectory-20090427.tar.bz2. It was used in my PhD project and is now available to the general public under the BSD license. The code and programs are provided AS IS, without support.

Binary and Sample Programs

Statically-linked Binaries

Pre-built statically-linked binaries for Linux are included in the source tarball: trajectory_train, trajectory_score and trajectory_decode, which train, score and decode Trajectory-HMMs stored in HTK's model format. These tools can handle monophone HMMs built with HTK. The training program can also perform the simple triphone update used in Chapter 5 of the thesis. The decoder can handle bigram networks built by HBuild, although only a phone-loop network was used in the experiments. In addition, hmm_decode is provided for standard HMM token-passing inference (compatible with HVite, albeit slower).

Sample Programs

Sample script and data for training, scoring, force-aligning or decoding a Trajectory-HMM can be obtained from trajectory_example.tar.bz2. The data are 14-channel EMA data processed from the MOCHA-TIMIT corpus, with deltas and delta-deltas appended using a 3-frame dynamic window. The files are in HTK format and can be examined using HList.
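
To make the feature layout concrete, here is a minimal sketch in C of appending deltas and delta-deltas with a 3-frame window. It is not taken from the released code. The function name append_dynamic_features, the window coefficients (-0.5, 0, 0.5) for deltas and (1, -2, 1) for delta-deltas, and the handling of the first and last frames by repeating the edge frame are all assumptions chosen for illustration; the coefficients actually used for the sample data may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Append delta and delta-delta features to a sequence of static frames.
 *
 * statics: n_frames x dim values, row-major (one frame per row)
 * out:     n_frames x (3 * dim) values, row-major, laid out per frame as
 *          [static | delta | delta-delta]
 *
 * ASSUMED 3-frame window coefficients (illustrative, not from the thesis code):
 *   delta:       (-0.5,  0, 0.5)
 *   delta-delta: ( 1.0, -2, 1.0)
 * ASSUMED boundary handling: the first/last frame is repeated at the edges.
 */
void append_dynamic_features(const double *statics, size_t n_frames,
                             size_t dim, double *out)
{
    assert(statics != NULL && out != NULL && n_frames > 0 && dim > 0);

    for (size_t t = 0; t < n_frames; ++t) {
        /* Clamp neighbour indices so edge frames reuse themselves. */
        size_t prev = (t == 0) ? 0 : t - 1;
        size_t next = (t + 1 == n_frames) ? t : t + 1;

        const double *xp = statics + prev * dim;
        const double *xc = statics + t * dim;
        const double *xn = statics + next * dim;
        double *o = out + t * 3 * dim;

        for (size_t d = 0; d < dim; ++d) {
            o[d]           = xc[d];                        /* static      */
            o[dim + d]     = 0.5 * (xn[d] - xp[d]);        /* delta       */
            o[2 * dim + d] = xp[d] - 2.0 * xc[d] + xn[d];  /* delta-delta */
        }
    }
}
```

Under these assumptions, calling the function with dim = 14 turns each 14-channel EMA frame into a 42-dimensional observation vector (14 statics, 14 deltas, 14 delta-deltas).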

Have fun!

April, 2009.