GlobalPhone MLPs

Project Summary

A freely available set of articulatory feature and phoneme MLPs trained on several languages from the GlobalPhone corpus.

Project Details

Multi-layer Perceptrons (MLPs) trained with the GlobalPhone corpus

This archive contains MLPs that output posterior probabilities for the speech labels listed below. Some useful scripts are also provided.

Phonemes
Place of articulation
Manner of articulation
Nasality
Voicing
Vowel
(vowel) Frontness
(vowel) Height
(lip) Rounding
Stress

The inputs to the nets are drawn from PLP (perceptual linear prediction) features; details are given in Section 2.2.1 of Partha Lal's thesis. The exact set of values that each of the variables above can take depends on the language(s) used to train the net. Nets were trained on each of the following languages or language combinations:

Mandarin Chinese
German
(Brazilian) Portuguese
Russian
Spanish
Swedish
German, Portuguese and Spanish
Portuguese, Spanish and Swedish

Using the nets:

We have included scripts that will allow you to generate the PLP features needed as inputs to the MLP and to pass those inputs forward through the net to obtain class posteriors.

PLP generation

Speaker normalised PLPs can be generated with the following scripts.

An ids file contains one utterance id from the GlobalPhone corpus (e.g. GE010_21) per line.

A wavmap file contains, on each line, an utterance id and the path to the wav file for that utterance.

src/gen.base.sh -id <ids_file> \
    -wavmap <ids_to_wavfiles> \
    -bdir <base_dir> \
    -config constants/PLP_E_D_A_Z.config

src/gen.spknorm.sh -feature PLP_E_D_A_Z \
    -wavmap <ids_to_wavfiles> \
    -odir <output_PLP_features> \
    -config constants/PLP_E_D_A_Z.config \
    -bdir <base_dir> \
    -ids <ids_file>
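
The ids file and wavmap file used by these scripts can be prepared in many ways. The following is only a sketch: it assumes that each wav file is named <utterance_id>.wav inside a single directory and that the wavmap separates the id and the path with one space. Neither assumption comes from the scripts in the archive, and the function name make_wavmap is hypothetical.

```shell
#!/bin/sh
# Sketch: write a wavmap ("<utterance_id> <wav_path>" per line) from an ids file.
# ASSUMPTIONS (not taken from the archive): wav files are named
# <utterance_id>.wav under one directory; fields are separated by one space.

make_wavmap() {
    # $1 = ids file (one utterance id per line)
    # $2 = directory holding the wav files (hypothetical layout)
    while IFS= read -r utt_id; do
        [ -n "$utt_id" ] || continue    # skip blank lines
        printf '%s %s/%s.wav\n' "$utt_id" "$2" "$utt_id"
    done < "$1"
}

# Example call (paths are placeholders):
#   make_wavmap ids.txt /path/to/wav > wavmap.txt
```

Adjust the path pattern in the printf line to match how your copy of the corpus is laid out.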

MLP forward pass

The MLPs are created for use with quicknet.

The HTK feature files generated by the above will need to be converted to pfiles if they are to be used with quicknet. <featfiles> is a text file containing utterance ids and paths to HTK feature files, one pair per line.

src/features2pfile.pl -output <set>_PLP.pfile \
    -ids <set>.utids \
    -featfiles <featfiles> \
    -verbosity 51 -feacat_args " -v "

Here <set> is one of train, dev or eval.
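
Before running the conversion, it may be worth checking that every utterance id in the .utids list has an entry in <featfiles>. The sketch below assumes both files are whitespace-separated with the utterance id in the first column; the function name missing_ids is hypothetical and not part of the archive.

```shell
#!/bin/sh
# Sketch: list utterance ids that have no entry in the featfiles list.
# ASSUMPTION: the utterance id is the first whitespace-separated field
# on each line of both files.

missing_ids() {
    # $1 = utids file (one id per line)
    # $2 = featfiles (utterance id and feature file path per line)
    # featfiles is read first to collect known ids; then each id in the
    # utids file that was not collected is printed.
    awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
         NF && !($1 in seen) { print $1 }' "$2" "$1"
}

# Example call (file names are placeholders):
#   missing_ids train.utids featfiles.txt
```

An empty result means every listed utterance has a feature file entry.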

The forward pass can be performed with the following script.

src/pfile2acts.notrans.sh -pad \
    -threads <cores_available> \
    -ifile <input_pfile> \
    -ofile <output_pfile> \
    -config <fwd_config_file>

Acknowledgement required

If you make use of these nets in any published work, you must cite Partha Lal's PhD thesis, as follows:

Partha Lal (2011). Cross-lingual automatic speech recognition using tandem features. PhD thesis, The Centre for Speech Technology Research, Edinburgh University. Available from http://hdl.handle.net/1842/5773

Downloads

Download the nets here (1.1GB, md5 checksum 3ebb2ce636686cf7f3a96850dcbfca7e)
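
After downloading, the archive can be checked against the md5 checksum given above. This is a minimal sketch that assumes the GNU coreutils md5sum command is available; the function name verify_md5 and the archive file name in the example call are hypothetical.

```shell
#!/bin/sh
# Sketch: compare a file's md5 digest with an expected value.
# ASSUMPTION: GNU coreutils md5sum is installed.

verify_md5() {
    # $1 = file to check, $2 = expected md5 hex digest
    # Returns 0 if the digests match, non-zero otherwise.
    actual=$(md5sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
}

# Example call (the archive name is a placeholder):
#   verify_md5 nets.tar.gz 3ebb2ce636686cf7f3a96850dcbfca7e && echo "checksum OK"
```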

Personnel

Simon King