GlobalPhone MLPs
Project Summary
A freely available set of articulatory feature and phoneme MLPs trained on several languages from the GlobalPhone corpus.
Project Details
Multi-layer Perceptrons (MLPs) trained with the GlobalPhone corpus
This archive contains MLPs whose outputs are the posterior probabilities for the speech labels listed below. Some useful scripts are also provided.
- Phonemes
- Place of articulation
- Manner of articulation
- Nasality
- Voicing
- Vowel
- (vowel) Frontness
- (vowel) Height
- (lip) Rounding
- Stress
The MLPs were trained on the following languages and language combinations:
- Mandarin Chinese
- German
- (Brazilian) Portuguese
- Russian
- Spanish
- Swedish
- German, Portuguese and Spanish
- Portuguese, Spanish and Swedish
Using the nets:
We have included scripts that allow you to generate the PLP features needed as inputs to the MLP and to pass those inputs forward through the net to obtain class posteriors.
PLP generation
Speaker normalised PLPs can be generated with the following scripts.
An ids_file contains one utterance id from the GlobalPhone corpus (e.g. GE010_21) per line.
A wavmap file contains an utterance id and the path to the wav file for that utterance on each line.
src/gen.base.sh -id <ids_file> \
-wavmap <ids_to_wavfiles> \
-bdir <base_dir> \
-config constants/PLP_E_D_A_Z.config
src/gen.spknorm.sh -feature PLP_E_D_A_Z \
-wavmap <ids_to_wavfiles> \
-odir <output_PLP_features> \
-config constants/PLP_E_D_A_Z.config \
-bdir <base_dir> \
-ids <ids_file>
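The two input files described above could be built along the following lines. This is only a minimal sketch: it assumes one wav file per utterance named <utterance_id>.wav and a single space between the id and the path in the wavmap. Neither the directory layout nor the separator is specified here, and the file names ids.txt and wavmap.txt are hypothetical.

```shell
#!/bin/sh
set -eu

# Stand-ins for a real audio directory and an output directory (hypothetical).
wav_dir=$(mktemp -d)
out_dir=$(mktemp -d)

# Empty placeholder files so that the sketch runs on its own.
touch "$wav_dir/GE010_21.wav" "$wav_dir/GE010_22.wav"

ids_file="$out_dir/ids.txt"
wavmap="$out_dir/wavmap.txt"
: > "$ids_file"
: > "$wavmap"

for wav in "$wav_dir"/*.wav; do
    # Assumption: the utterance id is the file name without its extension.
    utt_id=$(basename "$wav" .wav)
    echo "$utt_id" >> "$ids_file"
    # Assumption: id and path are separated by a single space.
    echo "$utt_id $wav" >> "$wavmap"
done

cat "$wavmap"
```

The resulting files would then be passed to the scripts above through the -id (or -ids) and -wavmap options.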
MLP forward pass
The MLPs were created for use with QuickNet. The HTK feature files generated by the above scripts need to be converted to pfiles before they can be used in QuickNet. <featfiles> is a text file containing utterance ids and paths to HTK feature files, one pair per line.
In the command below, train|dev|eval stands for one of the three data set names; it is not a shell pipe.
src/features2pfile.pl -output train|dev|eval_PLP.pfile \
-ids train|dev|eval.utids \
-featfiles <featfiles> \
-verbosity 51 -feacat_args " -v "
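Before running the conversion, it may be worth checking that every utterance id in the ids file has an entry in <featfiles>. The following is a minimal sketch of such a check, assuming whitespace-separated fields; the file names, feature paths and the sample contents are hypothetical.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical sample data: three ids, but only two feature file entries.
printf 'GE010_21\nGE010_22\nGE010_23\n' > "$work/train.utids"
printf 'GE010_21 /feats/GE010_21.plp\nGE010_22 /feats/GE010_22.plp\n' \
    > "$work/featfiles.txt"

# First pass (featfiles): remember every id that has a feature file.
# Second pass (ids file): print each id that was not seen in the first pass.
missing=$(awk 'NR == FNR { seen[$1] = 1; next }
               !($1 in seen) { print $1 }' \
    "$work/featfiles.txt" "$work/train.utids")

if [ -n "$missing" ]; then
    echo "ids without a feature file: $missing"
else
    echo "all ids have a feature file"
fi
```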
The forward pass can be performed with the following script.
src/pfile2acts.notrans.sh -pad \
-threads <cores_available> \
-ifile <input_pfile> \
-ofile <output_pfile> \
-config <fwd_config_file>
Acknowledgement required
If you make use of these nets in any published work, you must cite Partha Lal's PhD thesis, as follows:
- Partha Lal (2011). Cross-lingual automatic speech recognition using tandem features. PhD thesis, The Centre for Speech Technology Research, Edinburgh University. Available from http://hdl.handle.net/1842/5773
Downloads
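After downloading the archive, its integrity can be checked against the MD5 checksum given in this section. The sketch below assumes the GNU coreutils md5sum command is available (other systems may provide a differently named tool). The helper verify_md5 is hypothetical, and because the archive file name is not stated here, it is demonstrated on an empty temporary file.

```shell
#!/bin/sh
set -eu

# verify_md5 FILE EXPECTED
# Succeeds only if the MD5 digest of FILE equals EXPECTED.
verify_md5() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    [ "$actual" = "$2" ]
}

# Self-contained demonstration: the MD5 digest of an empty file is
# d41d8cd98f00b204e9800998ecf8427e.
demo_file=$(mktemp)
if verify_md5 "$demo_file" d41d8cd98f00b204e9800998ecf8427e; then
    echo "checksum OK"
else
    echo "checksum mismatch" >&2
fi

# For the downloaded nets (the file name is a placeholder):
#   verify_md5 <downloaded_archive> 3ebb2ce636686cf7f3a96850dcbfca7e
```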
Download the nets here (1.1GB, md5 checksum 3ebb2ce636686cf7f3a96850dcbfca7e)
Personnel
- Simon King