A freely available set of articulatory feature MLPs trained on 2000 hours of conversational telephone speech.
Articulatory feature MLPs.
Background

This page gives information on a set of articulatory feature (AF) classification multi-layer perceptrons (MLPs) which were trained as part of the Johns Hopkins 2006 summer workshop. The resources required to generate compatible front-end parameters are given along with the MLP weights. As a starting point, the work is summarised in a paper which was presented at Interspeech 2007:
J. Frankel, M. Magimai-Doss, S. King, K. Livescu and O. Cetin. Articulatory feature classifiers trained on 2000 hours of telephone speech. Proc. Interspeech 2007. pdf
A number of related papers came out of the workshop; these are listed below:
K. Livescu, A. Bezman, N. Borges, L. Yung, O. Cetin, J. Frankel, S. King, M. Magimai-Doss, X. Chi and L. Lavoie. Manual transcription of conversational speech at the articulatory feature level. Proc. ICASSP 2007. pdf
K. Livescu, O. Cetin, M. Hasegawa-Johnson, S. King, C. Bartels, N. Borges, A. Kantor, P. Lal, L. Yung, A. Bezman, S. Dawson-Haggerty, B. Woods, J. Frankel, M. Magimai-Doss and K. Saenko. Articulatory feature-based methods for acoustic and audio-visual speech recognition: Summary from the 2006 JHU Summer Workshop. Proc. ICASSP 2007. pdf
O. Cetin, A. Kantor, S. King, C. Bartels, M. Magimai-Doss, J. Frankel and K. Livescu. An articulatory feature-based tandem approach and factored observation modeling. Proc. ICASSP 2007. pdf
Acoustic parameterization

The input to the MLPs is PLP cepstra computed using HTK. In order to compute compatible front-end features, this HCopy config file should be used. Once the base parameters (12 PLP cepstra plus energy) have been computed, these should be mean and variance normalized on a per-speaker basis, then expanded to include first and second order derivatives, and scaled against the global variance. A script which takes care of this process is available on request.
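For readers who want to see the shape of the pipeline before requesting the script, the steps above can be sketched in Python with numpy. This is only an illustrative sketch, not the actual script: the derivative computation here follows the standard HTK regression formula over a +/-2 frame window, and the function and variable names are our own.

```python
import numpy as np

def deltas(x, win=2):
    """HTK-style regression derivatives over a +/-win frame window."""
    pad = np.pad(x, ((win, win), (0, 0)), mode='edge')
    num = sum(t * (pad[win + t: len(x) + win + t] - pad[win - t: len(x) + win - t])
              for t in range(1, win + 1))
    return num / (2 * sum(t * t for t in range(1, win + 1)))

def expand_and_normalise(plp, global_std):
    """plp: (frames, 13) array of 12 PLP cepstra + energy for ONE speaker.
    global_std: (39,) global standard deviations (illustrative placeholder)."""
    # per-speaker mean and variance normalisation
    plp = (plp - plp.mean(axis=0)) / plp.std(axis=0)
    # append first and second order derivatives -> 39 dimensions
    d = deltas(plp)
    feats = np.hstack([plp, d, deltas(d)])
    # scale against the global variance
    return feats / global_std
```

The per-speaker statistics must be computed over all of that speaker's frames before windowing, which is why normalisation happens before the derivative expansion.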
MLP weights

Links to the MLP weight files are given in the table below.
Feature         Abbreviation   Size (input, hidden, output)
place           pl1            351, 1900, 10
degree          dg1            351, 1600, 6
nasality        nas            351, 1200, 3
rounding        rou            351, 1200, 3
glottal state   glo            351, 1400, 4
vowel           vow            351, 2400, 23
height          ht             351, 1800, 8
frontness       frt            351, 1700, 7
They are all in matlab binary format, as used by quicknet. Therefore,
if you wish to inspect or manipulate the weights, fire up matlab, and
load (e.g. the place MLP weights) using:
>> weights = load('pl1_win9_idim39_size351,1900,10_lr0.0001.wts', '-mat')
weights =
    weights12: [1900x351 double]
        bias2: [1x1900 double]
    weights23: [10x1900 double]
        bias3: [1x10 double]
The input-to-hidden layer weights are in weights12, the bias on layer 2 is in bias2, and so on. Should you wish to manipulate the weights and then save them again, this can be done in matlab as:
>> save -v4 my_new_weights_file weights12 bias2 weights23 bias3
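The same weights can also be used outside matlab: v4 .mat files are readable with scipy.io.loadmat, and a forward pass is then a few lines of numpy. The sketch below assumes sigmoid hidden units and softmax outputs (softmax on the output layer is confirmed by the qnmultifwd call in the next section; the sigmoid hidden layer is the quicknet convention, stated here as an assumption).

```python
import numpy as np

def forward(weights, x):
    """Forward pass for one of the AF MLPs.
    weights: dict with weights12 (H, 351), bias2 (1, H),
             weights23 (K, H), bias3 (1, K), as in the load output above.
    x: (frames, 351) windowed, normalised input."""
    # hidden layer: sigmoid units (assumed quicknet default)
    h = 1.0 / (1.0 + np.exp(-(x @ weights['weights12'].T + weights['bias2'])))
    # output layer: softmax posteriors over the K feature classes
    a = h @ weights['weights23'].T + weights['bias3']
    a -= a.max(axis=1, keepdims=True)          # numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

# the real weights would come from, e.g.:
#   from scipy.io import loadmat
#   weights = loadmat('pl1_win9_idim39_size351,1900,10_lr0.0001.wts')
```

Note that the input must already be windowed to 9 frames x 39 dimensions = 351, exactly as qnmultifwd does internally.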
MLP forward pass

The Quicknet MLP toolkit was used to train the MLPs, and supports a forward pass in order to generate activations given acoustic input. The HTK features are not normalized to have zero mean and unit variance, so the norm file WS06_AFMLP_PLP.norm is supplied to quicknet.
Below is an example call to qnmultifwd for the place MLP. Here we assume the input PLPs (input.spknorm-plp.pfile) are in pfile format, that pfile is also the format we would like the activations written to (output.act.pfile), and that a log file is written to out.log.
qnmultifwd \
    ftr1_file=input.spknorm-plp.pfile ftr1_format=pfile \
    ftr1_ftr_count=39 ftr1_window_len=9 window_extent=9 \
    ftr1_norm_file=WS06_AFMLP_PLP.norm \
    init_weight_file=pl1_win9_idim39_size351,1900,10_lr0.0001.wts \
    init_weight_format=matlab \
    mlp_size=351,1900,10 mlp_output_type=softmax mlp_bunch_size=256 \
    activation_format=pfile activation_file=output.act.pfile \
    log_file=out.log
Note that because of the 9-frame input windows on the MLPs, 4 frames at either end of each utterance are lost. If a set of activations is required with the same number of frames as the inputs, then the PLPs for each utterance should be padded prior to running qnmultifwd. This can be done using feacat, e.g.:
feacat -ip pfile -op pfile -i plp.pfile -o plp.pad4.pfile -pad 4
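To see why 4 frames are lost at each end, count the windows: a 9-frame window centred on frame t needs frames t-4 to t+4, so only frames with full context on both sides produce an output. The arithmetic (an illustrative sketch; the function name is ours):

```python
def n_output_frames(n_input, window_len=9, pad=0):
    """Number of frames with a full context window, after padding
    `pad` frames at each end of the utterance."""
    n = n_input + 2 * pad
    return max(n - (window_len - 1), 0)

print(n_output_frames(100))         # 92: 4 frames lost at each end
print(n_output_frames(100, pad=4))  # 100: padding restores the frame count
```

This is why the feacat padding value is 4, i.e. half of (window_len - 1).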