The Centre for Speech Technology Research, The University of Edinburgh

Reproducible Research

Convolutional Neural Networks for Distant Speech Recognition

Paper information and status

P. Swietojanski, A. Ghoshal, and S. Renals, "Convolutional Neural Networks for Distant Speech Recognition," IEEE Signal Processing Letters, vol. 21, no. 9, 2014.

[ pdf | IEEE Xplore | bibtex ]

Abstract

We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
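The channel-wise convolution with two-way pooling mentioned in the abstract can be pictured in a few lines of code. Below is a minimal NumPy sketch, assuming 1-D convolution over log-mel frequency bands, separate filters per microphone channel, max-pooling along frequency within each channel, and an element-wise max across channels; all names, shapes, and the exact pooling order are illustrative assumptions, not the released Kaldi recipe.

import numpy as np

def channelwise_conv_two_way_pool(channels, filters, pool_size=3):
    # channels: list of (num_bands,) feature vectors, one per distant mic.
    # filters:  (num_channels, num_filters, filter_width) weights; each
    #           channel keeps its own filters (no cross-channel sharing here).
    per_channel = []
    for x, w_c in zip(channels, filters):
        # 1-D convolution of every filter over the frequency axis.
        fmap = np.stack([np.convolve(x, w, mode="valid") for w in w_c])
        # First pooling direction: max-pool along frequency in each channel.
        usable = fmap.shape[1] // pool_size * pool_size
        pooled = fmap[:, :usable].reshape(fmap.shape[0], -1, pool_size).max(axis=2)
        per_channel.append(pooled)
    # Second pooling direction: element-wise max across microphone channels.
    return np.maximum.reduce(per_channel)

# Example: 4 distant microphones, 40 filterbank features, 8 filters of width 5.
rng = np.random.default_rng(0)
mics = [rng.standard_normal(40) for _ in range(4)]
weights = rng.standard_normal((4, 8, 5))
print(channelwise_conv_two_way_pool(mics, weights).shape)  # (8, 12)

In the paper, the degree of weight sharing across channels and the pooling scheme are the experimental variables; this sketch fixes one plausible combination purely for illustration.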

Data

The train, development, and eval sets are the same as the "Full-corpus-ASR partition of meetings" defined on the AMI Corpus page.

We use the AMI Annotations v1.6.

Code

The recipe has been released with the Kaldi Speech Recognition Toolkit. Look for egs/ami in the repository.

Contact

Drop an email to Pawel Swietojanski (p.swietojanski@ed.ac.uk) in case you experience any issues with the recipe or need any further information.