Reproducible Research

Convolutional Neural Networks for Distant Speech Recognition

Paper information and status

P. Swietojanski, A. Ghoshal, and S. Renals. "Convolutional Neural Networks for Distant Speech Recognition". IEEE Signal Processing Letters, Volume 21, Issue 9, 2014.

[pdf | IEEE Xplore | bibtex]

Abstract

We investigate convolutional neural networks (CNNs) for large vocabulary distant speech recognition, trained using speech recorded from a single distant microphone (SDM) and multiple distant microphones (MDM). In the MDM case we explore a beamformed signal input representation compared with the direct use of multiple acoustic channels as a parallel input to the CNN. We have explored different weight sharing approaches, and propose a channel-wise convolution with two-way pooling. Our experiments, using the AMI meeting corpus, found that CNNs improve the word error rate (WER) by 6.5% relative compared to conventional deep neural network (DNN) models and 15.7% over a discriminatively trained Gaussian mixture model (GMM) baseline. For cross-channel CNN training, the WER improves by 3.5% relative over the comparable DNN structure. Compared with the best beamformed GMM system, cross-channel convolution reduces the WER by 9.7% relative, and matches the accuracy of a beamformed DNN.
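The relative improvements quoted in the abstract can be read against the usual definition of relative WER reduction, (baseline WER - new WER) / baseline WER. The sketch below assumes that standard definition; the function name and the input values are hypothetical and are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WER values for illustration only (not taken from the paper):
# a drop from 50.0% to 45.0% is 5 points absolute, i.e. 10% relative.
print(relative_wer_reduction(50.0, 45.0))
```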

Data

The train, development, and eval sets are defined below. These are the same as the sets called "Full-corpus-ASR partition of meetings" on the AMI Corpus page.

Train set: ES2002, ES2003, ES2005, ES2006, ES2007, ES2008, ES2009, ES2010, ES2012, ES2013, ES2014, ES2015, ES2016; IS1000, IS1001, IS1002 (no a), IS1003, IS1004, IS1005 (no d), IS1006, IS1007; TS3005, TS3006, TS3007, TS3008, TS3009, TS3010, TS3011, TS3012; EN2001, EN2003, EN2004a, EN2005a, EN2006, EN2009; IN1001, IN1002, IN1005, IN1007, IN1008, IN1009, IN1012, IN1013, IN1014, IN1016
Dev set: ES2011, IS1008, TS3004, IB4001, IB4002, IB4003, IB4004, IB4010, IB4011
Eval set: ES2004, IS1009, TS3003, EN2002

We use the AMI Annotations v1.6.
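For scripting, the partition above can be written as a small lookup table. The sketch below is not part of the released recipe. It assumes that a session ID is a meeting ID from the lists plus a single lowercase letter (for example ES2002a), and that "(no a)" and "(no d)" mean sessions IS1002a and IS1005d are left out. The names TRAIN, DEV, EVAL, EXCLUDED_SESSIONS and partition_of are hypothetical.

```python
# Meeting IDs copied from the train/dev/eval lists above.
TRAIN = {
    "ES2002", "ES2003", "ES2005", "ES2006", "ES2007", "ES2008", "ES2009",
    "ES2010", "ES2012", "ES2013", "ES2014", "ES2015", "ES2016",
    "IS1000", "IS1001", "IS1002", "IS1003", "IS1004", "IS1005", "IS1006",
    "IS1007",
    "TS3005", "TS3006", "TS3007", "TS3008", "TS3009", "TS3010", "TS3011",
    "TS3012",
    "EN2001", "EN2003", "EN2004a", "EN2005a", "EN2006", "EN2009",
    "IN1001", "IN1002", "IN1005", "IN1007", "IN1008", "IN1009", "IN1012",
    "IN1013", "IN1014", "IN1016",
}
DEV = {
    "ES2011", "IS1008", "TS3004", "IB4001", "IB4002", "IB4003", "IB4004",
    "IB4010", "IB4011",
}
EVAL = {"ES2004", "IS1009", "TS3003", "EN2002"}

# Assumption: "IS1002 (no a)" and "IS1005 (no d)" exclude these two sessions.
EXCLUDED_SESSIONS = {"IS1002a", "IS1005d"}


def partition_of(session_id):
    """Return 'train', 'dev', 'eval', or None for a session ID like 'ES2002a'."""
    if session_id in EXCLUDED_SESSIONS:
        return None
    for name, meetings in (("train", TRAIN), ("dev", DEV), ("eval", EVAL)):
        # An entry matches exactly (e.g. 'EN2004a') or as the meeting ID
        # obtained by dropping the trailing session letter (e.g. 'ES2002').
        if session_id in meetings or session_id[:-1] in meetings:
            return name
    return None


# The three sets should not share any meeting.
assert not (TRAIN & DEV or TRAIN & EVAL or DEV & EVAL)
print(partition_of("ES2002a"), partition_of("IB4001"), partition_of("EN2002b"))
```

Sessions that appear in none of the lists, such as EN2004b under the assumption above, map to None rather than to a default partition.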

Code

The recipe has been released with the Kaldi Speech Recognition Toolkit. Look for egs/ami in the repository.

Contact

Email Pawel Swietojanski (p.swietojanski@ed.ac.uk) if you experience any issues with the recipe or need any further information.