MOCHA-TIMIT

General

Authors: Alan Wrench, Queen Margaret University College
Funded by: Engineering and Physical Sciences Research Council
When created: November 1999
Availability: Data for the English speakers are available here free for non-commercial use and may be distributed on CDROM for a fee.
Purpose: Phonetically balanced dataset for training an automatic speech recognition system
Description: Overview

Instrumentation

Components:

- Microphone, 16 kHz sample rate (audio-technica ATM10a)
- Laryngograph, 16 kHz sample rate
- Electromagnetic Articulograph, 500 Hz sample rate (Carstens 10 Channel):
  - upper incisor
  - lower incisor
  - upper lip
  - lower lip
  - tongue tip
  - tongue blade
  - tongue dorsum
  - velum
- EPG, 200 Hz sample rate
- SVHS video of front view of mouth area (available by special request)
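
Because the streams are sampled at different rates, a given instant falls at a different sample index in each one. The following is a minimal sketch of that conversion, using only the sample rates listed above. The names `SAMPLE_RATES_HZ`, `sample_index` and `audio_samples_per_frame` are illustrative and not part of any distributed tool, and the sketch assumes that all streams of an utterance start at the same time zero.

```python
# Sample rates in Hz, as listed in the component list.
SAMPLE_RATES_HZ = {
    "audio": 16000,
    "laryngograph": 16000,
    "ema": 500,
    "epg": 200,
}


def sample_index(stream: str, time_s: float) -> int:
    """Index of the sample at or just before time_s in the given stream.

    Assumes the stream starts at time zero (an assumption of this sketch).
    """
    return int(time_s * SAMPLE_RATES_HZ[stream])


def audio_samples_per_frame(stream: str) -> int:
    """Number of 16 kHz audio samples spanned by one frame of a slower stream."""
    return SAMPLE_RATES_HZ["audio"] // SAMPLE_RATES_HZ[stream]
```

Under these assumptions, 0.25 s into an utterance corresponds to audio sample 4000, EMA frame 125 and EPG frame 50. One EMA frame spans 32 audio samples and one EPG frame spans 80.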
Corpus
Texts:
1. Sentences:
  - A set of 460 sentences designed to include the main connected speech processes in English (e.g. assimilations, weak forms).
  Orthography
Subjects: 2 speakers (1 male and 1 female) are currently available; another 38 are planned for completion by May 2001. The subjects have a variety of accents of English.

Conditions:

All recordings were made in the same sound-damped studio at the Edinburgh Speech Production Facility. All data were recorded direct to computer and carefully synchronised.

Languages: English
Access
Platforms: The data files have headers which retain byte order information.

Media: Internet (FTP) and possibly CDROM

SVHS video available by special request.

Format:

Audio and laryngograph data are stored with 1024-byte ASCII NIST headers. EPG data remain in raw binary (8 bytes per sample). EMA data are stored in Edinburgh Speech Tools Trackfile format, which consists of a variable-length ASCII header followed by a 4-byte float per channel. The first channel is a time value in seconds. The second channel is always 1 (it indicates whether the sample is present). The remaining channels hold, in order, the x-values for coils 1-5, the y-values for coils 1-5, the x-values for coils 6-10 and the y-values for coils 6-10.
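
The EMA channel order is easy to misread, so here is a minimal sketch of decoding one frame into per-coil (x, y) pairs. It assumes a frame holds exactly 22 floats (time, presence flag, 20 coordinates), which is inferred from the description above rather than confirmed by it. It also assumes the variable-length header has already been skipped and that the byte order has been read from that header. The function name `decode_ema_frame` is hypothetical and not part of Edinburgh Speech Tools.

```python
import struct

N_COILS = 10
# time + presence flag + (x, y) for each of the 10 coils: an inferred frame size.
CHANNELS_PER_FRAME = 2 + 2 * N_COILS


def decode_ema_frame(frame_bytes: bytes, byte_order: str = "<") -> dict:
    """Decode one EMA frame of 4-byte floats into a time and per-coil (x, y).

    byte_order is a struct prefix: "<" little-endian, ">" big-endian.
    The default is an assumption; the file header records the real value.
    """
    values = struct.unpack(f"{byte_order}{CHANNELS_PER_FRAME}f", frame_bytes)
    time_s, present = values[0], values[1]
    coords = values[2:]
    # Layout: coils 1-5 x, coils 1-5 y, coils 6-10 x, coils 6-10 y.
    xs = coords[0:5] + coords[10:15]
    ys = coords[5:10] + coords[15:20]
    return {
        "time": time_s,
        "present": present == 1.0,
        "coils": {i + 1: (xs[i], ys[i]) for i in range(N_COILS)},
    }
```

Each frame then occupies 88 bytes, so a caller could slice the data section of a file into 88-byte chunks and pass each chunk to `decode_ema_frame`.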

Size: ~200 kBytes per speaker

Software:
Edinburgh Speech Tools is free and contains the routines ch_wave and ch_track, which can be used to convert the waveform and EMA files into other formats such as ESPS waves format, HTK or raw binary. It also contains other routines, such as pitch tracking.

MATLAB - a set of macros to read and write the stored data formats is available, along with supplementary routines for the EMATOOLS set.
Software Documentation:

Edinburgh Speech Tools

MATLAB-EMATools

Software Source/Executables:

See above.
Further information

Contact: A. Wrench

Address:

Dept. Speech and Language Sciences, Queen Margaret University College, Clerwood Terrace, Edinburgh, EH12 8TS.

Telephone:

+44 131 317 3692

Fax:

+44 131 317 3689

Email:

a.wrench@sls.qmced.ac.uk

WWW:

http://sls.qmuc.ac.uk
