MOCHA-TIMIT
General
- Authors: Alan Wrench, Queen Margaret University
College
- Funded by: Engineering and Physical Sciences Research
Council
- When created: November 1999
- Availability: Recordings of the English speakers are available free for
non-commercial use; they may also be distributed on CDROM for a fee.
- Purpose: Phonetically balanced dataset for training an
automatic speech recognition system
- Description: Overview
Instrumentation
- Components:
- Microphone, 16 kHz sample rate (Audio-Technica ATM10a)
- Laryngograph, 16 kHz sample rate
- Electromagnetic Articulograph, 500 Hz sample rate (Carstens 10
Channel)
- upper incisor
- lower incisor
- upper lip
- lower lip
- tongue tip
- tongue blade
- tongue dorsum
- velum
- EPG, 200 Hz sample rate
- SVHS video of front view of mouth area. (Available by special
request)
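Because the instruments run at different sample rates, a given instant falls on a different sample index in each stream. The following is a minimal sketch of that conversion using the rates listed above. It assumes that every stream starts at time zero, which the source does not state explicitly; the names `RATES_HZ`, `sample_index` and `audio_samples_per_frame` are illustrative only and are not part of the corpus software.

```python
# Sample rates (Hz) as quoted in the component list.
RATES_HZ = {"audio": 16000, "laryngograph": 16000, "ema": 500, "epg": 200}


def sample_index(stream: str, t_seconds: float) -> int:
    """Nearest sample index in `stream` for a time in seconds.

    Assumption: all streams share a common start at t = 0.
    """
    return int(round(t_seconds * RATES_HZ[stream]))


def audio_samples_per_frame(stream: str) -> float:
    """How many audio samples span one frame of a slower stream."""
    return RATES_HZ["audio"] / RATES_HZ[stream]
```

Under these assumptions, one EMA frame covers 32 audio samples and one EPG frame covers 80.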
Corpus
- Texts:
- Sentences:
- A set of 460 sentences designed to include the main connected
speech processes in English (e.g. assimilations, weak forms).
Orthography
- Subjects: 2 speakers (1 male and 1 female) are currently
available; another 38 are planned to be completed by May 2001.
The subjects have a variety of accents of English.
- Conditions:
- All recordings were made in the same sound-damped studio at the
Edinburgh Speech Production Facility. All data were recorded directly
to computer and carefully synchronised.
- Languages: English
Access
- Platforms: The data files have headers which retain byte
order information.
- Media: Internet (FTP) and possibly CDROM
- SVHS video available by special request.
- Format:
- Audio and Laryngograph data are stored with 1024-byte ASCII NIST
headers. EPG data remain in raw binary (8 bytes per sample). EMA data
are stored in Edinburgh Speech Tools Trackfile format, consisting of
a variable-length ASCII header followed by a 4-byte float representation
per channel. The first channel is a time value in seconds; the
second value is always 1 (used to indicate whether the sample is present
or not). The next 5 values are the coil 1-5 x-values, followed by the coil
1-5 y-values, then the coil 6-10 x-values and finally the coil 6-10
y-values.
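The layout described above can be read with a few lines of Python. This is a hedged sketch, not the official tooling. It makes several assumptions that the text does not confirm: that NIST header lines take the usual "name -type value" form, that a `sample_byte_format` field of "10" marks big-endian data, that audio samples are 16-bit integers, that the Trackfile header closes with a line containing `EST_Header_End`, and that EMA floats are little-endian unless told otherwise. All function names are illustrative.

```python
import struct

FRAME_FLOATS = 22  # time, presence flag, then 20 coil coordinates


def split_nist(raw: bytes):
    """Split a file with a 1024-byte NIST header into (fields, payload)."""
    header, payload = raw[:1024], raw[1024:]
    fields = {}
    for line in header.decode("ascii", errors="replace").splitlines():
        parts = line.split(None, 2)
        # Assumed header line shape: "name -type value"
        if len(parts) == 3 and parts[1].startswith("-"):
            fields[parts[0]] = parts[2].strip()
    return fields, payload


def nist_samples(raw: bytes):
    """Decode the payload as 16-bit integers (sample width is an assumption)."""
    fields, payload = split_nist(raw)
    order = ">" if fields.get("sample_byte_format") == "10" else "<"
    n = len(payload) // 2
    return list(struct.unpack(f"{order}{n}h", payload[: 2 * n]))


def read_ema(raw: bytes, byte_order: str = "<"):
    """Return EMA frames as tuples of 22 floats.

    Assumption: the ASCII header ends with a line containing EST_Header_End.
    """
    end = raw.index(b"EST_Header_End")
    start = raw.index(b"\n", end) + 1
    size = 4 * FRAME_FLOATS
    body = raw[start:]
    body = body[: len(body) // size * size]  # drop any trailing partial frame
    return list(struct.iter_unpack(f"{byte_order}{FRAME_FLOATS}f", body))


def coil_xy(frame, coil: int):
    """(x, y) for coil 1..10, following the channel order described above."""
    base = 2 if coil <= 5 else 12  # columns 0 and 1 are time and presence flag
    k = (coil - 1) % 5
    return frame[base + k], frame[base + 5 + k]


def read_epg(raw: bytes):
    """Split raw EPG data into 8-byte frames; the bit layout is not decoded."""
    usable = len(raw) - len(raw) % 8
    return [raw[i:i + 8] for i in range(0, usable, 8)]
```

The EPG reader stops at the frame boundary because the text gives only the frame size, not the meaning of the individual bits.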
- Size: ~200 kBytes per speaker
- Software:
- Edinburgh Speech Tools is free and contains the routines ch_wave
and ch_track, which can be used to convert the waveform and EMA files
into other formats such as ESPS waves format, HTK or raw binary, as
well as other routines such as pitch tracking.
- MATLAB - a set of macros to read and write the stored data
formats is available, along with supplementary routines for the
EMATOOLS set.
- Software Documentation:
- Edinburgh Speech
Tools
- MATLAB-EMATools
- Software Source/Executables:
- See above.
Further information
- Contact: A. Wrench
- Address:
- Dept. of Speech and Language Sciences, Queen Margaret
University College, Clerwood Terrace, Edinburgh EH12 8TS.
- Telephone:
- +44 131 317 3692
- Fax:
- +44 131 317 3689
- Email:
- a.wrench@sls.qmced.ac.uk
- WWW:
- http://sls.qmuc.ac.uk