Research

Our current research covers speech recognition, speech synthesis, speech signal processing, information access, multimodal interfaces, dialogue systems, machine learning and acoustic phonetics.

Complete list of CSTR publications.

Current projects

Acoustic-articulatory inversion: This inversion project aims to estimate the articulatory movements which underpin an acoustic speech signal.
Combilex: Combilex is a high-quality multi-accent pronunciation lexicon for English with several advanced features.
Deep architectures for statistical speech synthesis: This fellowship is concerned with developing a new model for statistical speech synthesis which allows us to include more information about how speech is produced, as well as information about how it is perceived and how external factors, such as background noise, affect speech.
The Edinburgh Speech Tools: The Speech Tools are a set of core libraries used by Festival and various other applications.
EU-Bridge: EU-Bridge is a three-year project that will develop automatic transcription and translation technology to enable innovative multimedia captioning and translation services for audiovisual documents between European and non-European languages. The project will provide streaming technology that can convert speech from lectures, meetings and telephone conversations into text in another language. Within Edinburgh, CSTR will work closely with the Statistical Machine Translation Group.
The Festival speech synthesis system: Festival offers a general, multi-lingual framework for building speech synthesis systems.
InEvent: InEvent is a three-year project whose main goal is to develop new means to structure, retrieve and share large archives of networked and dynamically changing multimedia recordings, mainly consisting of meetings, video-conferences and lectures.
INSPIRE: INSPIRE is a Marie Curie Initial Training Network, concerned with investigating speech processing in realistic environments.
LISTA: The Listening Talker: LISTA is an EU project about speaker- and environment-adaptive speech synthesis and speech modification.
MultiMemoHome: MultiMemoHome is a research project aiming to develop user-friendly, accessible and effective reminder systems in order to improve home care.
Natural Speech Technology: Natural Speech Technology (NST) is a 5-year EPSRC Programme Grant that aims to significantly advance the state of the art in speech technology by making it more natural, approaching human levels of reliability, adaptability and conversational richness. NST is a collaboration between CSTR, the Speech Group at the University of Cambridge and the Speech and Hearing Research Group (SpandH) at the University of Sheffield.
RSE / NSFC Bilateral Research Award: The Royal Society of Edinburgh / National Science Foundation China travel grant has been awarded to CSTR and USTC for further joint and linked research on our novel framework for speech synthesis.
SALB: Speech synthesis of Auditory Lecture books for Blind children: In this project we aim to evaluate HMM-based synthesis of different language varieties (standard, dialect, sociolect) for auditory lecture books. We also want to analyse how listeners are influenced by social roles (teacher vs. student), and by self-perception and the perception of others, as they exist between the listener and the person whose voice is synthesised.
SCALE: SCALE is a Marie Curie Initial Training Network. Its research themes are automatic speech recognition, machine learning, speech synthesis, signal processing and human speech recognition.
Simple4All: The Simple4All project created speech synthesis technology which learns from data with little or no expert supervision, and continually improves simply by being used. It was one of the first attempts to use lightly-supervised and unsupervised methods to create speech synthesis systems.
SSPNet: The Social Signal Processing Network: SSPNet is an EU FP7 Network of Excellence project about social signal processing.
uDialogue: uDialogue is a five-year project, run jointly with the Nagoya Institute of Technology, concerned with crowdsourcing multimodal dialogue systems, speech synthesis and speech recognition.
Ultrax: Ultrax aims to develop ultrasound scanning technology into a useful and effective tool for child speech therapy.
Voicebank: The Voicebank project aims to develop clinical applications of HMM-based speech synthesis such as personalised voices for communication aids. The project is a collaboration between CSTR, the Euan MacDonald Centre for Motor Neurone Disease and the Anne Rowling Regenerative Neurology Clinic.
Voice Building KTP: This is a Knowledge Transfer Partnership with Orange/France Telecom. The aim of the KTP is to improve automatic voice building through the development and integration of novel automatic speech recognition techniques, and to build commercial-grade systems that bring personalised speech technology to Orange customers.

Previous projects are listed in the Project Archive.

Corpora

Two corpora recorded at CSTR are available through the LDC:

MC-WSJ-AV: The MC-WSJ-AV (multi-channel Wall Street Journal Audio-Visual) corpus is a corpus of read speech (WSJ) recorded with close-talking microphones and distant microphone arrays, enabling research in speaker localisation, (blind) speech separation and speech recognition.
2012_MMA: The 2012_MMA (multi-microphone array) corpus is a corpus of read speech (WSJ) recorded with multiple distant microphone arrays, enabling research in speaker localisation, (blind) speech separation and speech recognition.

The MC-WSJ-AV and 2012_MMA corpus data are managed by the LDC. For more details, see the LDC (Linguistic Data Consortium).

Software

Much of our research is released within open source software toolkits. CSTR researchers contribute to three major toolkits in particular:

Festival: The Festival Speech Synthesis System offers a general framework for building speech synthesis systems and includes examples of various modules. As a whole, it offers full text-to-speech through a number of APIs. Festival is multi-lingual, though English is the most advanced. Full tools and documentation for building new voices are available through the CMU FestVox project.
HTS: The HMM-based Speech Synthesis System (HTS) is a toolkit for statistical parametric speech synthesis. The training part of HTS has been implemented as a modified version of HTK and released as patch code to HTK under the New and Simplified BSD license.
Kaldi: Kaldi is a toolkit for speech recognition written in C++ and licensed under the Apache License v2.0.

Seminar schedules

ILCC Seminar schedule (Computational linguistics, speech processing, cognitive science)
PWorkshop schedule (Phonetics/Phonology Workshop)
IANC seminar schedule (Machine learning and computational neuroscience)
Signal and Image Processing Seminars (Edinburgh Research Partnership in Engineering and Mathematics)

International conference schedules

Upcoming conferences [sorted by date] [sorted by deadline]
Past conferences