RESEARCH FELLOW IN SPEECH SYNTHESIS

The closing date for this post has now passed

Job Description

The Centre for Speech Technology Research at the University of Edinburgh is seeking a research fellow to work on the leading text-to-speech research toolkit, Festival (www.cstr.ed.ac.uk/projects/festival), through the ongoing project Expressive Prosody for Unit-selection Speech Synthesis. The project's aims are to add explicit control of prosody to unit-selection speech synthesis, to generate prosody appropriate for communicating specific meanings and information structures, and to realise this prosody with sequences of appropriately sized pitch accents, arranged into valid intonation tunes. This project is jointly led by Simon King, Mark Steedman and Rob Clark (Edinburgh) and Dan Jurafsky (Stanford, USA).

The position is suitable for a candidate with either a Ph.D. in speech technology, prosody, laboratory phonetics or computational linguistics, or with equivalent research experience. The successful candidate will ideally have good programming skills, preferably in C++, and experience with one or more of: concatenative speech synthesis techniques; statistical modelling of linguistic data; computational approaches to prosody; perceptual evaluations; Festival. An automatic speech recognition background is also appropriate for this position.

The starting date will be 1st January 2005, although we are able to delay this for the right candidate.

Informal enquiries can be made to Dr. Simon King (Simon.King@ed.ac.uk) or Prof. Mark Steedman (steedman@inf.ed.ac.uk). Further particulars can be found below.

The closing date for applications is 30th November 2004 and we expect to hold interviews during the week of 6th December 2004.

Further particulars

General Information

Centre for Speech Technology Research

The Centre for Speech Technology Research (CSTR) is a joint venture of the School of Informatics and the School of Philosophy, Psychology and Language Sciences (which includes Theoretical and Applied Linguistics). CSTR has operated for over 20 years and currently consists of 4 members of teaching staff, 7 research staff and 12 Ph.D. students. Current projects cover the full spectrum of speech technologies and include many collaborations within the University and with groups across the world.

The post will be based in CSTR, which is housed at 2, Buccleuch Place, within the central University area and in the heart of Edinburgh.

Festival

Festival is the leading text-to-speech research toolkit, used by both academic and commercial groups around the world. The latest release (version 2) now contains a full unit-selection engine. This position offers the successful candidate the opportunity to become a core member of the Festival team. Speech synthesis work at CSTR has a high profile internationally and CSTR has strong links to many other groups around the world. The post also offers the chance to make a significant impact and would be an excellent start to a career in speech technology.

Principal duties

The research fellow will be responsible for:

Designing, implementing and evaluating computational methods for controlling prosody in a unit-selection synthesiser, Festival (version 2). This will include: designing and recording intonationally rich corpora (voice data sets); planning and supervising automatic and manual prosodic annotation of this data; continuing and extending the development and implementation of prosodic modelling techniques.

Candidate profile

The successful candidate will have the following profile:

a Ph.D. (completed, or thesis submitted) in speech technology, prosody, laboratory phonetics or computational linguistics
ability to work effectively and to meet deadlines (project milestones and Festival code releases) as an independent researcher, as a member of the CSTR team and in collaboration with our partners at Stanford
good programming skills, preferably C++, optionally C or Java, plus a scripting language, preferably Python, optionally Lisp/Scheme or Perl

In addition, the following are desirable:

familiarity with the architecture of typical, large speech synthesis systems or other large language processing systems
some understanding of, or experience with, natural language processing techniques
some understanding of, or experience with, statistical modelling techniques and algorithms such as CARTs, HMMs, the Viterbi algorithm and finite-state machines
experience running perceptual experiments, preferably listening tests of speech synthesis
experience of recording speech in a studio setting

Project description

Current unit-selection speech synthesis systems cannot usually generate speech with prosody that conveys specific meaning or information structure, such as contrastive stress, theme/rheme distinctions, list structures, emphasis and so on. The project will develop methods for predicting and realising more appropriate prosody. We will incorporate explicit control of prosody into the standard unit-selection paradigm, in which units are selected from a large database, and minimal signal processing is performed.

The unit-selection paradigm involves selecting appropriate units from an annotated database. In current systems, those units are usually context-dependent diphones, not whole words. This intermediate layer of representation between the words and the speech signal is necessary to avoid data sparsity problems and to be able to synthesise words not in the database.
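
For readers unfamiliar with the paradigm, the selection step is commonly formulated in the literature as a Viterbi search that minimises the sum of a target cost (how well a candidate unit matches the specification) and a join cost (how smoothly adjacent units concatenate). The Python sketch below illustrates that general formulation only. It is not Festival's implementation or API: the names Unit, target_cost, join_cost and select_units are hypothetical, and reducing both costs to F0 differences is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """A candidate database unit (hypothetical, illustrative fields only)."""
    diphone: str     # diphone label, e.g. "h-e"
    uid: int         # identifier of this instance in the database
    f0_start: float  # F0 at the start of the unit, in Hz
    f0_end: float    # F0 at the end of the unit, in Hz


def target_cost(unit, target_f0):
    # Assumption: target cost is the distance between the unit's mean F0
    # and the requested F0. Real systems combine many more features.
    return abs((unit.f0_start + unit.f0_end) / 2 - target_f0)


def join_cost(prev, cur):
    # Assumption: join cost is the F0 discontinuity at the concatenation point.
    return abs(prev.f0_end - cur.f0_start)


def select_units(targets, database):
    """Viterbi search for the cheapest unit sequence.

    targets:  list of (diphone label, target F0) pairs
    database: dict mapping diphone label -> list of candidate Units
    Returns (list of selected Units, total cost).
    """
    if not targets:
        return [], 0.0

    # Candidate list for each target position.
    columns = []
    for diphone, _ in targets:
        candidates = database.get(diphone, [])
        if not candidates:
            raise ValueError(f"no candidates for diphone {diphone!r}")
        columns.append(candidates)

    # lattice[i][j] = (best cumulative cost ending in candidate j, backpointer)
    lattice = []
    for i, (_, f0) in enumerate(targets):
        column = []
        for cand in columns[i]:
            tc = target_cost(cand, f0)
            if i == 0:
                column.append((tc, None))
                continue
            best_cost, best_k = None, None
            for k, prev in enumerate(columns[i - 1]):
                cost = lattice[i - 1][k][0] + join_cost(prev, cand)
                if best_cost is None or cost < best_cost:
                    best_cost, best_k = cost, k
            column.append((best_cost + tc, best_k))
        lattice.append(column)

    # Pick the cheapest final candidate, then follow backpointers.
    last = lattice[-1]
    j = min(range(len(last)), key=lambda k: last[k][0])
    total = last[j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(columns[i][j])
        j = lattice[i][j][1]
    path.reverse()
    return path, total
```

Because the search minimises the cost of the whole sequence, it can prefer a unit that is a poorer local match to the target when that unit joins more smoothly to its neighbours.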

In a parallel fashion, we will use an intermediate representation to mediate between the "high level" specification (e.g. which words to emphasise, what the intonational phrase structure is) and the realisation (what type, shape and size of pitch accent to use). The database will be annotated with phonetic information such as pitch accent types and details of their realisation (e.g. the height of an H*).

This three-year project has been running for one year. Progress so far includes: development of a parallel segment/prosody selection technique; outline designs of the data sets that are needed for a voice to be sufficiently expressive; prosodic labelling of a pilot data set; ongoing comparisons of different levels of prosodic markup.

The project is jointly carried out in Edinburgh and Stanford. There are many opportunities for travel, including spending time at Stanford and attending conferences.

Informal enquiries can be made to Dr. Simon King (Simon.King@ed.ac.uk) or Prof. Mark Steedman (steedman@inf.ed.ac.uk). The CSTR web site contains information about Festival and other projects: www.cstr.ed.ac.uk.

How to apply

Please apply online via the University jobs website. The vacancy reference number is 3003174.