Satissfy

Project Summary

Statistical Analysis of Text In Speech Synthesis

Project Details

The Satissfy project brings together work in probabilistic and statistical language analysis with text-to-speech synthesis. Satissfy stands for Statistical Analysis of Text In Speech Synthesis (for you!).

In many text-to-speech systems, analysis of raw text has been left to heuristic, rule-based systems. However, progress has been made in applying statistical models of language, for example in parsing, part-of-speech tagging and phrase break prediction, and this could benefit text analysis in a speech synthesis system. We are interested in making better use of statistical knowledge and in being able to influence the predictions in contextually appropriate ways. For example, analysing raw text in email messages is quite distinct from analysing clearly written, single-threaded text in a press release.

In the long term we wish to develop statistical models for the analysis of different text types (e.g. email, news, HTML, LaTeX) that make for better synthesis.

Personnel

Alan W Black
Paul Taylor (PI)
Steve Isard (PI)

Current Progress

Part of speech tagger: using HMM technology we have built a part of speech tagger, optimising its tagset to resolve homograph ambiguities.
Phrase breaks: using part of speech tags and phrase break distributions we have built a probabilistic phrase break assignment algorithm. As far as we are aware, this produces the best results of any similar algorithm reported in the literature. The method works by using two models:
1. A model which gives the probability of a POS sequence given a break
2. An n-gram model which gives the prior probability of a sequence of breaks occurring
These are combined using Bayes' rule, and a Viterbi decoding procedure is used to assign the most probable sequence of breaks to the input sentence. This work is fully explained in
P. A. Taylor and A. W. Black, "Assigning phrase breaks from part-of-speech sequences".

Both the tagger and the phrase break algorithm are fully implemented within the Festival Speech Synthesis System.
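
The phrase break method described above can be illustrated with a minimal sketch. It is not the Festival implementation: it assumes a bigram model over break labels, takes the POS context of a juncture to be just the pair of tags on either side of it, and uses invented toy probabilities. The names viterbi_breaks, pos_contexts, EMISSION, TRANSITION and INITIAL are hypothetical and chosen only for this example.

```python
import math

LABELS = ("NB", "B")  # NB = no break, B = break

# Toy, hand-invented probabilities (assumptions, not trained values).
# EMISSION[label][context] stands in for P(POS context | label).
EMISSION = {
    "NB": {("det", "noun"): 0.45, ("noun", "verb"): 0.25, ("verb", "det"): 0.15,
           ("noun", "conj"): 0.05, ("conj", "det"): 0.10},
    "B": {("det", "noun"): 0.01, ("noun", "verb"): 0.09, ("verb", "det"): 0.05,
          ("noun", "conj"): 0.80, ("conj", "det"): 0.05},
}
# TRANSITION[prev][label] stands in for the n-gram prior; a bigram is assumed.
TRANSITION = {"NB": {"NB": 0.8, "B": 0.2}, "B": {"NB": 0.9, "B": 0.1}}
INITIAL = {"NB": 0.85, "B": 0.15}


def pos_contexts(tags):
    """Pair each tag with its successor: one context per word juncture."""
    return list(zip(tags, tags[1:]))


def viterbi_breaks(contexts, emission, transition, initial, floor=1e-6):
    """Return the most probable break label for each juncture.

    floor is the probability assigned to contexts missing from the tables.
    """
    if not contexts:
        return []

    def log_emit(label, context):
        return math.log(emission[label].get(context, floor))

    # scores[label] = best log probability of any path ending in label.
    scores = {l: math.log(initial[l]) + log_emit(l, contexts[0]) for l in LABELS}
    backpointers = []

    for context in contexts[1:]:
        new_scores, pointers = {}, {}
        for label in LABELS:
            best_prev = max(
                LABELS, key=lambda p: scores[p] + math.log(transition[p][label])
            )
            new_scores[label] = (
                scores[best_prev]
                + math.log(transition[best_prev][label])
                + log_emit(label, context)
            )
            pointers[label] = best_prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full sequence.
    label = max(LABELS, key=lambda l: scores[l])
    path = [label]
    for pointers in reversed(backpointers):
        label = pointers[label]
        path.append(label)
    path.reverse()
    return path


tags = ["det", "noun", "verb", "det", "noun", "conj", "det", "noun", "verb"]
breaks = viterbi_breaks(pos_contexts(tags), EMISSION, TRANSITION, INITIAL)
print(breaks)
```

With these toy tables the decoder places a single break at the juncture between the second noun and the conjunction. Working in log probabilities avoids numerical underflow on long sentences.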

Future Work

Four areas are identified for the next few months of work.

Implementation and evaluation of the Magerman statistical parser.
Implementation and evaluation of David Yarowsky's work on modifying probabilities by more general features for homograph disambiguation.
Training of letter-to-sound rules from examples in lexicons, by minimisation of rewrite rules.
Investigation of HTML to SSML conversion as a basic model for integrating pre-processing of text.

Funding

This work is being funded by the Engineering and Physical Sciences Research Council (EPSRC), grant GR/K54229: "Statistical Text Analysis for Speech Synthesis".

Duration: April 1996 - March 1999

Contact Alan W Black for more details.

Personnel

Alan Black (CMU)