Statistical Analysis of Text In Speech Synthesis
The Satissfy project brings together work in probabilistic and statistical language analysis with text-to-speech synthesis. Satissfy stands for Statistical Analysis of Text In Speech Synthesis (for you!).
In many text-to-speech systems, analysis of raw text has been left to heuristic rule-based systems. However, progress has been made in applying statistical models of language that could benefit text analysis in a speech synthesis system, for example in parsing, part-of-speech tagging and phrase break prediction. We are interested in making better use of statistical knowledge and in being able to influence predictions in contextually appropriate ways. For example, analysing raw text in email messages is quite distinct from analysing clearly written, single-threaded text in a press release.
In the long term we wish to develop statistical models for the analysis of text types (e.g. email, news, HTML, LaTeX) that make for better synthesis.
- Alan W Black
- Paul Taylor (PI)
- Steve Isard (PI)
- Part of speech tagger: using HMM technology we have built a part of speech tagger, optimising its tagset to resolve homograph ambiguities.
- Phrase breaks: using part of speech tags and phrase break distributions we have built a probabilistic phrase break assignment algorithm. As far as we are aware, this produces the best results of any similar algorithm reported in the literature. The method works by using two models:
  - A model which gives the probability of a POS sequence given a break
  - An n-gram model which gives the prior probability of a sequence of breaks occurring
- P. A. Taylor and A. W. Black, "Assigning phrase breaks from part-of-speech sequences".
These are fully implemented within the Festival Speech Synthesis System.
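The way the two phrase break models can be combined is sketched below. This is an illustrative sketch only, not the Festival implementation: it assumes a POS window of one tag on each side of a word juncture, a bigram (rather than a longer n-gram) prior over break labels, and a Viterbi search to find the most likely label sequence. The function name `viterbi_breaks`, the `floor` smoothing value and all probability tables are hypothetical toy values chosen for the example.

```python
import math

BREAK, NO_BREAK = "B", "NB"
STATES = (BREAK, NO_BREAK)

# Toy tables (assumed values, not trained from data).
# EMISSION[state][(tag_before, tag_after)] ~ P(POS context | juncture type)
EMISSION = {
    BREAK: {("PUNC", "DT"): 0.6},
    NO_BREAK: {("DT", "NN"): 0.5, ("NN", "PUNC"): 0.3},
}
# TRANSITION[prev][cur] ~ bigram prior P(cur juncture | prev juncture)
TRANSITION = {
    BREAK: {BREAK: 0.05, NO_BREAK: 0.95},
    NO_BREAK: {BREAK: 0.2, NO_BREAK: 0.8},
}
INITIAL = {BREAK: 0.1, NO_BREAK: 0.9}


def viterbi_breaks(pos_tags, emission, transition, initial, floor=1e-6):
    """Label each juncture between adjacent words as B or NB.

    Returns a list of len(pos_tags) - 1 labels maximising the product of
    the POS-context likelihoods and the bigram prior over labels.
    """
    contexts = list(zip(pos_tags, pos_tags[1:]))
    if not contexts:
        return []

    def log_emit(state, ctx):
        # Unseen contexts get a small floor probability (crude smoothing).
        return math.log(emission[state].get(ctx, floor))

    # Log score of the best path ending in each state at the first juncture.
    best = {s: math.log(initial[s]) + log_emit(s, contexts[0]) for s in STATES}
    backpointers = []

    for ctx in contexts[1:]:
        new_best, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[p] + math.log(transition[p][s]))
            new_best[s] = best[prev] + math.log(transition[prev][s]) + log_emit(s, ctx)
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


tags = ["DT", "NN", "PUNC", "DT", "NN"]
labels = viterbi_breaks(tags, EMISSION, TRANSITION, INITIAL)
print(list(zip(tags, labels)))
```

With these toy tables the only juncture labelled as a break is the one following the punctuation tag. In a real system both tables would be estimated from a corpus annotated with phrase breaks, and the prior would typically use a longer break history than a bigram.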
The following areas are identified for the next few months of work.
- Implementation and evaluation of the Magerman statistical parser.
- Implementation and evaluation of David Yarowsky's work on modifying probabilities by more general features for homograph disambiguation.
- Training of letter-to-sound rules from examples in lexicons, by minimisation of rewrite rules.
- Investigation of HTML to SSML conversion as a basic model for integrating pre-processing of text.
This work is being funded by the Engineering and Physical Sciences Research Council, EPSRC grant GR/K54229: "Statistical Text Analysis for Speech Synthesis".
Duration: April 1996 - March 1999
Contact Alan W Black for more details.
- Alan Black (CMU)