The main goal of my work was to create a Polish diphone database for speech synthesis. The project is closely tied to the MBROLA system. MBROLA is an international project started by Thierry Dutoit in Belgium in 1995. Its goal is to obtain a set of high-quality speech synthesizers for as many languages as possible, free for use in non-commercial applications. An additional aim of my work was to obtain high-quality speech synthesis for the Polish language.
Several steps were required. First, the corpus had to be created; second, the recordings had to be made. The next and most demanding stage was the segmentation of the recorded diphone corpus, which required accuracy and precision. The last stage was to test the quality of the synthesis using the most common diphone combinations in the Polish language. The work was finalized with the normalization of the database by the MBROLA team at the Polytechnical University in Mons.
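As a rough illustration of the corpus-creation step, a diphone inventory can be enumerated as all ordered pairs of phonemes, with silence treated as one more symbol. The sketch below is only an assumption about how such a list might be produced: the phoneme symbols are a small hypothetical sample, not the inventory used for this database, and a real inventory would normally drop pairs that never occur in the language.

```python
from itertools import product

# Hypothetical, tiny sample of SAMPA-like symbols; "_" stands for silence.
# This is NOT the actual phoneme inventory of the Polish database.
PHONEMES = ["_", "a", "e", "k", "t"]


def diphone_inventory(phonemes):
    """Return all ordered phoneme pairs as 'left-right' diphone names."""
    return [
        f"{left}-{right}"
        for left, right in product(phonemes, repeat=2)
        # A silence-to-silence transition carries no speech, so skip it.
        if not (left == "_" and right == "_")
    ]


inventory = diphone_inventory(PHONEMES)
print(len(inventory), "diphones, for example:", inventory[:5])
```

Each diphone in such a list would then be embedded in a carrier word or phrase for recording.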
Segmentation is a very demanding process. The sound elements are extracted from the speech material by hand. Praat was used as the main segmentation tool. It has a built-in spectrogram and provides a combined graphical and acoustic display for marking the parts of the signal to be cut out and for checking them by ear. Considerable difficulties can occur during manual segmentation. They arise mainly when there are no sharp boundaries between individual sounds (gradual sound transitions). In such cases, the placement of the boundaries is often arbitrary.
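To make the cutting step concrete, the following minimal sketch converts hand-marked times into sample positions for one diphone. It assumes the common convention that a diphone runs from the middle of the first phone to the middle of the second; the function name, the 16 kHz sample rate, and the midpoint rule are illustrative assumptions, not a description of the exact procedure followed in Praat for this database.

```python
def diphone_cut_points(start, boundary, end, sample_rate=16000):
    """Return (first_sample, boundary_sample, last_sample) for one diphone.

    start, boundary and end are hand-marked times in seconds: the start of
    the first phone, the transition between the phones, and the end of the
    second phone. The cut runs from midpoint to midpoint (an assumption).
    """
    if not (start < boundary < end):
        raise ValueError("expected start < boundary < end")
    left_mid = (start + boundary) / 2
    right_mid = (boundary + end) / 2

    def to_sample(t):
        return int(round(t * sample_rate))

    return to_sample(left_mid), to_sample(boundary), to_sample(right_mid)


# Hypothetical marks: first phone 0.10-0.20 s, second phone 0.20-0.40 s.
cuts = diphone_cut_points(0.10, 0.20, 0.40)
print(cuts)
```

When the transition is gradual, the `boundary` value is exactly the arbitrary choice described above, which is why the result still has to be checked by ear.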
The synthesized speech must be intelligible and sound natural, so that the acoustic model can be used further in public applications that rely on speech synthesis. By public use I mean using the database in education, in voice portals, for enhanced eyes-free access to critical information while driving, and as an aid for people with speech disabilities.
At present, the speech synthesis system accepts input in the form of a phonetic transcription. The next stage will be to create a prosodic model and a natural language processing module in order to build a full TTS system.
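For illustration, MBROLA reads this phonetic input from a plain-text .pho file in which each line gives a phoneme symbol, its duration in milliseconds, and optional pairs of position (percent of the duration) and pitch (Hz). The sketch below only formats such lines; the helper name `to_pho`, the phoneme sequence, and all durations and pitch values are made up for the example and do not come from the Polish database.

```python
def to_pho(segments):
    """Format (phoneme, duration_ms, pitch_points) triples as .pho text.

    pitch_points is a list of (position_percent, pitch_hz) pairs and may
    be empty for segments without pitch targets.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position, pitch in pitch_points:
            fields += [str(position), str(pitch)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical transcription; "_" marks silence at both ends.
segments = [
    ("_", 100, []),
    ("t", 80, []),
    ("a", 120, [(50, 120)]),  # one pitch target of 120 Hz at mid-vowel
    ("k", 90, []),
    ("_", 100, []),
]
pho_text = to_pho(segments)
print(pho_text)
```

Because durations and pitch targets must currently be supplied by hand in this way, a prosodic model is the natural next component.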
The Polish database has been available since May of this year on the MBROLA website (http://tcts.fpms.ac.be/synthesis/mbrola) as the new voice model for the Polish language.