Tutorials

available courses

Tutorial 1
Perception of Major Speech Cues
Astrid van Wieringen, Leuven
A talker’s message is conveyed to the listener through the acoustic speech wave. Knowledge of the most important spectral and temporal features of speech sounds provides insight into both how speech is produced and how the listener perceives it. The aim of this tutorial is to gain an understanding of the major acoustic cues underlying the identification of speech sounds. This is important for automatic speech recognition and for assessing speech perception performance in hearing-impaired persons. After a brief overview of the major spectral and temporal cues of speech sounds and of different types of speech tests and speech materials, participants will run perception tests with low-pass, high-pass, and band-pass filtered speech stimuli. The speech test consists of nonsense words and is therefore language-independent. The data will be analyzed and discussed with respect to automatic speech recognition, hearing aids, and cochlear implants (implantable aids for profoundly deaf persons that transmit speech cues via electrical stimulation of the auditory nerve).
Presentation (Microsoft PowerPoint 11 MB, IPA Fonts required - see link below)
Homepage (external Link)
SIL Encore IPA Download (external Link)

Tutorials 2 and 3
Unit Selection Synthesis with BOSSII
Stefan Breuer, Universität Bonn
This practical gives an introduction to unit selection speech synthesis in general and to the BOSSII architecture in particular. In the hands-on part, you will adapt an existing speech corpus and a lexicon to BOSSII and build a new synthesis system.
The tutorial consists of 1 session and is offered twice during the week.
1 session: Monday 11.00-12.30 and 13.00-15.00 (Tutorial 2) or Tuesday 9.00-12.30 (Tutorial 3)
Introduction to Unit Selection
Introduction to BOSS

Tutorial 4
Building a Semantic Network
Karel Pala and Pavel Smrz, Brno
1st session: A short overview of WordNet-like lexical databases, with some examples: Princeton WordNet (+versions), EuroWordNet, BalkaNet, GermaNet, RussNet, ... The main semantic relations: synonymy, hypernymy/hyponymy, Internal Language Relations, Top Ontology ... WordNet representation and WordNet editing: text format, XML format, DTD, tools for WordNet editing and browsing (VisDic), tool configuration.
2nd session: The students will prepare a small wordnet cluster (up to 50 synsets) for their language that does not yet exist in the wordnets built so far, and assign the selected concepts to ontologies. They may also attempt to compare the ontologies: EuroWordNet Top Ontology, SUMO, Time Ontology, ...
Finally, they will summarise the results and write a final report (in HTML).
2 sessions: Monday 16.00-19.30; Tuesday 13.30-17.00

Tutorial 5
Analyzing Speech Rhythm: An overview and perceptual approach to isochrony
Ingmar Steiner, Bonn
This lab will present the history and state of the art of speech rhythm research. The rhythm class hypothesis and the notion of isochrony will be presented and discussed, as well as recent developments towards an acoustical account of speech rhythm. Furthermore, to explore the perception of isochrony, the participants will develop and (time permitting) perform a simple perceptual experiment comparing natural and synthetically "isochronized" speech stimuli. This experiment will be prepared using the phonetic software Praat, including its annotation, scripting, and experimentation capabilities. Prior familiarization with the corresponding aspects of the Praat documentation is recommended, but not mandatory.
3 sessions: Monday 16.00-19.30; Tuesday 13.30-17.00; Wednesday 15.00-18.30
Tutorial Slides
Student Presentation

Tutorial 6
Speech Signal Processing with MATLAB - An Introduction
Christian Weiss, Bonn
This tutorial introduces the generation, analysis and representation of audio signals with MATLAB.
It begins with elementary audio signals and their mathematical representation in the MATLAB programming language. MATLAB itself is introduced during the tutorial, so no prior knowledge of MATLAB is needed, although familiarity with another programming language is assumed.
The audio signals considered are mostly sinusoids, noise signals, and combinations of these. Their representation in the time domain and the spectral domain is used to introduce the Fourier transform in a practical way. Digital filters then offer a simple way to modify audio signals.
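As a rough illustration of the ideas named above (a sinusoid plus noise, its spectrum, and a simple digital filter), here is a minimal sketch. It is written in Python with the standard library only, not in MATLAB, and it is not taken from the tutorial material. The function names, the sample rate of 1024 Hz, the 128 Hz tone, and the 8-tap moving-average filter are all assumptions chosen to keep the numbers easy to follow.

```python
import cmath
import math
import random


def sine(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Sampled sinusoid: amplitude * sin(2*pi*f*n/fs)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def white_noise(n_samples, std=0.1, seed=0):
    """Gaussian white noise with a fixed seed so runs are repeatable."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, std) for _ in range(n_samples)]


def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform for bins 0..N/2.

    Naive O(N^2) version, written out to show the definition;
    real tools use a fast Fourier transform instead.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def moving_average(signal, width):
    """Simple FIR low-pass filter: mean of the last `width` samples.

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - width + 1): i + 1]
        out.append(sum(window) / width)
    return out


SAMPLE_RATE = 1024  # Hz (assumed for this example)
N = 256             # number of samples

tone = sine(128, SAMPLE_RATE, N)
noisy = [s + w for s, w in zip(tone, white_noise(N))]

# Spectral domain: the strongest bin should sit at the tone frequency.
spectrum = dft_magnitude(noisy)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / N
print(peak_hz)  # 128.0

# Time domain: an 8-sample average spans exactly one period of the
# 128 Hz tone at 1024 Hz, so the filter cancels the tone.
filtered = moving_average(tone, 8)
print(max(abs(v) for v in filtered[8:]) < 1e-9)  # True
```

The moving-average width was picked so that the filter has a zero exactly at the tone frequency; other widths attenuate the tone without removing it completely.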
2 sessions: Tuesday 13.30-17.00; Wednesday 15.00-18.30

Tutorial 7
Building Corpora from Web
Radek Sedláček and Karel Pala, FI MU Brno
The goal of this tutorial is to create a small corpus of texts in the mother tongue of each participating student and to obtain some statistical characteristics of the respective language. The number of sessions is based on experience from the last Summer School:
1st session: The students first need some data: they build their corpora by downloading text from the web or other publicly available resources, clean it, and convert it to the so-called vertical format.
2nd session: The students tag the verticalised source text with structural and/or grammatical tags. The tagged texts will be accessible from the corpus query processor Bonito and from other programs that allow the students to compute some statistical characteristics.
3rd session: The students can modify existing Perl scripts or create new ones to obtain further characteristics. Finally, they will summarise the results and write a final report in HTML.
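The verticalisation and counting steps described in the sessions above can be sketched as follows. This is only an illustration in Python, not the Perl scripts used in the tutorial. It assumes that "vertical format" means one token per line with structural tags such as `<doc>` and `<p>` on lines of their own; the tokenizer, the tag names, and the function names `to_vertical` and `token_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def to_vertical(doc_id, paragraphs):
    """Convert paragraphs to one-token-per-line text with structural tags."""
    lines = [f'<doc id="{doc_id}">']
    for para in paragraphs:
        lines.append("<p>")
        lines.extend(TOKEN_RE.findall(para))
        lines.append("</p>")
    lines.append("</doc>")
    return "\n".join(lines)


def token_frequencies(vertical):
    """Count lower-cased word tokens, skipping tag lines and punctuation.

    Limitation of this sketch: a token line starting with "<" would be
    mistaken for a structural tag.
    """
    tokens = [ln.lower() for ln in vertical.splitlines()
              if ln and not ln.startswith("<")]
    return Counter(t for t in tokens if t[0].isalnum())


vertical = to_vertical("sample", ["The cat sat. The cat slept!"])
print(vertical)

freqs = token_frequencies(vertical)
print(freqs.most_common(2))  # [('the', 2), ('cat', 2)]
```

Keeping one token per line makes it easy to add further columns later, for example a grammatical tag next to each token, without changing the tools that read the file.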
3 sessions: Wednesday 9.00-12.30; Thursday 9.00-12.30; Friday 8.30-12.00

Tutorial 8
Voice XML & Finite State Dialogue Processing
Ivan Kopecek, Brno and Martin Rajman, Lausanne
The goal of this tutorial is to introduce an approach to human-machine dialogue design based on an abstract finite-state analysis of dialogue. Connections with current industrial standards such as VoiceXML, as well as the use of rapid dialogue development tools such as CSLU RAD, will be presented. The techniques will be illustrated in two practicals, one focussing on VoiceXML presentation and one on rapid dialogue prototyping with the CSLU RAD tool.
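To give a feel for what a finite-state view of dialogue can look like, here is a minimal sketch in Python. It is neither VoiceXML nor CSLU RAD, and it is not taken from the tutorial. The states, the input categories, and the toy "ask for a city, then confirm" scenario are assumptions made up for this example.

```python
# Hypothetical transition table: (current state, input category) -> next state.
TRANSITIONS = {
    ("greet", "any"): "ask_city",
    ("ask_city", "filled"): "confirm",
    ("ask_city", "empty"): "ask_city",   # re-prompt on silence
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",       # start over on rejection
}


def classify(state, utterance):
    """Map a raw user utterance to an input category for the given state."""
    text = utterance.strip().lower()
    if state == "confirm":
        return "yes" if text in {"yes", "y"} else "no"
    if state == "ask_city":
        return "filled" if text else "empty"
    return "any"


def run_dialogue(utterances):
    """Feed utterances through the automaton and return the visited states."""
    state = "greet"
    trace = [state]
    for utterance in utterances:
        if state == "done":
            break
        state = TRANSITIONS[(state, classify(state, utterance))]
        trace.append(state)
    return trace


trace = run_dialogue(["hello", "", "Brno", "no", "Bonn", "yes"])
print(trace)
# ['greet', 'ask_city', 'ask_city', 'confirm', 'ask_city', 'confirm', 'done']
```

Because the whole dialogue is a table of states and transitions, properties such as "every state can reach done" can be checked without running the system with real users.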
3 sessions: Wednesday 9.00-12.30; Thursday 9.00-12.30; Friday 8.30-12.00
Ivan Kopecek's Tutorial Slides