TUTORIAL ABSTRACTS

1) Perception of Major Speech Cues
    Astrid van Wieringen
    Leuven
    2 sessions

A talker's message is conveyed to the listener through the acoustic speech wave. Knowledge of the most important spectral and temporal features of speech sounds provides insight into both the production of speech and the processes by which the listener perceives it. The aim of this tutorial is to gain an understanding of the major acoustic cues underlying the identification of speech sounds. This is important for automatic speech recognition and for assessing speech perception performance in hearing-impaired persons. After a brief overview of the major spectral and temporal cues of speech sounds and of different types of speech tests and speech materials, participants will run perception tests with (low-pass, high-pass, band-pass) filtered speech stimuli. The speech test consists of nonsense words and is therefore language-independent. The data will be analyzed and discussed with respect to automatic speech recognition, hearing aids and cochlear implants (implantable aids for profoundly deaf persons that transmit speech cues via electrical stimulation of the auditory nerve).

2) Auditory Coding of Speech Sounds
    Sarah Simpson
    Sheffield
    2 sessions

Speech and other sound sources are encoded as a time-varying pattern of spikes in the auditory nerve. However, it is not yet certain which features of this pattern are responsible for conveying information in speech. On the one hand, the average firing rate in each fibre can be considered an internal representation, or auditory spectrum. Alternatively, the time intervals between spikes also reflect locally dominant spectral components and can be processed to produce an interval-based spectrum. The situation is made more complex by the presence of a number of important nonlinearities in the processing chain within the cochlea. One of these causes the rate response to saturate at moderate to high signal levels. Consequently, it is difficult to understand how a rate-based representation can encode stimuli such as vowels at high intensities, or with added noise, since most of the auditory nerve fibres will show a saturated rate response, leading to a flat spectral representation and an absence of formant peaks. An interval-based representation, on the other hand, shows no such saturation.

The purpose of this tutorial is to build computational models for both rate-based and interval-based processing, and to evaluate their response to speech in various levels of background noise. Results will be compared with physiological data. Various components of an auditory model will be provided as MATLAB functions. If time permits, a number of different approaches to the estimation of dominant frequencies can be explored, including Seneff's generalised synchrony detector, Ghitza's ensemble interval histogram and Cooke's instantaneous frequency strands.
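The saturation problem described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not one of the MATLAB functions supplied in the tutorial. The function firing_rate, its threshold, dynamic range and maximum rate are assumptions chosen to make the effect visible, not physiological fits, and the "vowel" is an invented set of channel levels.

```python
import math

def firing_rate(level_db, threshold_db=20.0, dynamic_range_db=30.0, max_rate=200.0):
    """Hypothetical saturating rate-level function (spikes/s) for one fibre."""
    # Logistic curve centred in the middle of the assumed dynamic range.
    midpoint = threshold_db + dynamic_range_db / 2.0
    slope = 8.0 / dynamic_range_db
    return max_rate / (1.0 + math.exp(-slope * (level_db - midpoint)))

def rate_profile(channel_levels_db, gain_db):
    """Rate-based 'spectrum': one firing rate per frequency channel."""
    return [firing_rate(level + gain_db) for level in channel_levels_db]

def peak_to_valley(profile):
    """Contrast between formant peaks and the valleys between them."""
    return max(profile) - min(profile)

# Toy "vowel": relative channel levels in dB with two formant-like peaks.
vowel_db = [0, 10, 20, 10, 0, 8, 18, 8, 0]

quiet = rate_profile(vowel_db, gain_db=20.0)  # moderate presentation level
loud = rate_profile(vowel_db, gain_db=70.0)   # high presentation level

print("peak-to-valley at moderate level:", round(peak_to_valley(quiet), 1))
print("peak-to-valley at high level:", round(peak_to_valley(loud), 1))
```

At the higher level every channel sits near the assumed maximum rate, so the formant peaks all but vanish from the rate profile.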

The project is suited to either individual or paired study. Some basic familiarity with MATLAB is needed for at least one member of the team.

The tutorial will begin with a presentation on speech processing in the peripheral auditory system.

You can find more information on the tutorial here.

3) Very Low Bit Rate Speech Coding
    Jan Cernocky & Petr Motlicek
    Brno
    3 sessions

The goal of the project is to encode speech effectively at a rate of a few hundred bits per second. Students will be able to go from raw speech data to a speaker-dependent coder. The coder can be built on the American English or Czech databases that will be available, or students are free to bring their own data.
The main steps of the project are:

  • Parameterization of speech
  • Segmentation
  • Vector quantization
  • HMM training and refinement
  • Representatives
  • Speech synthesis
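
As an illustration of the vector quantization step only, here is a minimal sketch of codebook training with Lloyd (k-means) iterations and nearest-codeword encoding. It is not the procedure from the cookbook [2]. The function names, the seeding with the first vectors and the toy 2-D features are assumptions made for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(codebook, vector):
    """Index of the codeword closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(codebook[i], vector))

def train_codebook(vectors, size, iterations=10):
    """Lloyd (k-means) iterations, seeded with the first `size` vectors."""
    codebook = [list(v) for v in vectors[:size]]
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in vectors:
            cells[nearest(codebook, v)].append(v)
        for i, cell in enumerate(cells):
            if cell:  # keep the old codeword if its cell is empty
                codebook[i] = [sum(dim) / len(cell) for dim in zip(*cell)]
    return codebook

# Toy 2-D feature vectors forming two clusters (invented data).
features = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
codebook = train_codebook(features, size=2)

# Each vector is now transmitted as a codeword index instead of raw values.
indices = [nearest(codebook, v) for v in features]
print(indices)
```

With a codebook of N entries, each index costs about log2(N) bits, which is where the bit-rate saving comes from.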

Documentation available for the project:

[1] J. Cernocky: "Speech Processing Using Automatically Derived Segmental Units: Applications to Very Low Rate Coding and Speaker Verification", PhD thesis, Universite Paris-Sud (France), December 18, 1998. Available online at: http://www.feec.vutbr.cz/~cernocky/Thesis.html

[2] J. Cernocky: "Very low bit rate coding using automatically derived units - a cookbook", Technical report, Brno University of Technology, May 2000. Available upon email request to {motlicek,cernocky}@fit.vutbr.cz

4) Voice XML & Finite State Dialogue Processing
    Ivan Kopecek & Martin Rajman
    Brno & Lausanne
    3 sessions

The goal of this tutorial is to introduce an approach to human-machine dialogue design that comes from finite-state abstract analysis of dialogue; connections with current industrial standards such as VoiceXML, as well as the use of rapid dialogue development tools such as CSLU RAD, will be presented. The various techniques will be illustrated during two practicals, one focussing on VoiceXML presentation and one on rapid dialogue prototyping with the CSLU RAD tool. The tutorial corresponds to 2 sessions and will take place twice during the week.
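
The finite-state view of a dialogue can be sketched as a small transition table. This is a minimal illustration in Python, not VoiceXML and not CSLU RAD. The states, prompts and the classify function are hypothetical and invented for the example.

```python
# Hypothetical booking dialogue: every state, prompt and transition is invented.
PROMPTS = {
    "ask_city": "Which city are you travelling to?",
    "ask_date": "On which date?",
    "confirm": "Shall I book it?",
    "done": "Thank you, goodbye.",
}

# (current state, class of user input) -> next state
TRANSITIONS = {
    ("ask_city", "answer"): "ask_date",
    ("ask_date", "answer"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",
}

def classify(utterance):
    """Map a raw utterance to a coarse input class."""
    text = utterance.strip().lower()
    if not text:
        return "silence"
    if text in ("yes", "no"):
        return text
    return "answer"

def run_dialogue(utterances, start="ask_city", final="done"):
    """Return the sequence of states visited for the given user turns."""
    state = start
    visited = [state]
    for utterance in utterances:
        if state == final:
            break
        # Input with no matching transition keeps the state (re-prompt).
        state = TRANSITIONS.get((state, classify(utterance)), state)
        visited.append(state)
    return visited

trace = run_dialogue(["Brno", "", "Monday", "no", "Lausanne", "Friday", "yes"])
for state in trace:
    print(state, "->", PROMPTS[state])
```

Keeping the transitions in a table separates the dialogue design from the code that runs it, which is the property that makes finite-state analysis of a dialogue straightforward.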

5) Machine Learning and NLP
    Lluis Marquez & Xavier Carreras
    Barcelona
    3 sessions

In this course, the fundamentals of Machine Learning (ML) will be introduced, focusing on the "inductive learning for classification" paradigm with application to Natural Language Processing tasks.

In particular, two ML algorithms that have recently been very successful will be explained in detail, namely AdaBoost and Support Vector Machines (SVM), together with their application to simple and rather (structurally) complex NLP problems. This theoretical material will be covered in the first part of the course (50-60%).
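
For orientation, here is a minimal sketch of AdaBoost with one-dimensional decision stumps as weak learners. It is not the software framework used in the course. The function names and the toy data set are assumptions made for the example.

```python
import math

def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Stump (weighted error, threshold, polarity) with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for w, x, y in zip(weights, xs, ys)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified examples gain weight, correct ones lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy problem that no single stump can solve: + + - - + +
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

The example shows the core idea: each round focuses on the examples the previous weak learners got wrong, and the final classifier is a weighted vote.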

In the practical part of the course (about 40-50%), the students will have the opportunity to experiment with the AdaBoost and SVM algorithms on a variety of sequence tagging problems for chunk detection (e.g., noun phrases, verb phrases, Named Entities, etc.). In a controlled and "easy-to-use" software framework, the students will be able to optimize learning parameters and to test/tune the algorithms for constructing a "real" chunker that will be evaluated using running text.

The practical part will be carried out in a Unix environment.
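
One common way to cast chunk detection as sequence tagging is to label every token with a B-X / I-X / O tag. The sketch below assumes that encoding (the abstract does not say which encoding the course uses) and shows how per-token tags are turned back into chunks. The function name and the example sentence are invented.

```python
def bio_to_chunks(tokens, tags):
    """Group tokens into (chunk_type, text) spans from B-X / I-X / O tags."""
    chunks = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, chunk_type = tag.partition("-")
        continues = prefix == "I" and chunk_type == current_type
        if not continues and current_tokens:
            # The open chunk ends here.
            chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        if prefix in ("B", "I"):  # a stray I-X simply opens a new chunk
            current_type = chunk_type
            current_tokens.append(token)
    if current_tokens:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks

tokens = ["The", "students", "will", "tune", "a", "real", "chunker", "."]
tags = ["B-NP", "I-NP", "B-VP", "I-VP", "B-NP", "I-NP", "I-NP", "O"]
chunks = bio_to_chunks(tokens, tags)
print(chunks)
```

With this encoding, a classifier such as AdaBoost or an SVM only has to predict one tag per token, and the chunks follow from the tag sequence.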

6) Building Corpora
    Karel Pala, Radek Sedlacek
    Brno
    3 sessions

The goal of this tutorial is to create a small text corpus in the mother tongue of each participating student and to obtain some statistical characteristics of the respective language. The number of sessions is based on experience from the last Summer School:

1st session

The students have to collect some data, i.e. they should build their corpora by downloading the data from the web or other publicly available resources, clean it and transform it into what is called vertical format.
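
In a vertical file each token stands on its own line, with structural markup on separate lines. The following minimal sketch converts plain text into such a form. The tokenisation rule, the naive sentence splitter and the <doc> and <s> structure names are assumptions made for the example and may differ from what the tutorial tools expect.

```python
import re

# A token is a run of word characters or a single punctuation mark.
TOKEN = re.compile(r"\w+|[^\w\s]", re.UNICODE)

def verticalise(text, doc_id="doc1"):
    """Return `text` as one token per line, wrapped in <doc> and <s> tags."""
    lines = ['<doc id="%s">' % doc_id]
    # Naive sentence split: after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        lines.append("<s>")
        lines.extend(TOKEN.findall(sentence))
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)

vertical = verticalise("Brno is a city. It hosts the school!")
print(vertical)
```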

2nd session

The students should tag the verticalised source text with structural and, possibly, grammatical tags. The tools that will be used, the corpus query processor Bonito and other programmes, make it possible to process tagged texts and allow the students to compute some statistical characteristics.

3rd session

The students can (as an option) modify existing Perl scripts or create new ones to obtain more characteristics. Finally, they will summarise the results and write the final report in HTML.
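
As an example of the kind of statistical characteristics meant here, the sketch below counts tokens, types and the most frequent words in a vertical text. It is written in Python rather than Perl and is not one of the existing scripts. It assumes one token per line, optional tab-separated attributes after the word form, and structural tags on lines starting with "<".

```python
from collections import Counter

def corpus_statistics(vertical_text):
    """Basic counts over a vertical text (one token per line)."""
    tokens = []
    for line in vertical_text.splitlines():
        line = line.strip()
        if not line or line.startswith("<"):
            continue  # skip blank lines and structural tags
        # Keep only the word form if further attributes follow a tab.
        tokens.append(line.split("\t")[0].lower())
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "most_common": counts.most_common(3),
    }

sample = "<s>\nthe\ncat\nsaw\nthe\ndog\n.\n</s>"
stats = corpus_statistics(sample)
print(stats)
```

Note that this simple filter would also skip a literal "<" token. A real script would need to tell such tokens apart from markup.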

You can find more information on the tutorial here.

7) Building a Semantic Network
    Karel Pala, Pavel Smrz
    Brno
    2 sessions

1st session

A short overview of WordNet-like lexical databases, with some examples: Princeton WordNet (+versions), EuroWordNet, BalkaNet, GermaNet, RussNet, ... The main semantic relations: synonymy, hypero/hyponymy, Internal Language Relations, Top Ontology ... WordNet representation and WordNet editing: text format, XML format, DTD, tools for WordNet editing and browsing (VisDic), tool configuration.

2nd session

The students should try to prepare a small wordnet cluster (up to 50 synsets) for their language, one not yet covered by the wordnets built so far, and to assign the selected concepts to ontologies. An attempt to compare the ontologies can be made - EuroWordNet Top Ontology, SUMO, Time Ontology, ...
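
To give an idea of what a small synset cluster looks like as data, here is a minimal sketch of synsets linked by the hypero/hyponymy relation. It does not use the VisDic or WordNet file formats. All identifiers, literals and links are invented for illustration.

```python
# Hypothetical mini-wordnet: identifiers, literals and links are invented.
SYNSETS = {
    "entity.1": {"literals": ["entity"], "hypernym": None},
    "animal.1": {"literals": ["animal", "beast"], "hypernym": "entity.1"},
    "dog.1": {"literals": ["dog", "domestic dog"], "hypernym": "animal.1"},
    "cat.1": {"literals": ["cat"], "hypernym": "animal.1"},
}

def hypernym_chain(synsets, ident):
    """Follow hypernym links from `ident` up to the top of the hierarchy."""
    chain, seen = [], {ident}
    current = synsets[ident]["hypernym"]
    while current is not None and current not in seen:  # guard against cycles
        chain.append(current)
        seen.add(current)
        current = synsets[current]["hypernym"]
    return chain

def hyponyms(synsets, ident):
    """Direct hyponyms: synsets whose hypernym is `ident`."""
    return sorted(k for k, v in synsets.items() if v["hypernym"] == ident)

def synonyms(synsets, word):
    """All literals sharing a synset with `word` (synonymy)."""
    return sorted({lit for v in synsets.values() if word in v["literals"]
                   for lit in v["literals"] if lit != word})

print(hypernym_chain(SYNSETS, "dog.1"))
print(hyponyms(SYNSETS, "animal.1"))
print(synonyms(SYNSETS, "animal"))
```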

Finally, they will summarise the results and write the final report (in HTML).

8) NLP with Prolog
    Hans-Christian Schmitz & Bernhard Schröder
    Bonn
    3 sessions

The course starts with a very concise introduction to logic programming with Prolog.
We discuss recursion, lists and definite clause grammars (DCGs).
Some basic knowledge of Prolog is an advantage for participants, but not a requirement.
We experiment with the implementation of various grammars in the DCG-formalism.
The course includes practical exercises.
The advanced tutorial builds on this course.
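
For readers who have not yet seen a DCG, the sketch below mimics in Python what a Prolog DCG such as "s --> np, vp." does when it is run top-down with backtracking. It is an illustration of the idea only, not course material. The grammar and vocabulary are invented, and the grammar must not be left-recursive.

```python
# Each nonterminal maps to a list of alternative right-hand sides.
# Symbols that are not keys of GRAMMAR are treated as terminals (words).
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["det", "n"]],
    "vp": [["v", "np"], ["v"]],
    "det": [["the"], ["a"]],
    "n": [["student"], ["grammar"]],
    "v": [["writes"], ["sleeps"]],
}

def parse(symbol, words, start):
    """Yield every position where `symbol` can end when it starts at `start`."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for body in GRAMMAR[symbol]:  # try each alternative in turn
        yield from parse_sequence(body, words, start)

def parse_sequence(symbols, words, start):
    """Parse the symbols one after another, threading the position through."""
    if not symbols:
        yield start
        return
    for middle in parse(symbols[0], words, start):
        yield from parse_sequence(symbols[1:], words, middle)

def recognise(sentence):
    words = sentence.split()
    return any(end == len(words) for end in parse("s", words, 0))

print(recognise("the student writes a grammar"))
print(recognise("student the writes"))
```

The position threaded through parse_sequence plays the role of the difference lists that Prolog adds to DCG rules automatically.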

8adv) NLP with Prolog (advanced tutorial)
    Hans-Christian Schmitz & Bernhard Schröder
    Bonn
    1 session

The course gives an overview of language processing techniques in Prolog. Parsing strategies are reviewed.
We augment a top-down parser with a chart mechanism and discuss the handling of feature-structure grammars.
Some practical exercises are included.
Finally, we might give an outlook on semantic interpretation and the processing of semi-structured (XML) documents.
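
The idea of adding a chart to a top-down parser can be sketched as follows: results for each (symbol, start position) pair are stored, so that a constituent is never analysed twice. The sketch is in Python rather than Prolog and is not the parser used in the course. The grammar is invented and must not be left-recursive.

```python
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["det", "n"]],
    "vp": [["v", "np", "pp"], ["v", "np"]],
    "pp": [["p", "np"]],
    "det": [["the"], ["a"]],
    "n": [["student"], ["parser"], ["chart"]],
    "v": [["builds"]],
    "p": [["with"]],
}

def make_recogniser(grammar, words):
    chart = {}  # (symbol, start) -> frozenset of possible end positions

    def ends(symbol, start):
        key = (symbol, start)
        if key in chart:  # already analysed: reuse the stored result
            return chart[key]
        if symbol not in grammar:  # terminal
            matched = start < len(words) and words[start] == symbol
            result = {start + 1} if matched else set()
        else:
            result = set()
            for body in grammar[symbol]:
                positions = {start}
                for part in body:
                    positions = {e for p in positions for e in ends(part, p)}
                result |= positions
        chart[key] = frozenset(result)
        return chart[key]

    return ends, chart

words = "the student builds a parser with a chart".split()
ends, chart = make_recogniser(GRAMMAR, words)
accepted = len(words) in ends("s", 0)
print(accepted, sorted(chart[("np", 3)]))
```

Both vp rules need the noun phrase starting at position 3. With the chart it is analysed once and looked up the second time.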

9) Limited Domain Synthesis Exercise (FESTVOX tools)
    Simon King
    Edinburgh
    1 session (twice)

In this practical, you will build a limited domain speech synthesiser based on your own voice, using the Festival speech synthesis toolkit with the Festvox voice-building tools.

The synthesiser uses a type of unit selection in which multiple examples of each unit type are recorded and an algorithm automatically chooses which particular example to use at synthesis time.

The type of unit selection used here is a simplified one, in which the units are whole phones. This works very well for limited domain applications like speaking clocks, reading the weather or sports results, etc. With a little effort, you can build a synthesiser which is better than general unit-selection based TTS (e.g. rVoice, which is a direct descendant of Festival) within the limited domain for which it is designed, but considerably worse outside that domain.
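
The selection step can be pictured as a search for the sequence of recorded examples that joins most smoothly. The following minimal sketch uses dynamic programming with a join cost only. It is not the algorithm implemented in Festival or Festvox. The unit identifiers, the f0 values and the cost function are invented for illustration, and real systems typically also use a target cost.

```python
def select_units(candidates, join_cost):
    """candidates: one list of recorded examples per target unit.
    Returns the example sequence with the lowest total join cost."""
    # best[i][j] = (cost of the best path ending in candidate j of unit i,
    #               index of the previous candidate on that path)
    best = [[(0.0, None) for _ in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for cand in candidates[i]:
            options = [(best[i - 1][k][0] + join_cost(prev, cand), k)
                       for k, prev in enumerate(candidates[i - 1])]
            row.append(min(options))
        best.append(row)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    return list(reversed(path))

def f0_join_cost(left, right):
    """Hypothetical join cost: pitch mismatch at the join, in Hz."""
    return abs(left["f0"] - right["f0"])

# Two recorded examples for each of three target units (invented values).
candidates = [
    [{"id": "t_1", "f0": 100}, {"id": "t_2", "f0": 140}],
    [{"id": "e_1", "f0": 150}, {"id": "e_2", "f0": 105}],
    [{"id": "n_1", "f0": 110}, {"id": "n_2", "f0": 160}],
]
chosen = [unit["id"] for unit in select_units(candidates, f0_join_cost)]
print(chosen)
```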

Online instructions including some things you can do before coming to Barcelona.

The tutorial corresponds to 1 session and will take place twice during the week.

9adv) Limited Domain Synthesis Exercise (advanced tutorial)
    Simon King
    Edinburgh
    1 session

Students taking both the basic and advanced tutorials will have more time to spend designing and building their systems, plus we will look at how to evaluate the system, and how to work out when - and more importantly why - it makes mistakes. There will be enough time to try correcting some of the problems you find with your system, and to listen to the improved output.