TUTORIAL ABSTRACTS
1) Perception of Major Speech Cues
Astrid van Wieringen
Leuven
2 sessions
A talker's message is conveyed to the listener through the acoustic speech
wave. Knowledge of the most important spectral and temporal features of
speech sounds provides insight into both the production processes of speech
and the processes by which the listener perceives speech. The aim of this
tutorial is to gain an understanding of the major acoustic cues underlying the
identification of speech sounds. This is of importance for automatic speech
recognition and also for assessing speech perception performance in
hearing-impaired persons. After a brief overview of the major spectral and
temporal cues of speech sounds and of different types of speech tests and
speech materials, participants will run perception tests with (low-pass,
high-pass, band-pass) filtered speech stimuli. The speech test consists of
nonsense words and is, therefore, language-independent. The data will be
analyzed and discussed with respect to automatic speech recognition,
hearing aids and cochlear implants (an implantable aid for profoundly deaf
persons that transmits speech cues via electrical stimulation of the
auditory nerve).
2) Auditory Coding of Speech Sounds
Sarah Simpson
Sheffield
2 sessions
Speech and other sound sources are encoded as a time-varying pattern of
spikes in the auditory nerve. However, it is not yet certain which features
of this pattern are responsible for conveying information in speech. On the
one hand, the average firing rate in each fibre can be considered to be an
internal representation or an auditory spectrum. Alternatively, the time
intervals between spikes also reflect locally-dominant spectral components
and can be processed to produce an interval-based spectrum. The situation is
made more complex by the presence of a number of important nonlinearities in
the processing chain within the cochlea. One of these ensures that the rate
response saturates at moderate to high signal levels. Consequently, it is
difficult to understand how a rate-based representation can encode stimuli
such as vowels at high intensities, or with added noise, since most of the
auditory nerve fibres will show a saturated rate response, leading to a flat
spectral representation and an absence of formant peaks. An interval-based
representation, on the other hand, shows no such saturation.
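The saturation argument above can be made concrete with a toy calculation.
The following is a minimal Python sketch, not the MATLAB model provided in
the tutorial: the sigmoidal rate-level function, its parameters (threshold,
dynamic range, spontaneous and maximum rates) and the channel levels are all
illustrative assumptions, chosen only to show how a formant peak flattens in
a rate-based spectrum as the overall level rises.

```python
import math

def firing_rate(level_db, threshold_db=20.0, dynamic_range_db=30.0,
                spont_rate=5.0, max_rate=250.0):
    """Toy sigmoidal rate-level function in spikes/s (assumed parameters)."""
    midpoint = threshold_db + dynamic_range_db / 2.0
    slope = dynamic_range_db / 8.0
    return spont_rate + (max_rate - spont_rate) / (
        1.0 + math.exp(-(level_db - midpoint) / slope))

def rate_contrast(overall_level_db, profile_db):
    """Difference between the largest and smallest channel rates."""
    rates = [firing_rate(overall_level_db + offset) for offset in profile_db]
    return max(rates) - min(rates)

# Hypothetical channel levels around one formant peak, relative to the peak (dB).
FORMANT_PROFILE_DB = [-20.0, -10.0, 0.0, -10.0, -20.0]

contrast_moderate = rate_contrast(40.0, FORMANT_PROFILE_DB)
contrast_high = rate_contrast(90.0, FORMANT_PROFILE_DB)

print("peak-to-valley rate difference at 40 dB: %.1f spikes/s" % contrast_moderate)
print("peak-to-valley rate difference at 90 dB: %.1f spikes/s" % contrast_high)
```

With these assumed parameters the peak stands out clearly at the moderate
level and almost disappears at the high level, because every channel is
driven close to its maximum rate.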
The purpose of this tutorial is to build computational models for both
rate-based and interval-based processing, and to evaluate their response to speech in
various levels of background noise. Results will be compared with physiological
data. Various components of an auditory model will be provided as MATLAB functions.
If time permits, a number of different approaches to the estimation of dominant
frequencies can be explored, including Seneff's generalised synchrony detector,
Ghitza's ensemble interval histogram and Cooke's instantaneous frequency strands.
The project is suited to either individual or paired study. Some basic familiarity
with MATLAB is needed for at least one member of the team.
The tutorial will begin with a presentation on speech processing in
the peripheral auditory system.
You can find more information on the tutorial here.
3) Very Low Bit Rate Speech Coding
Jan Cernocky & Petr Motlicek
Brno
3 sessions
The goal of the project is to encode speech efficiently at a rate of
several hundred bits per second. Students will be able to go from raw speech data to
a speaker-dependent coder. The coder can be built on the American English or Czech databases
that will be available, or students are free to bring their own data.
The main steps of the project are:
- Parameterization of speech
- Segmentation
- Vector quantization
- HMM training and refinement
- Representatives
- Speech synthesis
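To give a feel for the vector quantization step and for where a rate of a
few hundred bits per second comes from, here is a minimal Python sketch. It
is not the coder described in the documentation below: the function names,
the tiny codebook, and the figures of 64 units at 12 units per second are
hypothetical, and the rate computed covers only the unit indices (no timing
or prosody information).

```python
import math

def quantize(vector, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    def distance(codeword):
        return sum((v - c) ** 2 for v, c in zip(vector, codeword))
    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))

def index_bit_rate(codebook_size, symbols_per_second):
    """Bits per second needed to transmit fixed-length codebook indices."""
    bits_per_symbol = math.ceil(math.log2(codebook_size))
    return bits_per_symbol * symbols_per_second

# Hypothetical 2-dimensional codebook with three codewords.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
nearest = quantize([0.9, 1.1], codebook)

# Hypothetical figures: 64 segmental units, about 12 units per second.
rate = index_bit_rate(64, 12)
print("nearest codeword:", nearest)
print("index bit rate:", rate, "bit/s")
```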
Documentation available for the project:
[1] J. Cernocky: "Speech Processing Using Automatically Derived Segmental
Units: Applications to Very Low Rate Coding and Speaker Verification",
PhD thesis, Universite Paris-Sud (France), December 18, 1998.
Available online at: http://www.feec.vutbr.cz/~cernocky/Thesis.html
[2] J. Cernocky: "Very low bit rate coding using automatically derived
units - a cookbook", Technical report, Brno University of Technology,
May 2000. Available upon email request to
{motlicek,cernocky}@fit.vutbr.cz
4) Voice XML & Finite State Dialogue Processing
Ivan Kopecek & Martin Rajman
Brno & Lausanne
3 sessions
The goal of this tutorial is to introduce an approach to human-machine
dialogue design that comes from finite-state abstract analysis of dialogue;
connections with current industrial standards such as VoiceXML, as well
as the use of rapid dialogue development tools such as CSLU RAD, will be
presented. The various techniques will be illustrated during two practicals,
one focussing on VoiceXML presentation and one on rapid dialogue prototyping
with the CSLU RAD tool.
The tutorial corresponds to 2 sessions and will take place twice during
the week.
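The finite-state view of dialogue mentioned above can be illustrated with a
very small state machine. This is a Python sketch under stated assumptions,
not VoiceXML and not the CSLU RAD tool: the states, the input classes and
the transition table are invented for illustration.

```python
# Hypothetical dialogue: ask for a city, confirm it, then finish.
TRANSITIONS = {
    ("ask_city", "text"): "confirm",
    ("ask_city", "empty"): "ask_city",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "ask_city",
}

def classify(state, reply):
    """Map a raw user reply to an input class for the current state."""
    reply = reply.strip().lower()
    if state == "confirm":
        return reply if reply in ("yes", "no") else "other"
    return "text" if reply else "empty"

def run(replies, start="ask_city", final="done"):
    """Feed replies to the automaton and return the sequence of visited states."""
    state = start
    visited = [state]
    for reply in replies:
        if state == final:
            break
        # Unknown (state, input) pairs leave the state unchanged (re-prompt).
        state = TRANSITIONS.get((state, classify(state, reply)), state)
        visited.append(state)
    return visited

trace = run(["Brno", "no", "Lausanne", "yes"])
print(" -> ".join(trace))
```

Keeping the transitions in a table separates the dialogue design from the
code that executes it, which is the main appeal of the finite-state approach.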
5) Machine Learning and NLP
Lluis Marquez & Xavier Carreras
Barcelona
3 sessions
In this course, the fundamentals of Machine Learning (ML) will be
introduced, focusing on the "inductive learning for classification"
paradigm with application to Natural Language Processing tasks.
In particular, two recently very successful ML algorithms will be
explained in detail, namely AdaBoost and Support Vector Machines
(SVM), together with their application to simple and rather
(structurally) complex NLP problems. This theory will be covered
in the first part of the course (50-60%).
In the practical part of the course (about 40-50%), the students will
have the opportunity to experiment with the AdaBoost and SVM algorithms
on a variety of sequence tagging problems for chunk detection (e.g.,
noun phrases, verb phrases, Named Entities, etc.). In a controlled
and "easy-to-use" software framework, the students will be able to
optimize learning parameters and to test/tune the algorithms for
constructing a "real" chunker that will be evaluated using running
text.
The practical part will be carried out in a Unix environment.
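Chunk detection becomes a sequence tagging problem once each token is given
a tag saying whether it begins, continues or lies outside a chunk. The
Python sketch below uses the common B/I/O tag scheme as an assumption (the
course software may use a different encoding); the example sentence and its
chunk spans are invented.

```python
def chunks_to_bio(n_tokens, chunks):
    """Encode (start, end, label) spans, end exclusive, as one B/I/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags

def bio_to_chunks(tags):
    """Decode B/I/O tags back into (start, end, label) spans."""
    chunks = []
    start = label = None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a final chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if label is not None and not continues:
            chunks.append((start, i, label))
            start = label = None
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            start, label = i, tag[2:]
    return chunks

tokens = ["The", "students", "tune", "the", "algorithms"]
gold_chunks = [(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")]
tags = chunks_to_bio(len(tokens), gold_chunks)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

A classifier such as AdaBoost or an SVM then only has to predict one tag per
token, and the decoder recovers the chunks from the predicted tag sequence.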
6) Building Corpora
Karel Pala, Radek Sedlacek
Brno
3 sessions
The goal of this tutorial is to create a small text corpus in the mother
tongue of each participating student and to obtain some statistical
characteristics of the respective language. The number of sessions is
based on the experience of the last Summer School:
1st session
The students have to collect some data, i.e. they should build their corpora
by downloading data from the web or other publicly available resources,
clean it and transform it into the so-called vertical format.
2nd session
The students should tag the verticalised source text with structural and,
possibly, grammatical tags. The tools that will be used, the corpus
query processor Bonito and other programmes, make it possible to process
tagged texts and allow the students to compute some statistical
characteristics.
3rd session
The students can (as an option) modify existing Perl scripts or create
new ones to obtain more characteristics. Finally, they will summarise
the results and write the final report in HTML.
You can find more information on the tutorial here.
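As a rough illustration of the first two sessions, the Python sketch below
turns plain text into a vertical file (one token per line, with structural
tags on their own lines) and counts token frequencies. It is an assumption-
laden sketch, not the tutorial's scripts: the regular-expression tokenizer,
the naive sentence splitter, and the <doc> and <s> tag names are
illustrative choices.

```python
import re
from collections import Counter

# Words, or single punctuation characters, as separate tokens.
TOKEN = re.compile(r"\w+|[^\w\s]")

def verticalise(text, doc_id="doc1"):
    """Return the lines of a simple vertical file for the given text."""
    lines = ['<doc id="%s">' % doc_id]
    # Naive sentence split: whitespace following '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if not sentence:
            continue
        lines.append("<s>")
        lines.extend(TOKEN.findall(sentence))
        lines.append("</s>")
    lines.append("</doc>")
    return lines

def token_frequencies(vertical_lines):
    """Count lower-cased tokens, skipping lines that look like structural tags."""
    return Counter(line.lower() for line in vertical_lines
                   if not line.startswith("<"))

vertical = verticalise("A cat sat. A dog ran!")
frequencies = token_frequencies(vertical)
print("\n".join(vertical))
print(frequencies.most_common(3))
```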
7) Building a Semantic Network
Karel Pala, Pavel Smrz
Brno
2 sessions
1st session
A short overview of WordNet-like lexical databases, with some examples:
Princeton WordNet (+versions), EuroWordNet, BalkaNet, GermaNet, RussNet, ...
The main semantic relations: synonymy, hypero/hyponymy, Internal Language
Relations, Top Ontology ...
WordNet representation and WordNet editing, text format, XML format, DTD,
tools for WordNet editing and browsing - VisDic, tool configuration.
2nd session
The students should try to prepare a small wordnet cluster (up to 50
synsets) for their language (not yet present in the wordnets built so far)
and to assign the selected concepts to ontologies. An attempt to compare
the ontologies can be made - EuroWordNet Top Ontology, SUMO, Time
Ontology, ...
Finally, they will summarise the results and write the final report (in
HTML).
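The idea of a wordnet cluster linked by hypero/hyponymy can be sketched in a
few lines of Python. The dictionary layout and the synset identifiers below
are hypothetical and much simpler than the XML format handled by VisDic;
the sketch only shows how following hypernym links yields a path towards
the top of the hierarchy.

```python
# Hypothetical three-synset cluster; identifiers and layout are illustrative.
SYNSETS = {
    "dog.n": {"literals": ["dog", "domestic dog"], "hypernym": "canine.n"},
    "canine.n": {"literals": ["canine", "canid"], "hypernym": "animal.n"},
    "animal.n": {"literals": ["animal"], "hypernym": None},
}

def hypernym_path(synset_id, synsets):
    """Follow hypernym links from a synset up to the top of the cluster."""
    path = []
    seen = set()
    while synset_id is not None and synset_id not in seen:  # guard against cycles
        seen.add(synset_id)
        path.append(synset_id)
        synset_id = synsets[synset_id]["hypernym"]
    return path

path = hypernym_path("dog.n", SYNSETS)
print(" -> ".join(path))
```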
8) NLP with Prolog
Hans-Christian Schmitz & Bernhard Schröder
Bonn
3 sessions
The course starts with a very concise introduction to logic
programming with Prolog.
We discuss recursion, lists and definite clause grammars (DCGs).
Some basic knowledge of Prolog is an advantage for participants but not a
requirement.
We experiment with the implementation of various grammars in the
DCG-formalism.
The course includes practical exercises.
The advanced tutorial builds on this course.
8adv) NLP with Prolog (advanced tutorial)
Hans-Christian Schmitz & Bernhard Schröder
Bonn
1 session
The course gives an overview of language processing techniques in Prolog.
Parsing strategies are reviewed.
We augment a top-down parser with a chart mechanism and discuss the
handling of feature-structure grammars.
Some practical exercises are included.
Finally, we may give an outlook on semantic interpretation and
processing of semi-structured (XML) documents.
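The combination of top-down parsing with a chart can be sketched outside
Prolog as well. The Python recogniser below is a minimal illustration, not
the course's implementation: it stores, for each (category, start position)
pair, the set of positions where that category can end, which acts as a
simple chart. The grammar and lexicon are invented, and the sketch assumes
a grammar without left recursion.

```python
def recognise(tokens, grammar, lexicon, start_symbol="s"):
    """Top-down recognition with a table of already computed results."""
    chart = {}  # (symbol, start position) -> set of possible end positions

    def ends(symbol, i):
        key = (symbol, i)
        if key in chart:
            return chart[key]
        result = set()
        # Lexical category: consume one matching token.
        if symbol in lexicon and i < len(tokens) and tokens[i] in lexicon[symbol]:
            result.add(i + 1)
        # Phrasal category: try each right-hand side, left to right.
        for rhs in grammar.get(symbol, []):
            positions = {i}
            for child in rhs:
                positions = {e for p in positions for e in ends(child, p)}
            result |= positions
        chart[key] = result
        return result

    return len(tokens) in ends(start_symbol, 0)

# Hypothetical toy grammar, written in the spirit of a small DCG.
GRAMMAR = {
    "s": [["np", "vp"]],
    "np": [["det", "n"]],
    "vp": [["v", "np"], ["v"]],
}
LEXICON = {
    "det": {"the"},
    "n": {"dog", "cat"},
    "v": {"sees", "sleeps"},
}

accepted = recognise("the dog sees the cat".split(), GRAMMAR, LEXICON)
rejected = recognise("dog the sees".split(), GRAMMAR, LEXICON)
print(accepted, rejected)
```

Because each (category, position) pair is analysed only once, constituents
shared between alternative analyses are not re-parsed.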
9) Limited Domain Synthesis Exercise (FESTVOX tools)
Simon King
Edinburgh
1 session (twice)
In this practical, you will build a limited domain speech synthesiser
based on your own voice, using the Festival speech synthesis toolkit
with the Festvox voice-building tools.
The synthesiser uses a type of unit selection in which multiple examples
of each unit type are recorded and an algorithm automatically chooses
which particular example to use at synthesis time.
The type of unit selection used here is a simplified one, in which the
units are whole phones. This works very well for limited domain
applications like speaking clocks, reading the weather or sports
results, etc. With a little effort, you can build a synthesiser which is
better than general unit-selection based TTS (e.g. rVoice which is a
direct descendant of Festival) within the limited domain for which it is
designed but considerably worse outside that domain.
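The selection step described above, choosing one recorded example per unit,
is often framed as a search for the sequence with the lowest total cost.
The Python sketch below is a generic dynamic-programming illustration under
that assumption and is not the algorithm implemented in Festival or the
Festvox tools: the unit records, the pitch values and the join cost (the
pitch mismatch at each join) are all hypothetical.

```python
def select_units(candidates, join_cost):
    """Pick one candidate per position so that the summed join cost is minimal.

    candidates: list with one list of candidate units per target position.
    Returns (total cost, chosen sequence).
    """
    # For every candidate at the current position keep the cheapest path to it.
    best = [(0.0, [unit]) for unit in candidates[0]]
    for options in candidates[1:]:
        new_best = []
        for unit in options:
            cost, path = min(
                ((c + join_cost(p[-1], unit), p) for c, p in best),
                key=lambda item: item[0],
            )
            new_best.append((cost, path + [unit]))
        best = new_best
    return min(best, key=lambda item: item[0])

def pitch_join_cost(left, right):
    """Hypothetical join cost: pitch mismatch (Hz) at the concatenation point."""
    return abs(left["end_f0"] - right["start_f0"])

# Hypothetical recorded examples for a three-unit utterance.
candidates = [
    [{"id": "it_1", "start_f0": 100, "end_f0": 120},
     {"id": "it_2", "start_f0": 100, "end_f0": 110}],
    [{"id": "is_1", "start_f0": 112, "end_f0": 115},
     {"id": "is_2", "start_f0": 140, "end_f0": 150}],
    [{"id": "noon_1", "start_f0": 150, "end_f0": 100},
     {"id": "noon_2", "start_f0": 117, "end_f0": 90}],
]

total_cost, chosen = select_units(candidates, pitch_join_cost)
chosen_ids = [unit["id"] for unit in chosen]
print(chosen_ids, total_cost)
```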
Online instructions are available, including some things you can do before
coming to Barcelona.
The tutorial corresponds to 1 session and will take place twice during
the week.
9adv) Limited Domain Synthesis Exercise (advanced tutorial)
Simon King
Edinburgh
1 session
Students taking both the basic and advanced tutorials will have more
time to spend designing and building their systems, plus we will look at
how to evaluate the system, and how to work out when - and more
importantly why - it makes mistakes. There will be enough time to try
correcting some of the problems you find with your system, and to listen
to the improved output.