Student Presentations

Kim van Esbroeck

Name: GOOFI: MULTI-WORD TERM RECOGNITION
Abstract:

The GOOFI project (GOOds FInder), of which this internship was a part, aims to build an NLP model for comparing Trademark Goods Descriptions. Text comparison is a heavily discussed topic in information retrieval, and it is complicated by the richness of natural language. In trademark research systems, one searches the existing trademark population for confusingly similar marks; this is done in a Full Availability Search. To achieve higher consistency and productivity, GOOFI was set up to provide a distance measure on goods texts. A crucial step in deriving such a measure is the Term Extraction phase, in which the most meaningful terms of a goods text are extracted in order to derive one or several concepts from it. The scope of the internship was to implement this subpart of the larger project in Java, refining the results obtained from a rough Term Extraction algorithm.

Dirk Vervloet

Abstract:

The work done during my internship at Financial Architects consisted of two parts. The first assignment was to create a dialog for accessing quote information and trading shares using natural language. The speech technology software available was L&H ASR 1600. I developed a general strategy and the architecture of the dialog, and I designed the grammars that model the user's answers.

The product of Financial Architects, FinVoice, currently uses a menu-driven approach. The user navigates through the dialogs mainly by using one-word commands. In the quote dialog, for example, the caller is asked to specify the market and the company one by one. Whenever recognition is uncertain, the system asks the caller to confirm the top N-best results one at a time. Next, the system states the quote (using a default price). The customer can then change any of three items: the market, the company or the price. Alternatively, the customer can move on to the next dialog, the transaction dialog, with a buy or sell command. The same strategy is used in the transaction dialog. The user specifies the number of shares and the price (either a limit price or the market price) separately. In the case of a limit price, it is also necessary to specify the date of execution. When all the required information has been gathered, the user is asked to confirm the order. A negative answer leads to a question asking which element is wrong, and the incorrect piece of information can then be restated. A positive answer completes the transaction.
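The N-best confirmation step described above can be sketched as follows. This is only an illustration in Python, not FinVoice code: the function name `confirm_n_best`, the `ask_yes_no` callback and the `max_attempts` limit are assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def confirm_n_best(
    hypotheses: Sequence[str],
    ask_yes_no: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Offer recognition hypotheses one at a time, best first.

    Returns the first hypothesis the caller confirms, or None if none of
    the first `max_attempts` hypotheses is accepted.
    """
    for candidate in hypotheses[:max_attempts]:
        # `ask_yes_no` stands in for a spoken yes/no confirmation prompt.
        if ask_yes_no(f"Did you say {candidate}?"):
            return candidate
    return None
```

When the function returns None, a dialog of this kind would typically ask the caller to repeat the item.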

The second assignment was to add more intelligence to the existing system described above. More specifically, the system had problems recognizing the company name "Agfa", which was always mistaken for "Dexia". I was asked to create a strategy that learns to correct such frequent errors, and I give two options for this strategy. The system also had trouble recognizing account numbers, which consist of six digits and a letter; I developed a method to handle this situation.
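One way such a learning strategy could work is to remember which corrections callers make and to propose frequent ones as alternatives. The sketch below is a hypothetical illustration in Python and is not necessarily either of the two options from the internship; the class `ConfusionMemory` and the `min_count` threshold are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, List


class ConfusionMemory:
    """Remembers which corrections callers made for each recognised word."""

    def __init__(self, min_count: int = 2) -> None:
        # A correction is only proposed once it has been seen this often.
        self.min_count = min_count
        self._corrections: Dict[str, Counter] = defaultdict(Counter)

    def record(self, recognised: str, intended: str) -> None:
        """Store one observed outcome; correct recognitions are ignored."""
        if recognised != intended:
            self._corrections[recognised][intended] += 1

    def alternatives(self, recognised: str) -> List[str]:
        """Frequent corrections for `recognised`, most frequent first."""
        counts = self._corrections.get(recognised, Counter())
        return [word for word, n in counts.most_common() if n >= self.min_count]
```

After two callers have corrected "Dexia" to "Agfa", `alternatives("Dexia")` returns `["Agfa"]`, so the dialog could offer "Agfa" for confirmation whenever "Dexia" is recognised.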

Stijn De Saeger

Abstract:

The project I did was essentially a thought experiment. The idea was to investigate whether, given sufficient domain knowledge, it would be feasible to parse unrestricted medical text from patient discharge letters without explicitly analyzing the syntax of the sentences, thereby treating parsing as a kind of classification task that generates a semantic representation of the text. The domain knowledge provided for this task consisted of a fully specified medical domain ontology (around 80,000 concepts and over a million terms) that could be exploited as a "search space" for the conceptual configuration of the content of a text.

My approach involves three components. First, a module with algorithms that compute possible relations between the medical concepts mentioned in the letters. Second, a system of heuristics based on Resnik's notions of relative entropy and selectional association. Third, a module that uses the results of the other two modules to generate all possible semantic hypergraphs for a given sentence and ranks these graphs according to Optimality-Theoretic principles.
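For readers unfamiliar with Resnik's measures, the sketch below computes them in Python for a single predicate. It follows the standard definitions (selectional preference strength as the relative entropy between P(c | predicate) and P(c), and selectional association as each class's share of that strength); the function name and the toy probabilities in the usage note are assumptions, and the heuristics built on these measures in the project are not shown.

```python
import math
from typing import Dict


def selectional_association(
    prior: Dict[str, float], conditional: Dict[str, float]
) -> Dict[str, float]:
    """Resnik-style association of a predicate with each argument class.

    prior[c] is P(c); conditional[c] is P(c | predicate).
    """
    # Each class's contribution to the relative entropy D(P(c|p) || P(c)).
    contributions = {
        c: p * math.log(p / prior[c])
        for c, p in conditional.items()
        if p > 0.0
    }
    strength = sum(contributions.values())  # selectional preference strength
    if strength == 0.0:
        # The predicate does not prefer any class over the prior.
        return {c: 0.0 for c in contributions}
    return {c: value / strength for c, value in contributions.items()}
```

With a prior of 0.5 for each of two hypothetical classes and conditional probabilities of 0.9 and 0.1, the first class receives a positive association and the second a negative one, and the two values sum to 1.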

Paolo Martini

Name: Internship at YY Software Company
Abstract:

The internship at YY Software consisted of carrying out the given project.

The project’s goals are:

Project’s Global goal

To provide intelligent auto-response software for eBusiness platforms using speech. This means adding a speech interface to the company’s products, which automatically read and respond to electronic inquiries with high accuracy and functionality.

Project’s Specific goal

One specific goal of the project is to automatically generate the grammars required by the speech recognition system, using the company’s technology.

The following tasks were carried out to achieve the project’s goals:

Generate reference data for comparing manually made grammars with automatically generated grammars.
Create a program that takes the parsed corpora from the company’s language server and automatically converts them into a Nuance grammar in GSL syntax.
Create a demo that integrates the recognition system and the company’s language server, using a telephony platform.
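The grammar-generation task can be illustrated with a small Python sketch. It assumes the usual GSL conventions that square brackets list alternatives and parentheses group a word sequence, and it covers only that subset. The input format (a flat list of phrases) and the function name `phrases_to_gsl` are hypothetical; the format of the parsed corpora from the company's language server is not described here.

```python
from typing import Iterable


def phrases_to_gsl(grammar_name: str, phrases: Iterable[str]) -> str:
    """Render phrases as one GSL-style rule: Name [ (w1 w2) w3 ... ]."""
    alternatives = []
    seen = set()
    for phrase in phrases:
        words = phrase.lower().split()
        key = tuple(words)
        if not words or key in seen:
            continue  # skip empty phrases and duplicates
        seen.add(key)
        # A multi-word phrase is a sequence, written in parentheses.
        if len(words) == 1:
            alternatives.append(words[0])
        else:
            alternatives.append("(" + " ".join(words) + ")")
    return f"{grammar_name} [ " + " ".join(alternatives) + " ]"
```

For example, `phrases_to_gsl("Command", ["buy", "sell shares"])` returns `Command [ buy (sell shares) ]`.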

Kris Heylen

Name: A Machine-Learning Approach to Natural Language Processing for a Dutch Text-to-Speech System
Abstract:

As part of an ongoing project at Telia Promotor Infovox (Sweden) that aims to replace the current rule-based NLP component of the Infovox Text-to-Speech system with a data-driven Machine-Learning one, a Machine-Learning NLP component was developed for Dutch. This was done as a test case to study the problems that occur when applying technology originally developed for Swedish to a new language. The NLP component consisted of several sub-components, viz. a grapheme-to-phoneme converter, a part-of-speech tagger, and two sub-components that assigned phrase breaks and prominence, respectively. The Machine-Learning technique used was Memory-Based Learning (also called Instance-Based Learning). The grapheme-to-phoneme converter was trained on a pronunciation lexicon, and the other sub-components on a corpus annotated with part-of-speech tags, phrase breaks and prominence.
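The core idea of Memory-Based Learning is to store all training instances and to classify a new instance by its nearest stored neighbours. The Python sketch below shows a plain k-nearest-neighbour classifier with an overlap distance; it is a simplified illustration, not the software or feature set used in the project, and the names `overlap_distance` and `classify` are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# An instance is a tuple of symbolic features plus its class label.
Instance = Tuple[Tuple[str, ...], str]


def overlap_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of feature positions at which two instances differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(memory: List[Instance], features: Sequence[str], k: int = 1) -> str:
    """Majority label among the k stored instances closest to `features`."""
    ranked = sorted(memory, key=lambda inst: overlap_distance(inst[0], features))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For grapheme-to-phoneme conversion, an instance could be a window of letters around a focus letter, labelled with the phoneme for that letter. With the made-up stored instances `(("_", "c", "a"), "k")` and `(("_", "c", "e"), "s")`, the query `("o", "c", "e")` is closest to the second instance and is labelled "s".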

Francis Real Vázquez

Name: Automatic Treatment of the Syntactic Dependences
Abstract:

The objective of this article is to give a brief introduction to the treatment of syntactic dependences and to show a small example of its application in real systems. Chapter 2 explains what we understand by syntactic dependences and how their treatment can help in syntactic analysis. We then focus on a type of automatic analysis based on chunks: Chapter 3 explains what chunk analysis consists of, and Chapter 4 shows how the automatic treatment of syntactic dependences is applied to the result of the chunk analysis.

Throughout the article, two functional analysers are referenced: the CHAOS system and the TACAT system. The CHAOS system is a syntactic analyser created by the natural language research group of the University of Tor Vergata, Rome [2]; it includes processing of syntactic dependences. The TACAT system is an analyser created by the Natural Language Processing Group of the Polytechnic University of Catalonia (UPC) [3]. It does not have an automatic treatment of syntactic dependences, but an extension that includes one was recently proposed. Chapter 4 explains the details of the treatment of syntactic dependences in this extension and gives a few examples of its functionality.