WWW pages of 3rd European Master School on Language and Speech

Vector Models for Term Discovery and Ontology Construction

Scott Martens
(KU Leuven)

Related term discovery is one of the most basic tasks in lexicography and is essential to ontology construction. However, people find it very difficult to interrogate their own knowledge for terms that stand in some useful semantic relationship to a specific term or concept. Nowhere is this more of a problem than in medical terminology, which is rich in unique words with restricted usages. This paper describes some preliminary efforts to partially automate lexical discovery using two models that represent words as high-dimensional vectors. These vectors are derived from distributional information in a corpus of medical journal abstracts. The first model is the vector space model developed by Gerard Salton, which assumes that documents in a corpus are topically coherent and uses word frequencies to construct vector representations of words. The second is a sentence-centred model that is more sensitive to the immediate contexts of words, inspired by Curt Burgess' HAL model. Basic geometric operations, such as calculating cosines between word vectors, can be used to infer important information about relationships between words. We can extract fairly reliable lists of related words from corpora using these techniques. By using cosine information to construct a spreading activation network in which nodes represent words, it is possible to expand the vector model to grade relatedness over large numbers of terms. By combining different models, we can also automatically extract the most meaningful collocations of a term from the right kind of corpus. Finally, this model can also sort words into predefined categories with fair, although imperfect, accuracy. This paper summarises some rough tests of these methods applied to medical terms and claims that they can serve lexicographers and ontology builders as a labour-saving device.
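
The cosine comparison mentioned above can be illustrated with a minimal sketch. It is not the implementation used in this work: the four toy "documents", the raw-count weighting and the function names (term_document_vectors, cosine, related_terms) are all assumptions made for illustration. Each word is represented by its counts across documents, in the spirit of the document-based vector space model, and candidate related terms are ranked by cosine.

```python
import math
from collections import Counter


def term_document_vectors(documents):
    """Map each word to a list of its raw counts, one entry per document."""
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    return {word: [c[word] for c in counts] for word in vocabulary}


def cosine(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a word with no occurrences is related to nothing
    return dot / (norm_u * norm_v)


def related_terms(target, vectors, top_n=3):
    """Rank all other words by cosine with the target word, highest first."""
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target
    ]
    # Sort by descending cosine, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(word, score) for score, word in scored[:top_n]]


# Hypothetical toy corpus standing in for medical journal abstracts.
documents = [
    "aspirin reduces fever and pain",
    "ibuprofen reduces fever and inflammation",
    "insulin regulates blood glucose",
    "glucose levels rise without insulin",
]

vectors = term_document_vectors(documents)
for word, score in related_terms("insulin", vectors):
    print(f"{word}\t{score:.3f}")
```

In this toy corpus "insulin" and "glucose" occur in exactly the same documents, so their cosine is close to 1, while "insulin" and "aspirin" never co-occur and score 0. A realistic setting would need a large corpus and some frequency weighting, and a sentence-centred model would count co-occurrence within sentences or a context window rather than within whole documents.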
