Euromasters summer school 2005

Tutorial 8: Building Corpora from Scratch

Pavel Rychly, Masaryk University, Brno, Czech Republic

Text corpora are collections of samples of written and/or spoken texts from a wide range of sources, designed to represent language usage. After an overview of corpus linguistics, corpus types, corpus building and usage, participants will learn how to create their own corpora using language from freely available sources. They will learn to use a wide range of statistical methods on small and large corpora. Advanced students will learn how to annotate a corpus. The aim of the tutorial is to provide the participants with practical methods based on simple Unix tools and the Manatee corpus handling system.


Location of tutorial files

/group/cstr/projects/euromasters/tutorial8 on the Informatics Linux machines ("DICE" machines)

Notes

  • Your PATH needs to include /group/cstr/projects/euromasters/tutorial8/bin
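
One way to satisfy the PATH note above is sketched below. It assumes a Bourne-compatible shell such as bash; that is an assumption, since the page does not say which shell the DICE machines use, and csh-style shells need `setenv` instead.

```shell
# Assumption: a Bourne-compatible shell (sh, bash, zsh).
# Append the tutorial's bin directory to PATH for the current session only.
export PATH="$PATH:/group/cstr/projects/euromasters/tutorial8/bin"
```

The change lasts only for the current session. To make it permanent, the same line could be added to your shell start-up file, for example ~/.bashrc if you use bash.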