Euromasters summer school 2005
Tutorial 8: Building Corpora from Scratch
Pavel Rychly, Masaryk University, Brno, Czech Republic
Text corpora are collections of samples of written and/or spoken texts drawn from a wide range of sources and designed to represent language usage. After an overview of corpus linguistics, corpus types, and corpus building and usage, participants will learn how to create their own corpora from freely available texts. They will learn to apply a wide range of statistical methods to both small and large corpora. Advanced students will also learn how to annotate a corpus. The aim of the tutorial is to give participants practical methods based on simple Unix tools and the Manatee corpus handling system.
Location of tutorial files: /group/cstr/projects/euromasters/tutorial8 on the Informatics Linux machines ("DICE" machines)