scfg_train Train the parameters of a stochastic context free grammar

Table of Contents
Synopsis
OPTIONS

Synopsis

scfg_train [options] [-grammar ifile] [-corpus ifile] [-method string {inout}] [-passes int {50}] [-startpass int {0}] [-spread int] [-checkpoint int] [-heap int {210000}] [-o ofile]

scfg_train takes a stochastic context free grammar (SCFG) and trains its rule probabilities with respect to a given bracketed corpus using the inside-outside algorithm. It is essentially an implementation of Pereira and Schabes (1992). Note that using this program properly may require months of CPU time.
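To make the role of the bracketed corpus concrete, here is a minimal Python sketch of the inside pass only, for a toy grammar in Chomsky normal form. It is not the scfg_train implementation: the function names, the dictionary-based grammar representation, and the half-open (start, end) bracket spans are all assumptions made for illustration, and the re-estimation step of inside-outside is omitted. The idea it shows is the one from Pereira and Schabes: spans that cross a bracket in the corpus are excluded from the chart.

```python
from collections import defaultdict

def crosses(span, brackets):
    """True if span partially overlaps any bracket (neither nested nor disjoint)."""
    i, j = span
    return any(i < a < j < b or a < i < b < j for a, b in brackets)

def inside_probability(words, lexical, binary, start, brackets=()):
    """Inside probability of `words` under a CNF grammar.

    lexical:  {(symbol, word): prob}            (hypothetical representation)
    binary:   {(parent, left, right): prob}     (hypothetical representation)
    brackets: half-open (start, end) spans taken from the bracketed sentence
    """
    n = len(words)
    beta = defaultdict(float)  # (i, j, symbol) -> inside probability
    # Width-1 spans come from lexical rules.
    for i, w in enumerate(words):
        for (sym, word), p in lexical.items():
            if word == w:
                beta[i, i + 1, sym] += p
    # Wider spans combine two adjacent sub-spans via binary rules.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            if crosses((i, j), brackets):
                continue  # span is inconsistent with the bracketing
            for k in range(i + 1, j):
                for (parent, left, right), p in binary.items():
                    beta[i, j, parent] += p * beta[i, k, left] * beta[k, j, right]
    return beta[0, n, start]

# Toy ambiguous grammar: N -> N N (0.4), N -> "a" (0.6)
LEXICAL = {("N", "a"): 0.6}
BINARY = {("N", "N", "N"): 0.4}
unbracketed = inside_probability(["a", "a", "a"], LEXICAL, BINARY, "N")
bracketed = inside_probability(["a", "a", "a"], LEXICAL, BINARY, "N", brackets=[(0, 2)])
```

In this toy case the sentence has two parses without bracketing; supplying the bracket (0, 2) rules out the parse that groups the last two words, so `bracketed` is half of `unbracketed`.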

OPTIONS

-grammar

ifile Grammar file, one rule per line.

-corpus

ifile Corpus file, one bracketed sentence per line.

-method

string {inout} Method for training: inout.

-passes

int {50} Number of training passes.

-startpass

int {0} Starting at pass N.

-spread

int Spread training data over N passes.

-checkpoint

int Save grammar every N passes.

-heap

int {210000} Set size of Lisp heap; needed for large corpora.

-o

ofile Output file for trained grammar.
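
As an illustration of how the options above combine, the following Python sketch assembles an scfg_train command line. The helper name and the file names (toy.gram, toy.brackets, toy_trained.gram) are hypothetical; only the flags themselves come from the option list. The sketch builds and prints the command rather than running it, since a real training run can be very long.

```python
import shlex

def build_scfg_train_command(grammar, corpus, output, passes=None, checkpoint=None):
    """Return an argv list for scfg_train using flags from the option list."""
    argv = ["scfg_train", "-grammar", grammar, "-corpus", corpus]
    if passes is not None:
        argv += ["-passes", str(passes)]          # number of training passes
    if checkpoint is not None:
        argv += ["-checkpoint", str(checkpoint)]  # save grammar every N passes
    argv += ["-o", output]
    return argv

# Hypothetical file names, for illustration only.
cmd = build_scfg_train_command(
    "toy.gram", "toy.brackets", "toy_trained.gram", passes=10, checkpoint=5
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if the scfg_train binary is on the PATH.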