scfg_train Train the parameters of a stochastic context free grammar

scfg_train [options] [-grammar ifile] [-corpus ifile] [-method string {inout}] [-passes int {50}] [-startpass int {0}] [-spread int] [-checkpoint int] [-heap int {210000}] [-o ofile]

scfg_train takes a stochastic context free grammar (SCFG) and trains its rule probabilities with respect to a given bracketed corpus using the inside-outside algorithm. It is essentially an implementation of Pereira and Schabes (1992). Note that using this program properly may require months of CPU time.
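As a rough illustration of the inside half of that algorithm, the following Python sketch computes inside probabilities for a toy grammar in Chomsky normal form and skips any span that crosses a bracket in the corpus, which is the constraint bracketed training relies on. It is not taken from the scfg_train source: the grammar, the dictionaries BINARY and LEXICAL, and the functions crosses and inside are all hypothetical, and the data layout is not the scfg_train grammar file format. The outside pass and the re-estimation step are omitted.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
# This is NOT the scfg_train grammar file format.
BINARY = {  # (parent, left child, right child) -> rule probability
    ("S", "NP", "VP"): 1.0,
    ("NP", "Det", "N"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
LEXICAL = {  # (preterminal, word) -> rule probability
    ("Det", "the"): 1.0,
    ("N", "dog"): 0.5,
    ("N", "cat"): 0.5,
    ("V", "saw"): 1.0,
}


def crosses(i, k, brackets):
    """Return True if span [i, k) partially overlaps any bracket [a, b)."""
    return any(a < i < b < k or i < a < k < b for a, b in brackets)


def inside(words, brackets=()):
    """Return beta[(i, k, A)], the probability that A derives words[i:k]."""
    n = len(words)
    beta = defaultdict(float)
    # Width-1 spans come directly from the lexical rules.
    for i, word in enumerate(words):
        for (preterminal, w), p in LEXICAL.items():
            if w == word:
                beta[(i, i + 1, preterminal)] += p
    # Wider spans are built bottom-up from pairs of adjacent sub-spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            if crosses(i, k, brackets):
                continue  # a span that crosses a bracket contributes nothing
            for j in range(i + 1, k):
                for (parent, left, right), p in BINARY.items():
                    beta[(i, k, parent)] += (
                        p * beta[(i, j, left)] * beta[(j, k, right)]
                    )
    return beta
```

Calling inside("the dog saw the cat".split()) and reading the entry for (0, 5, "S") gives the probability of the whole sentence under the toy grammar. Passing brackets such as [(0, 2), (2, 5)] restricts the computation to analyses compatible with that bracketing.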



-grammar ifile  Grammar file, one rule per line.

-corpus ifile  Corpus file, one bracketed sentence per line.

-method string {inout}  Method for training: inout.

-passes int {50}  Number of training passes.

-startpass int {0}  Start training at pass N.

-spread int  Spread the training data over N passes.

-checkpoint int  Save the grammar every N passes.

-heap int {210000}  Set the size of the Lisp heap; a larger heap is needed for large corpora.

-o ofile  Output file for the trained grammar.