mkcls is a tool for training word classes using a maximum-likelihood criterion. The resulting word classes are especially suited to language models and statistical translation models. mkcls was written by Franz Josef Och.
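To make the idea of a maximum-likelihood criterion concrete, the Python sketch below scores a given word-to-class assignment by the log-likelihood it gives a corpus. It is not taken from mkcls: it assumes a class-based bigram model, p(w_i | w_i-1) = p(c_i | c_i-1) * p(w_i | c_i), which this text does not specify, and the function name class_bigram_log_likelihood is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def class_bigram_log_likelihood(tokens: List[str], word_class: Dict[str, int]) -> float:
    """Log-likelihood of tokens[1:] under p(c_i | c_prev) * p(w_i | c_i).

    Assumption: a class-based bigram model with relative-frequency
    estimates. This is an illustration, not the mkcls implementation.
    """
    classes = [word_class[w] for w in tokens]
    pair_counts = Counter(zip(classes, classes[1:]))  # N(c_prev, c)
    prev_counts = Counter(classes[:-1])               # N(c_prev) as predecessor
    word_counts = Counter(tokens[1:])                 # N(w) as predicted word
    class_counts = Counter(classes[1:])               # N(c) as predicted class

    ll = 0.0
    for prev, cur, word in zip(classes, classes[1:], tokens[1:]):
        ll += math.log(pair_counts[(prev, cur)] / prev_counts[prev])
        ll += math.log(word_counts[word] / class_counts[cur])
    return ll


tokens = "a b a b a b".split()
ll_two = class_bigram_log_likelihood(tokens, {"a": 0, "b": 1})  # two classes
ll_one = class_bigram_log_likelihood(tokens, {"a": 0, "b": 0})  # one class
print(ll_two > ll_one)
```

On this toy corpus the two-class assignment scores higher than the single-class one. Under the stated assumption, an optimizer would compare such scores across assignments with a fixed number of classes and keep the best one found.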
Usage:
mkcls [-cnum] [-nnum] [-ptrain] [-Vfile] opt
-c number of classes to generate
-V file to which the classes are written
-n number of optimization runs (Default: 1); a larger number gives better results
-p filename of training corpus (Default: 'train')
Example:
mkcls -c80 -n10 -pkorpus -Vkats opt
(generates 80 classes for the corpus 'korpus' and writes the classes to 'kats')
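When mkcls is driven from a script, the command line of the example can be assembled programmatically. The Python sketch below is an illustration only: the helper name build_mkcls_command and its parameter names are hypothetical, and it uses just the flags that appear in the example (-c, -n, -p, -V and the trailing 'opt'), with each value attached directly to its flag.

```python
from typing import List


def build_mkcls_command(num_classes: int, runs: int, corpus: str, classes_out: str) -> List[str]:
    """Assemble an mkcls argument list (hypothetical helper).

    Values are glued to their flags without a space, as in the example.
    """
    return [
        "mkcls",
        f"-c{num_classes}",   # number of classes
        f"-n{runs}",          # number of optimization runs
        f"-p{corpus}",        # training corpus
        f"-V{classes_out}",   # output file for the classes
        "opt",
    ]


cmd = build_mkcls_command(80, 10, "korpus", "kats")
print(" ".join(cmd))

# To actually run it (requires mkcls on the PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems if a filename contains spaces.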
mkcls is released under the GNU General Public License (GPL).
Franz Josef Och: »Maximum-Likelihood-Schätzung von Wortkategorien mit Verfahren der kombinatorischen Optimierung« [Maximum-likelihood estimation of word categories using combinatorial optimization methods]. Studienarbeit, Universität Erlangen-Nürnberg, Germany, 1995.