A CRF-based word segmenter in Java. Supports Arabic and Chinese.
Some languages require extensive token pre-processing, which is usually called segmentation. The Stanford Word Segmenter currently supports Arabic and Chinese. The provided segmentation schemes have been found to work well for a variety of applications.
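For Chinese, segmentation is often framed as labeling each character, for example "B" for the first character of a word and "I" for a character that continues it. A CRF-based segmenter predicts such a label sequence. The sketch below is a hypothetical illustration of the final step only: it turns a given label sequence into words. The class name `TagsToWords`, the method `toWords`, and the two-label B/I scheme are assumptions made for illustration. They are not part of the Stanford Word Segmenter API, and the example does not reproduce its model or its exact tagging scheme.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: NOT part of the Stanford Word Segmenter API.
// Assumes a simple two-label scheme: "B" = begins a word, "I" = inside a word.
public class TagsToWords {

    /** Joins characters into words using one label per character. */
    public static List<String> toWords(String text, List<String> tags) {
        // Work on Unicode code points so characters outside the BMP are not split.
        int[] cps = text.codePoints().toArray();
        if (cps.length != tags.size()) {
            throw new IllegalArgumentException("need exactly one tag per character");
        }
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < cps.length; i++) {
            // A "B" label closes the word built so far (if any) and starts a new one.
            if ("B".equals(tags.get(i)) && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.appendCodePoint(cps[i]);
        }
        // Flush the last word.
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // "我喜欢北京" ("I like Beijing") labeled B B I B I
        List<String> words = toWords("我喜欢北京", List.of("B", "B", "I", "B", "I"));
        System.out.println(String.join(" ", words)); // prints "我 喜欢 北京"
    }
}
```

In a real segmenter the hard part is choosing the labels. That is the job of the trained CRF model. Once a label sequence exists, recovering the words is a single left-to-right pass like the one above.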
| File | Size | Uploaded |
|---|---|---|
| stanford-segmenter-3.2.0.tar.gz | 247 MB | 2013-08-21 |
| stanford-segmenter.spec | 1.38 KB | 2013-08-21 |