Text Charset and Language Guesser

mguesser is a standalone part of libudmsearch (the core of the mnoGoSearch search engine, http://mnogosearch.org) that guesses a text's charset and language.

Guessing uses the "N-Gram-Based Text Categorization" technique, which is also implemented in the TextCat language guesser written in Perl (http://www.let.rug.nl/~vannoord/TextCat/). mguesser is significantly faster than TextCat, especially on large texts.

This package consists of N-gram-based algorithms written in C, as well as a number of maps for texts in various languages and charsets. See the "maps" directory of this package for the currently supported languages and charsets.
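
To illustrate the general N-gram categorization idea, here is a minimal Python sketch. It is not mguesser's C code and does not read its map files: the function names (ngram_profile, out_of_place, guess), the word padding character, and the profile size of 300 are all assumptions chosen for illustration. Each language is reduced to a ranked list of its most frequent character n-grams, and a text is assigned to the language whose ranking differs least from its own.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=300):
    """Return the `top` most frequent character n-grams (lengths 1..max_n),
    most frequent first. Hypothetical helper, not part of mguesser."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries (an assumption)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between two profiles; n-grams missing from
    the language profile get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess(text, profiles):
    """Return the key of the profile closest to the text's own profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


if __name__ == "__main__":
    # Tiny illustrative training samples; real profiles need far more text.
    samples = {
        "en": "the quick brown fox jumps over the lazy dog and the cat",
        "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
    }
    profiles = {name: ngram_profile(sample) for name, sample in samples.items()}
    print(guess("the dog and the fox", profiles))
```

In this sketch one profile would be needed per language and charset pair, which is presumably the role the files in the "maps" directory play. Guessing a charset works the same way when the n-grams are taken over the bytes of the encoded text rather than over decoded characters.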

Checkout Package
osc -A https://api.opensuse.org checkout home:zhonghuaren/mguesser && cd $_

Source Files

Filename	Size	Changed
mguesser-0.4.tar.bz2	126 KB (128769 bytes)	about 14 years ago
mguesser-fix_printf_format.patch	296 bytes	about 16 years ago
mguesser-makefile.patch	572 bytes	about 14 years ago
mguesser.spec	1.96 KB (2004 bytes)	over 12 years ago

Latest Revision

Huaren Zhong (zhonghuaren) committed over 12 years ago (revision 4)
