Text Charset and Language Guesser


mguesser is a standalone part of libudmsearch (the core of the mnoGoSearch search engine, http://mnogosearch.org) that guesses a text's charset and language.

Guessing uses the "N-Gram-Based Text Categorization" technique, the same one implemented by the TextCat language guesser written in Perl (http://www.let.rug.nl/~vannoord/TextCat/). mguesser is significantly faster than TextCat, especially on large texts.
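
As a rough illustration of N-gram-based categorization, the sketch below builds a frequency-ranked profile of byte trigrams for a text and compares it with profiles built from sample texts using an "out-of-place" rank distance. It is not taken from mguesser's sources: the names Profile, build_profile, profile_distance and guess_language, the fixed trigram length, and the penalty charged for a missing n-gram are all assumptions made for this example, and a real guesser would load precomputed maps rather than build profiles from sample strings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NGRAM_LEN 3    /* assumption: fixed trigram length for this sketch */
#define MAX_GRAMS 512  /* assumption: profile capacity, also the miss penalty */

typedef struct {
    char text[NGRAM_LEN + 1];
    int count;
} Gram;

typedef struct {
    Gram grams[MAX_GRAMS];
    int size;
} Profile;

/* Descending count; ties broken by byte order so ranks are deterministic. */
int cmp_gram(const void *a, const void *b)
{
    const Gram *x = a, *y = b;
    if (x->count != y->count)
        return y->count - x->count;
    return strcmp(x->text, y->text);
}

/* Count every overlapping byte trigram in text, then rank by frequency. */
void build_profile(Profile *p, const char *text)
{
    size_t len = strlen(text);
    p->size = 0;
    for (size_t i = 0; i + NGRAM_LEN <= len; i++) {
        int j;
        for (j = 0; j < p->size; j++)
            if (memcmp(p->grams[j].text, text + i, NGRAM_LEN) == 0)
                break;
        if (j == p->size) {
            if (p->size == MAX_GRAMS)
                continue;  /* profile full: ignore n-grams not seen before */
            memcpy(p->grams[j].text, text + i, NGRAM_LEN);
            p->grams[j].text[NGRAM_LEN] = '\0';
            p->grams[j].count = 0;
            p->size++;
        }
        p->grams[j].count++;
    }
    qsort(p->grams, (size_t)p->size, sizeof(Gram), cmp_gram);
}

/* Out-of-place distance: sum of rank differences over the document's
 * n-grams; an n-gram absent from the language profile costs MAX_GRAMS. */
long profile_distance(const Profile *doc, const Profile *lang)
{
    long dist = 0;
    for (int i = 0; i < doc->size; i++) {
        int j;
        for (j = 0; j < lang->size; j++)
            if (strcmp(doc->grams[i].text, lang->grams[j].text) == 0)
                break;
        dist += (j == lang->size) ? MAX_GRAMS : labs((long)i - j);
    }
    return dist;
}

/* Return the index of the sample closest to text, or -1 if n_samples is 0. */
int guess_language(const char *text, const char *const samples[], int n_samples)
{
    Profile doc, lang;
    int best = -1;
    long best_dist = 0;

    assert(text != NULL && samples != NULL);
    build_profile(&doc, text);
    for (int k = 0; k < n_samples; k++) {
        build_profile(&lang, samples[k]);
        long d = profile_distance(&doc, &lang);
        if (best < 0 || d < best_dist) {
            best = k;
            best_dist = d;
        }
    }
    return best;
}
```

Calling guess_language(text, samples, n) with one sample string per language/charset pair returns the index of the closest sample. Rebuilding each sample profile on every call keeps the sketch short; building the profiles once and reusing them would be the obvious optimization.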

This package consists of N-gram based algorithms written in C as well as a number of maps for texts in various languages and charsets. Look in the "maps" directory of this package to see which languages and charsets are currently supported.

Source Files
Filename                          Size (bytes)  Size
mguesser-0.4.tar.bz2              128769        126 KB
mguesser-fix_printf_format.patch  296           296 Bytes
mguesser-makefile.patch           572           572 Bytes
mguesser.spec                     2004          1.96 KB
Latest Revision
Huaren Zhong (zhonghuaren) committed (revision 4)