Text Charset and Language Guesser

mguesser is a standalone part of libudmsearch (the core of the mnoGoSearch search engine, http://mnogosearch.org) that guesses the charset and language of a text.

Guessing uses the "N-Gram-Based Text Categorization" technique, which is also implemented in TextCat, a language guesser written in Perl (http://www.let.rug.nl/~vannoord/TextCat/). mguesser is significantly faster than TextCat, especially on large texts.
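
To illustrate the general idea behind N-gram based text categorization, here is a minimal Python sketch of the rank-order ("out-of-place") comparison commonly associated with that technique. It is not mguesser's C code and does not read its map files: the function names, the profile size, the toy samples and the use of Unicode strings instead of raw bytes in a given charset are all assumptions made for illustration.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Return the most frequent character n-grams (lengths 1..max_n), most frequent first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries (illustrative convention)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:top]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def guess(text, profiles):
    """Return the key of the profile closest to the text's own profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda name: out_of_place(doc, profiles[name]))


# Hypothetical toy training samples; real profiles are built from much larger texts.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze",
}
PROFILES = {name: ngram_profile(sample) for name, sample in SAMPLES.items()}

if __name__ == "__main__":
    print(guess("the dog and the fox", PROFILES))
```

A guesser that also detects charsets could, under the same assumptions, keep one profile per language and charset pair and build the n-grams from bytes rather than decoded characters.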

This package consists of N-gram based algorithms written in C as well as a number of maps for texts in various languages and charsets. See the "maps" directory of this package for the currently supported languages and charsets.

Source Files

Filename                            Size
mguesser-0.4.tar.bz2                126 KB
mguesser-fix_printf_format.patch    296 Bytes
mguesser-makefile.patch             572 Bytes
mguesser.spec                       1.96 KB
