Text Charset and Language Guesser
mguesser is a standalone part of libudmsearch (the core of the mnoGoSearch search engine, http://mnogosearch.org) that guesses a text's charset and language.
Guessing uses the "N-Gram-Based Text Categorization" technique, the same technique implemented in TextCat, a language guesser written in Perl (http://www.let.rug.nl/~vannoord/TextCat/). mguesser is significantly faster than TextCat, especially on large texts.
This package consists of N-gram-based algorithms written in C, together with a number of maps for texts in various languages and charsets. See the "maps" directory of this package for the currently supported languages and charsets.
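The N-gram technique can be sketched in a few lines of Python. This is not mguesser's C code and does not read its map files. It is a minimal illustration under stated assumptions: profiles count byte n-grams of length 1 to 5 inside words padded with "_", keep the 300 most frequent, and are compared with the "out-of-place" rank distance described in the N-Gram-Based Text Categorization paper. The functions `ngram_profile`, `out_of_place` and `guess`, the profile size, and the in-memory `maps` dictionary are all hypothetical. Working on raw bytes rather than decoded characters is what lets one distance measure separate charsets as well as languages: the same Russian text encoded as KOI8-R and as windows-1251 produces different byte n-grams.

```python
from collections import Counter

TOP_N = 300  # assumed profile size (the value used in the original paper)


def ngram_profile(data: bytes, max_n: int = 5, top_n: int = TOP_N) -> dict[bytes, int]:
    """Return {ngram: rank} for the top_n most frequent byte n-grams (n = 1..max_n)."""
    counts: Counter[bytes] = Counter()
    for word in data.split():  # splits on ASCII whitespace only
        padded = b"_" + word + b"_"  # mark word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken by byte value so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))[:top_n]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc: dict[bytes, int], cat: dict[bytes, int],
                 max_penalty: int = TOP_N) -> int:
    """Sum of rank differences; n-grams missing from the category cost max_penalty."""
    return sum(abs(rank - cat[g]) if g in cat else max_penalty
               for g, rank in doc.items())


def guess(data: bytes, maps: dict[tuple[str, str], dict[bytes, int]]) -> tuple[str, str]:
    """Return the (language, charset) key whose profile is closest to the text."""
    doc = ngram_profile(data)
    return min(maps, key=lambda key: out_of_place(doc, maps[key]))


if __name__ == "__main__":
    # Hypothetical training text; real maps would be built from much larger corpora.
    sample = "мама мыла раму и папа читал книгу"
    maps = {
        ("ru", "koi8-r"): ngram_profile(sample.encode("koi8_r")),
        ("ru", "windows-1251"): ngram_profile(sample.encode("cp1251")),
    }
    print(guess("мама читала книгу".encode("koi8_r"), maps))  # ('ru', 'koi8-r')
```

Because every rank difference is smaller than the penalty for a missing n-gram, any shared n-gram pulls a map closer. Lowercase Cyrillic letters occupy disjoint byte ranges in KOI8-R and windows-1251, so KOI8-R input only ever matches the KOI8-R map's Cyrillic n-grams.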
Checkout Package
osc -A https://api.opensuse.org checkout home:zhonghuaren/mguesser && cd $_
Source Files
Filename | Size | Changed |
---|---|---|
mguesser-0.4.tar.bz2 | 126 KB | |
mguesser-fix_printf_format.patch | 296 Bytes | |
mguesser-makefile.patch | 572 Bytes | |
mguesser.spec | 1.96 KB | |
Latest Revision
Huaren Zhong (zhonghuaren) committed revision 4