A high-speed character set detection library

libguess has two functions:

libguess_determine_encoding(const char *inbuf, int length, const char *region);
Detects the character set of the input buffer and returns a charset name that
can be passed to iconv_open(). Region is the name of the language or region
that the data relates to, e.g. 'Baltic' for the Baltic states, or 'Japanese'.
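
As a rough illustration of the iconv_open() hand-off described above, the
following sketch converts a buffer to UTF-8 given a charset name. The helper
name sketch_to_utf8 is hypothetical and not part of libguess; the charset
argument stands in for whatever name the detection call returned.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libguess): convert inbuf from the
 * given charset to UTF-8. Returns the number of bytes written to
 * outbuf, or (size_t)-1 if the charset is unknown or conversion fails. */
size_t sketch_to_utf8(const char *charset, const char *inbuf, size_t inlen,
                      char *outbuf, size_t outcap)
{
    assert(charset != NULL && inbuf != NULL && outbuf != NULL);

    /* The detected charset name is the "from" side of the conversion. */
    iconv_t cd = iconv_open("UTF-8", charset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)inbuf;   /* iconv() takes a non-const char ** */
    char *out = outbuf;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1)
        return (size_t)-1;
    return outcap - outleft;
}
```

In a real program the first argument would be the string returned by
libguess_determine_encoding(); here it is passed in directly so the sketch
builds without libguess installed.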

libguess_validate_utf8(const char *inbuf, int length);
Uses libguess's DFA-based character set validation rules to check that a
string is pure UTF-8. GLib's UTF-8 validation functions are broken, for example.
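
To give an idea of what DFA-based UTF-8 validation can look like, here is a
minimal sketch. It is not libguess's actual rule set: the state names and the
function sketch_validate_utf8 are made up for illustration, and the byte
ranges follow the well-formed UTF-8 sequences defined by the Unicode standard.

```c
#include <assert.h>
#include <stddef.h>

/* States of the illustrative DFA. S_START is the only accepting state. */
enum state {
    S_START,  /* between characters */
    S_CONT1,  /* expect 1 more continuation byte (0x80-0xBF) */
    S_CONT2,  /* expect 2 more continuation bytes */
    S_CONT3,  /* expect 3 more continuation bytes */
    S_E0,     /* after 0xE0: next byte must be 0xA0-0xBF (no overlongs) */
    S_ED,     /* after 0xED: next byte must be 0x80-0x9F (no surrogates) */
    S_F0,     /* after 0xF0: next byte must be 0x90-0xBF (no overlongs) */
    S_F4,     /* after 0xF4: next byte must be 0x80-0x8F (max U+10FFFF) */
    S_REJECT
};

static int in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/* One DFA transition: consume a single byte. */
static enum state step(enum state s, unsigned char b)
{
    switch (s) {
    case S_START:
        if (b <= 0x7F)               return S_START;
        if (in_range(b, 0xC2, 0xDF)) return S_CONT1;
        if (b == 0xE0)               return S_E0;
        if (b == 0xED)               return S_ED;
        if (in_range(b, 0xE1, 0xEF)) return S_CONT2;
        if (b == 0xF0)               return S_F0;
        if (in_range(b, 0xF1, 0xF3)) return S_CONT3;
        if (b == 0xF4)               return S_F4;
        return S_REJECT;
    case S_CONT1: return in_range(b, 0x80, 0xBF) ? S_START : S_REJECT;
    case S_CONT2: return in_range(b, 0x80, 0xBF) ? S_CONT1 : S_REJECT;
    case S_CONT3: return in_range(b, 0x80, 0xBF) ? S_CONT2 : S_REJECT;
    case S_E0:    return in_range(b, 0xA0, 0xBF) ? S_CONT1 : S_REJECT;
    case S_ED:    return in_range(b, 0x80, 0x9F) ? S_CONT1 : S_REJECT;
    case S_F0:    return in_range(b, 0x90, 0xBF) ? S_CONT2 : S_REJECT;
    case S_F4:    return in_range(b, 0x80, 0x8F) ? S_CONT2 : S_REJECT;
    default:      return S_REJECT;
    }
}

/* Returns 1 if the buffer is well-formed UTF-8, 0 otherwise.
 * The return convention is this sketch's own choice. */
int sketch_validate_utf8(const char *inbuf, int length)
{
    assert(inbuf != NULL && length >= 0);
    enum state s = S_START;
    for (int i = 0; i < length && s != S_REJECT; i++)
        s = step(s, (unsigned char)inbuf[i]);
    return s == S_START;
}
```

The validator keeps a single state variable and reads each byte once, which
is what makes the DFA approach cheap.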

Just include libguess.h and link to libguess to get these functions in your
program. For your convenience, a pkg-config file is also supplied.

libguess employs deterministic finite automata (DFAs) to deduce the character
set of the input buffer. The advantage of this is that all character sets can
be checked in parallel, and quickly. libguess currently feeds each input byte
to every DFA in a single pass over the buffer, so the winning character set
can be deduced without rereading the input.

libguess is fully reentrant, using only local stack memory for DFA operations.
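
The single-pass, all-DFAs-at-once idea can be sketched with a toy example.
Everything below is an assumption made for illustration: the three candidate
automata, their simplified rules, the "first surviving candidate wins"
policy, and the name sketch_guess. libguess's real automata and its way of
picking a winner are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REJECT (-1)

typedef int (*step_fn)(int state, unsigned char byte);

/* US-ASCII: one accepting state; any byte >= 0x80 rejects. */
static int ascii_step(int state, unsigned char b)
{
    (void)state;
    return b < 0x80 ? 0 : REJECT;
}

/* Simplified UTF-8: the state is the number of continuation bytes
 * still expected (overlong and surrogate checks are omitted). */
static int utf8_step(int state, unsigned char b)
{
    if (state > 0)
        return (b & 0xC0) == 0x80 ? state - 1 : REJECT;
    if (b < 0x80)               return 0;
    if (b >= 0xC2 && b <= 0xDF) return 1;
    if (b >= 0xE0 && b <= 0xEF) return 2;
    if (b >= 0xF0 && b <= 0xF4) return 3;
    return REJECT;
}

/* ISO-8859-1: as a heuristic of this sketch, bytes 0x80-0x9F are
 * treated as implausible in text and reject. */
static int latin1_step(int state, unsigned char b)
{
    (void)state;
    return (b >= 0x80 && b <= 0x9F) ? REJECT : 0;
}

struct candidate {
    const char *name;
    step_fn step;
    int state;
};

/* Feeds every byte to every live DFA in one pass. All DFA state lives
 * on the stack, so concurrent calls do not interfere with each other.
 * Returns the first candidate that ends in its accepting state, or NULL. */
const char *sketch_guess(const char *inbuf, int length)
{
    struct candidate c[] = {
        { "US-ASCII",   ascii_step,  0 },
        { "UTF-8",      utf8_step,   0 },
        { "ISO-8859-1", latin1_step, 0 },
    };
    const size_t n = sizeof c / sizeof c[0];

    assert(inbuf != NULL && length >= 0);

    for (int i = 0; i < length; i++) {
        unsigned char b = (unsigned char)inbuf[i];
        for (size_t k = 0; k < n; k++)
            if (c[k].state != REJECT)
                c[k].state = c[k].step(c[k].state, b);
    }

    for (size_t k = 0; k < n; k++)
        if (c[k].state == 0)
            return c[k].name;
    return NULL;
}
```

Because the outer loop walks the buffer once and the inner loop only touches
candidates that have not yet rejected, the cost is bounded by the input
length times the number of candidate character sets.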

  • Sources inherited from project openSUSE:13.1
  • osc -A https://api.opensuse.org checkout openSUSE:13.1:Update/libguess && cd $_
Source Files

Filename                             Size       Changed
baselibs.conf                        10 Bytes   over 11 years ago
libguess-1.1-libmowgli.h-path.patch  338 Bytes  almost 10 years ago
libguess-1.1.tar.bz2                 78 KB      almost 11 years ago
libguess.changes                     759 Bytes  almost 10 years ago
libguess.spec                        3.61 KB    almost 10 years ago