uchardet

Jul 2011

uchardet is a C language binding for Mozilla's universal charset detection library, whose original implementation is written in C++. It is an encoding detector library: it takes a sequence of bytes in an unknown character encoding, without any additional information, and attempts to determine the encoding of the text.

When I was developing the graphical user interface of OpenCC, I looked for a library that could guess the encoding of plain text. I found Mozilla universalchardet, which is part of Firefox and SeaMonkey and detects the encodings of web pages. Unfortunately, it can only be used from C++, not from C. Interestingly, there are many ports to other languages:

However, there was hardly any C port among them. So I did the packaging work myself: I separated the code from the Mozilla source tree and published it as a stand-alone library, libuchardet. libuchardet has now been accepted into the Debian package archive.
