universal character encoding detector

cChardet

cChardet is a high-speed universal character encoding detector for Python. It is a binding to libcharsetdetect.
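A minimal usage sketch follows. It assumes the binding exposes a detect function that takes bytes and returns a mapping with an "encoding" key, as chardet does; this README does not document the call shape, so treat that as an assumption. The helper decode_with_detection and the stand-in fake_detect are hypothetical names used only for illustration.

```python
from typing import Callable, Mapping


def decode_with_detection(
    data: bytes,
    detect: Callable[[bytes], Mapping[str, object]],
    fallback: str = "utf-8",
) -> str:
    """Decode data using the encoding reported by detect (hypothetical helper)."""
    result = detect(data)
    # Assumption: result looks like {"encoding": "shift_jis", ...}.
    # If the detector reports no encoding, use the fallback instead.
    encoding = result.get("encoding") or fallback
    return data.decode(str(encoding), errors="replace")


if __name__ == "__main__":
    # Stand-in detector so the sketch runs without cchardet installed.
    def fake_detect(data: bytes) -> dict:
        return {"encoding": "shift_jis"}

    sample = "千夜一夜物語".encode("shift_jis")
    print(decode_with_detection(sample, fake_detect))
```

With the library installed, the real detect function could be passed in place of fake_detect, provided it matches the assumed shape.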

Requirements

Cython: http://www.cython.org/

uchardet-enhanced: https://bitbucket.org/medoc/uchardet-enhanced/overview

Install Cython with pip install -U cython or easy_install -U cython.

Benchmark

See tests.TestCchardetSpeed in tests.py.

Sample (shift_jis):

testdata/wikipediaJa_One_Thousand_and_One_Nights.txt

PC specs:

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Result:

chardet: 4.01 s, shift_jis

cchardet: 0.001 s, shift_jis
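A comparison like the one above can be sketched with a small timing harness. This is not the code in tests.TestCchardetSpeed; it is an illustration that assumes both chardet and cchardet expose a detect function taking bytes. The name time_detector and the synthetic payload are hypothetical, and a detector that is not installed is skipped.

```python
import importlib
import time
from typing import Callable, Tuple


def time_detector(detect: Callable[[bytes], object], data: bytes) -> Tuple[float, object]:
    """Return (elapsed_seconds, result) for a single detect(data) call."""
    start = time.perf_counter()
    result = detect(data)
    elapsed = time.perf_counter() - start
    return elapsed, result


if __name__ == "__main__":
    # Synthetic Shift_JIS payload; the benchmark above uses a testdata file instead.
    data = ("千夜一夜物語\n" * 1000).encode("shift_jis")

    for name in ("chardet", "cchardet"):
        try:
            # Assumption: each module provides detect(bytes).
            detect = importlib.import_module(name).detect
        except (ImportError, AttributeError):
            print(f"{name}: not available, skipped")
            continue
        elapsed, result = time_detector(detect, data)
        print(f"{name}: {elapsed}s, {result}")
```

A single call is timed here to mirror the one-shot figures in the result; repeating the call and averaging would give steadier numbers for the faster detector.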

Contact

My blog

Sorry for my poor English :)