cChardet

cChardet is a high-speed universal character encoding detector, implemented as a binding to charsetdetect.

Supported codecs

Big5
EUC-JP
EUC-KR
GB18030
HZ-GB-2312
IBM855
IBM866
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-2
ISO-8859-5
ISO-8859-7
ISO-8859-8
KOI8-R
Shift_JIS
TIS-620
UTF-8
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1255
EUC-TW
X-ISO-10646-UCS-4-2143
X-ISO-10646-UCS-4-3412
x-mac-cyrillic

Requires

Cython: http://www.cython.org/

Installation

From source:

$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py install

Or from PyPI:

$ pip install -U cchardet

Example

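# -*- coding: utf-8 -*-
import cchardet as chardet

# Read the file as raw bytes; detect() works on bytes, not decoded text.
with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with "encoding" and "confidence" keys.
result = chardet.detect(msg)
print(result)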
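
Once an encoding has been detected, the usual next step is to decode the bytes. The following is a minimal sketch of that step, not part of cChardet itself: the helper names decode_with_result and detect_and_decode and the UTF-8 fallback are assumptions made for illustration. The fallback matters because detect() can report no encoding at all, or a name that Python's codec registry may not recognise (EUC-TW, for example).

```python
def decode_with_result(data, result, fallback="utf-8"):
    """Decode bytes using a detect()-style result dict (hypothetical helper).

    `result` is expected to look like {"encoding": ..., "confidence": ...}.
    """
    # detect() may report None when it cannot decide; use the fallback then.
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently instead.
        return data.decode(fallback, errors="replace")


def detect_and_decode(data, fallback="utf-8"):
    """Detect the encoding with cchardet, then decode (hypothetical helper)."""
    import cchardet  # imported lazily so the helper above works without it

    return decode_with_result(data, cchardet.detect(data), fallback)
```

With cchardet installed, detect_and_decode(msg) would return the text of the file read in the example above. Whether to fall back silently or raise on a failed decode is a design choice that depends on the application.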

Benchmark

$ cd src/
$ pip install chardet
$ python tests/bench.py
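
For a rough idea of what a calls-per-second figure means, the sketch below times repeated calls to a detector function. It is an illustration only and is not the contents of tests/bench.py; the function name calls_per_second and the repeat parameter are assumptions.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Call detect(data) `repeat` times and return the average call rate."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

With both libraries installed, passing cchardet.detect and chardet.detect in turn, with the same input bytes, gives two rates that can be compared directly.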

Performance

CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

RAM: DDR3 1600MHz 16GB

Platform: Ubuntu 16.04 amd64

Python 2.7.12

Library     Requests (calls/s)
chardet     0.26
cchardet    1408.73

Python 3.5.2

Library     Requests (calls/s)
chardet     0.28
cchardet    1380.40
