<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
# cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
It is faster than [chardet](http://pypi.python.org/pypi/chardet); see the Benchmark section for figures.

# Supported codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* gb18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* WINDOWS-1250
* WINDOWS-1251
* WINDOWS-1252
* WINDOWS-1253
* WINDOWS-1255
* EUC-TW
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
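
The names above are encoding labels; whether Python itself ships a codec for a given label is a separate question. A minimal sketch, assuming only the standard-library `codecs` module (the helper name `has_python_codec` is hypothetical and not part of cChardet), checks a label before it is used for decoding:

```python
import codecs


def has_python_codec(name):
    """Return True if Python's codec registry knows this encoding name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup raises LookupError for unknown encoding names.
        return False


# Labels taken from the list above; results can vary by Python version.
for label in ["Shift_JIS", "UTF-8", "X-ISO-10646-UCS-4-2143"]:
    print(label, has_python_codec(label))
```

A check like this helps decide when to fall back to a default encoding instead of decoding with a label Python does not recognize.
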
# Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)

For example, on Ubuntu 12.04:

    $ sudo apt-get install build-essential python-dev cython

# Installation

    $ cd /tmp
    $ git clone git://github.com/PyYoshi/cChardet.git
    $ cd cChardet
    $ python setup.py build
    $ sudo python setup.py install

or

    $ sudo easy_install cchardet

# Example
```python
# coding: utf8
import cchardet

# Read the sample as raw bytes so that no decoding happens before detection.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

result = cchardet.detect(msg)
print(result)

result2 = cchardet.detect_with_confidence(msg)
print(result2)
```
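
Once an encoding has been detected, the usual next step is to decode the bytes. The following is a minimal sketch rather than part of cChardet: the helper name `decode_detected` is hypothetical, and because the return shape of `cchardet.detect` is not described here, the helper accepts either a plain encoding name or a dict with an `"encoding"` key (both are assumptions).

```python
# coding: utf8


def decode_detected(data, detection, fallback="utf-8"):
    """Decode bytes using a detection result, falling back if it is unusable.

    `detection` may be an encoding name or a dict with an "encoding" key;
    both shapes are assumptions, not a documented cchardet contract.
    """
    if isinstance(detection, dict):
        name = detection.get("encoding")
    else:
        name = detection
    if not name:
        name = fallback
    try:
        return data.decode(name)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode leniently instead.
        return data.decode(fallback, "replace")


original = u"千夜一夜物語"
sample = original.encode("shift_jis")
text = decode_detected(sample, "SHIFT_JIS")
print(text == original)
```

In real use, the second argument would be whatever `cchardet.detect(msg)` returned for the same bytes.
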
# Test
Install the test dependencies with `easy_install` or `pip`, then run the tests:

    $ sudo pip install -U chardet nose
    $ cd test
    $ nosetests --nocapture tests.py

# Benchmark
Code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)

Sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)

### Performance:
* CPU: Intel Core i7 860 2.8GHz
* RAM: DDR3-1333 16GB
* Platform: Windows 7 HP x64, Python 2.7.3 32-bit

### Result:
<table>
<tr>
<th></th><th>Requests (calls/s)</th><th>Detected encoding</th>
</tr>
<tr>
<td>chardet</td><td>0.25</td><td>shift_jis</td>
</tr>
<tr>
<td>cchardet</td><td>500.03</td><td>shift_jis</td>
</tr>
</table>
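
A calls-per-second figure like the ones above can be measured with a simple timing loop. The sketch below is not the code in `tests.TestCchardetSpeed`; the function names `calls_per_second` and `fake_detect` are hypothetical, and `fake_detect` is only a stand-in so the example runs without cchardet installed.

```python
import time


def calls_per_second(func, data, min_seconds=1.0):
    """Call func(data) repeatedly for at least min_seconds (> 0) and
    return the measured number of calls per second."""
    calls = 0
    start = time.time()
    elapsed = 0.0
    while elapsed < min_seconds:
        func(data)
        calls += 1
        elapsed = time.time() - start
    return calls / elapsed


def fake_detect(data):
    # Hypothetical stand-in detector; replace with cchardet.detect
    # or chardet.detect to benchmark a real library.
    return "ascii" if all(b < 128 for b in bytearray(data)) else "unknown"


rate = calls_per_second(fake_detect, b"hello world" * 100, min_seconds=0.2)
print("%.2f call/s" % rate)
```

To compare two detectors fairly, pass the same byte string and the same `min_seconds` to both.
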
# License
* The files of this library ("cchardet.pyx", "setup.py", "tests.py") are released under the MIT License.
* Other libraries: see the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory for their licenses.

# Thanks
* [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [http://www.cython.org/](http://www.cython.org/)

# Contact
[My blog](http://blog.remu.biz)

Sorry for my poor English :)