PyYoshi 2012-06-21 00:07:12 +09:00
parent 66670a0e3f
commit 221eb35cda

@@ -1,59 +1,110 @@
# cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
It is faster than [chardet](http://pypi.python.org/pypi/chardet).
# Supported codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* gb18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* windows-1250
* windows-1251
* windows-1252
* windows-1253
* windows-1255
* x-euc-tw
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
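
Not every name in this list is necessarily known to Python's own codec registry, which matters when the detected name is passed straight to `bytes.decode`. The following is a minimal sketch, using only the standard library, of how a detected name could be checked first. The helper name `python_codec_for` is hypothetical and not part of cChardet.

```python
import codecs


def python_codec_for(name):
    """Return Python's canonical codec name for `name`, or None if unknown.

    Hypothetical helper for illustration; it is not part of cChardet.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Check a few names taken from the list above.
for name in ("Shift_JIS", "UTF-8", "x-euc-tw"):
    print("%s -> %s" % (name, python_codec_for(name)))
```

A `None` result means the bytes need a third-party decoder or a fallback encoding.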
# Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)
* uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
# Install
### Build uchardet-enhanced
1. $ cd /tmp
2. $ hg clone https://bitbucket.org/medoc/uchardet-enhanced
3. $ cd uchardet-enhanced/libcharsetdetect
4. $ ./configure
5. $ make
6. $ sudo make install
7. $ ls -la /usr/local/lib
8. $ ls -la /usr/local/include
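
The last two steps check by eye that the library and headers were installed. As a rough complement, the sketch below asks the system linker configuration for the library from Python. It assumes the shared library is named `charsetdetect` (that is, `libcharsetdetect.so` on Linux); that name is inferred from the directory above and may differ on a given system. The helper name `find_charsetdetect` is hypothetical.

```python
from ctypes.util import find_library


def find_charsetdetect(name="charsetdetect"):
    """Return the library name or path the system reports, or None.

    The default name is an assumption based on the source directory name.
    """
    return find_library(name)


path = find_charsetdetect()
if path is None:
    print("charsetdetect not found by the system linker configuration")
else:
    print("found: %s" % path)
```

On Linux, a library that was just installed under /usr/local/lib may not be reported until the linker cache is refreshed (for example with `sudo ldconfig`).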
### Build cChardet
1. $ cd /tmp
2. $ git clone git://github.com/PyYoshi/cChardet.git
3. $ cd cChardet
4. $ sudo pip install -U cython (or: $ sudo easy_install -U cython). On Ubuntu, "sudo apt-get install python-dev cython" is recommended instead.
5. $ python setup.py build
6. $ sudo python setup.py install
# Example
```python
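# coding: utf8
import cchardet

# Encode the sample text as Shift_JIS and pass the raw bytes to detect().
# Note: this sample is pure ASCII, so its Shift_JIS bytes are plain ASCII bytes.
msg = u'One Thousand and One Nights'
result = cchardet.detect(msg.encode('sjis'))
# Print whatever cchardet reports for these bytes.
print(result)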
```
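
A common next step is to decode the bytes with the detected encoding. The sketch below assumes the result is a dict with an `encoding` key, as chardet's `detect` returns; whether this version of cchardet returns exactly that shape is an assumption. The helper `decode_with_guess` and the stub result are hypothetical, so the block runs without cchardet installed.

```python
def decode_with_guess(data, guess, fallback="utf-8"):
    """Decode `data` using guess["encoding"], falling back if that fails.

    `guess` is assumed to look like {"encoding": ..., "confidence": ...}.
    Hypothetical helper; it is not part of cChardet.
    """
    encoding = (guess or {}).get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or undecodable bytes: degrade gracefully.
        return data.decode(fallback, "replace")


# Japanese title of "One Thousand and One Nights", encoded as Shift_JIS.
text = u"\u5343\u591c\u4e00\u591c\u7269\u8a9e"
raw = text.encode("shift_jis")

# Stand-in for a real call such as cchardet.detect(raw).
stub_guess = {"encoding": "SHIFT_JIS", "confidence": 0.99}
print(decode_with_guess(raw, stub_guess) == text)
```

Falling back with `"replace"` keeps the call from raising when the guess is missing or names a codec Python does not know.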
# Test
* $ sudo pip install -U chardet nose (or: $ sudo easy_install -U chardet nose)
* $ nosetests --nocapture tests.py
# Benchmark
See [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/tests.py#L414).
### Sample (shift_jis):
* [testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
### PC Spec.:
* CPU: Intel Core i7 860 2.8GHz
* RAM: DDR3-1333 16GB
* Platform: Windows 7 HP x64, Python 2.7.3 32-bit
### Result:
* chardet: 4.009999990463257s, shift_jis
* cchardet: 0.0009999275207519531s, shift_jis
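
For reproducing a comparison like this on other hardware, a small timing helper may be enough. The sketch below is not the code in `tests.TestCchardetSpeed`; the names `time_detect` and `speedup` are hypothetical, and a stub detector stands in for the real libraries so the block runs on its own. It also computes the ratio implied by the two timings reported above.

```python
from timeit import default_timer


def time_detect(detect, data, repeat=1):
    """Call detect(data) `repeat` times; return (elapsed_seconds, last_result)."""
    result = None
    start = default_timer()
    for _ in range(repeat):
        result = detect(data)
    return default_timer() - start, result


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


# Stub detector so the sketch runs without chardet or cchardet installed.
elapsed, result = time_detect(lambda data: "stub", b"sample bytes", repeat=3)

# Ratio implied by the timings reported above.
reported = speedup(4.009999990463257, 0.0009999275207519531)
print("reported speedup: about %.0fx" % reported)
```

With both libraries installed, passing `chardet.detect` and `cchardet.detect` to `time_detect` with the bytes of the sample file gives timings that can be compared the same way.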
# Contact
[My blog](http://blog.remu.biz)