updated
parent
66670a0e3f
commit
221eb35cda
1 changed files with 75 additions and 24 deletions
99
readme.md
|
@ -1,59 +1,110 @@
|
|||
# cChardet
|
||||
This library is a high-speed universal character encoding detector, a binding to libcharsetdetect.
|
||||
This library is a high-speed universal character encoding detector, a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
|
||||
|
||||
This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
|
||||
|
||||
# Supported codecs
|
||||
* Big5
|
||||
* EUC-JP
|
||||
* EUC-KR
|
||||
* GB18030
|
||||
* gb18030
|
||||
* HZ-GB-2312
|
||||
* IBM855
|
||||
* IBM866
|
||||
* ISO-2022-CN
|
||||
* ISO-2022-JP
|
||||
* ISO-2022-KR
|
||||
* ISO-8859-2
|
||||
* ISO-8859-5
|
||||
* ISO-8859-7
|
||||
* ISO-8859-8
|
||||
* KOI8-R
|
||||
* Shift_JIS
|
||||
* TIS-620
|
||||
* UTF-8
|
||||
* UTF-16BE
|
||||
* UTF-16LE
|
||||
* UTF-32BE
|
||||
* UTF-32LE
|
||||
* windows-1250
|
||||
* windows-1251
|
||||
* windows-1252
|
||||
* windows-1253
|
||||
* windows-1255
|
||||
* x-euc-tw
|
||||
* X-ISO-10646-UCS-4-2143
|
||||
* X-ISO-10646-UCS-4-3412
|
||||
* x-mac-cyrillic
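
Some of the names in this list may not be codecs that Python itself can decode. The following is a minimal sketch, using only the standard library's `codecs.lookup`, for checking a detected name before passing it to `bytes.decode`. The helper name `python_codec_for` is hypothetical and not part of cChardet.

```python
import codecs


def python_codec_for(name):
    """Return Python's canonical codec name for `name`, or None if unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        # Python has no codec registered under this name.
        return None


# A few names taken from the list above.
for detected in ("UTF-8", "Shift_JIS", "x-euc-tw", "X-ISO-10646-UCS-4-2143"):
    print(detected, "->", python_codec_for(detected))
```

When the helper returns `None`, the bytes need a different decoding path, for example a third-party codec or a fallback encoding.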
|
||||
|
||||
# Requires
|
||||
Cython: [http://www.cython.org/](http://www.cython.org/)
|
||||
* Cython: [http://www.cython.org/](http://www.cython.org/)
|
||||
|
||||
uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
|
||||
* uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
|
||||
|
||||
# Install
|
||||
### Build uchardet-enhanced
|
||||
$cd /tmp
|
||||
1. $cd /tmp
|
||||
|
||||
$hg clone https://bitbucket.org/medoc/uchardet-enhanced
|
||||
2. $hg clone https://bitbucket.org/medoc/uchardet-enhanced
|
||||
|
||||
$cd uchardet-enhanced/libcharsetdetect
|
||||
3. $cd uchardet-enhanced/libcharsetdetect
|
||||
|
||||
$./configure
|
||||
4. $./configure
|
||||
|
||||
$make
|
||||
5. $make
|
||||
|
||||
$sudo make install
|
||||
6. $sudo make install
|
||||
|
||||
$ls -la /usr/local/lib
|
||||
7. $ls -la /usr/local/lib
|
||||
|
||||
$ls -la /usr/local/include
|
||||
8. $ls -la /usr/local/include
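
As a complement to the `ls` checks in the last two steps, the sketch below asks Python whether the dynamic loader can locate the freshly installed library. It assumes the shared library is installed under the name `charsetdetect` (that is, `libcharsetdetect`); the helper name `find_charsetdetect` is hypothetical.

```python
from ctypes.util import find_library


def find_charsetdetect(name="charsetdetect"):
    """Return the library name or path the loader would pick, or None if not found."""
    # Assumption: the library built above is named libcharsetdetect.
    return find_library(name)


location = find_charsetdetect()
print(location if location else "libcharsetdetect not found on the default search path")
```

A `None` result does not necessarily mean the build failed; on some systems the linker cache has to be refreshed before libraries under `/usr/local/lib` are found.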
|
||||
|
||||
### Build cChardet
|
||||
$cd /tmp
|
||||
1. $cd /tmp
|
||||
|
||||
$git clone git://github.com/PyYoshi/cChardet.git
|
||||
2. $git clone git://github.com/PyYoshi/cChardet.git
|
||||
|
||||
$cd cChardet
|
||||
3. $cd cChardet
|
||||
|
||||
$sudo pip install -U cython (or $sudo easy_install -U cython). On Ubuntu, "sudo apt-get install python-dev cython" is recommended instead.
|
||||
4. $sudo pip install -U cython (or $sudo easy_install -U cython). On Ubuntu, "sudo apt-get install python-dev cython" is recommended instead.
|
||||
|
||||
$python setup.py build
|
||||
5. $python setup.py build
|
||||
|
||||
$sudo python setup.py install
|
||||
6. $sudo python setup.py install
|
||||
|
||||
# Example
|
||||
|
||||
```python
# coding: utf8
import cchardet
msg = u'One Thousand and One Nights'
result = cchardet.detect(msg.encode('sjis'))
print(result)
```
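
A common next step is to decode the bytes with the detected encoding. The sketch below assumes the detection result is a dict with an `encoding` key, as in chardet; that shape is an assumption here, not something this README states. The detector is passed in as an argument, and `fake_detect` is a hypothetical stand-in so the example runs without cchardet installed.

```python
def decode_detected(data, detect, fallback="utf-8"):
    """Decode `data` using the encoding reported by `detect`."""
    result = detect(data)
    # Assumption: the result is a dict with an "encoding" key, as in chardet.
    encoding = (result or {}).get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: fall back, replacing bad bytes.
        return data.decode(fallback, errors="replace")


def fake_detect(data):
    """Hypothetical stand-in for a real detector such as cchardet.detect."""
    return {"encoding": "shift_jis", "confidence": 1.0}


text = decode_detected("日本語".encode("shift_jis"), fake_detect)
print(text)
```

With cchardet installed, `cchardet.detect` could be passed in place of `fake_detect`, provided its result has the assumed shape.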
|
||||
|
||||
# Test
|
||||
* $sudo pip install -U chardet nose (or $sudo easy_install -U chardet nose)
|
||||
|
||||
* $nosetests --nocapture tests.py
|
||||
|
||||
# Benchmark
|
||||
See tests.TestCchardetSpeed
|
||||
See [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/tests.py#L414)
|
||||
|
||||
### Sample(shift_jis):
|
||||
testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
|
||||
* [testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
|
||||
|
||||
### PC Spec.:
|
||||
CPU: Intel Core i7 860 2.8GHz
|
||||
* CPU: Intel Core i7 860 2.8GHz
|
||||
|
||||
RAM: DDR3-1333 16GB
|
||||
* RAM: DDR3-1333 16GB
|
||||
|
||||
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
|
||||
* Platform: Windows 7 HP x64, Python 2.7.3 32-bit
|
||||
|
||||
### Result:
|
||||
chardet: 4.009999990463257s, shift_jis
|
||||
* chardet: 4.009999990463257s, shift_jis
|
||||
|
||||
cchardet: 0.0009999275207519531s, shift_jis
|
||||
* cchardet: 0.0009999275207519531s, shift_jis
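
The two timings above work out to a speed-up of roughly 4000x. The sketch below shows that arithmetic together with one possible way to time a detector; it is not the code in `tests.TestCchardetSpeed`. The helper `time_detector` and the stand-in detector are hypothetical.

```python
import time


def time_detector(detect, data, repeat=1):
    """Return (best elapsed seconds, last result) over `repeat` runs of detect(data)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = detect(data)
        best = min(best, time.perf_counter() - start)
    return best, result


# Speed-up implied by the figures reported above.
chardet_seconds = 4.009999990463257
cchardet_seconds = 0.0009999275207519531
speedup = chardet_seconds / cchardet_seconds
print("speed-up: about %.0fx" % speedup)

# Stand-in detector so the sketch runs without chardet or cchardet installed.
elapsed, result = time_detector(lambda data: {"encoding": "ascii"}, b"sample bytes", repeat=3)
print("elapsed: %.6fs, result: %r" % (elapsed, result))
```

To compare the real libraries, `chardet.detect` and `cchardet.detect` could be passed in place of the stand-in, using the bytes of the sample file as `data`.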
|
||||
|
||||
# Contact
|
||||
[My blog](http://blog.remu.biz)
|
||||
|
|