From 221eb35cdafc444dae0607633b709f661c49f185 Mon Sep 17 00:00:00 2001
From: PyYoshi
Date: Thu, 21 Jun 2012 00:07:12 +0900
Subject: [PATCH] updated

---
 readme.md | 99 +++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 75 insertions(+), 24 deletions(-)

diff --git a/readme.md b/readme.md
index c95194b..ffa9730 100644
--- a/readme.md
+++ b/readme.md
@@ -1,59 +1,110 @@
 # cChardet
-This library is high speed universal character encoding detector. - binding to libcharsetdetect
+This library is a high-speed universal character encoding detector, implemented as a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
+
+It is much faster than [chardet](http://pypi.python.org/pypi/chardet); see the Benchmark section.
+
+# Supported codecs
+* Big5
+* EUC-JP
+* EUC-KR
+* GB18030
+* gb18030
+* HZ-GB-2312
+* IBM855
+* IBM866
+* ISO-2022-CN
+* ISO-2022-JP
+* ISO-2022-KR
+* ISO-8859-2
+* ISO-8859-5
+* ISO-8859-7
+* ISO-8859-8
+* KOI8-R
+* Shift_JIS
+* TIS-620
+* UTF-8
+* UTF-16BE
+* UTF-16LE
+* UTF-32BE
+* UTF-32LE
+* windows-1250
+* windows-1251
+* windows-1252
+* windows-1253
+* windows-1255
+* x-euc-tw
+* X-ISO-10646-UCS-4-2143
+* X-ISO-10646-UCS-4-3412
+* x-mac-cyrillic
 
 # Requires
-Cython: [http://www.cython.org/](http://www.cython.org/)
+* Cython: [http://www.cython.org/](http://www.cython.org/)
 
-uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
+* uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
 
 # Install
 ### Build uchardet-enhanced
-$cd /tmp
+1. $cd /tmp
 
-$hg clone https://bitbucket.org/medoc/uchardet-enhanced
+2. $hg clone https://bitbucket.org/medoc/uchardet-enhanced
 
-$cd uchardet-enhanced/libcharsetdetect
+3. $cd uchardet-enhanced/libcharsetdetect
 
-$./configure
+4. $./configure
 
-$make
+5. $make
 
-$sudo make install
+6. $sudo make install
 
-$ls -la /usr/local/lib
+7. $ls -la /usr/local/lib
 
-$ls -la /usr/local/include
+8. $ls -la /usr/local/include
 
 ### Build cChardet
-$cd /tmp
+1. $cd /tmp
 
-$git clone git://github.com/PyYoshi/cChardet.git
+2. $git clone git://github.com/PyYoshi/cChardet.git
 
-$cd cChardet
+3. $cd cChardet
 
-$sudo pip install or easy_install -U cython. (If your os is Ubuntu, I recommend that you do "sudo apt-get install python-dev cython")
+4. $sudo pip install -U cython (or $sudo easy_install -U cython). On Ubuntu, I recommend "sudo apt-get install python-dev cython" instead.
 
-$python setup.py build
+5. $python setup.py build
 
-$sudo python setup.py install
+6. $sudo python setup.py install
+
+# Example
+
+```python
+# coding: utf8
+import cchardet
+msg = u'One Thousand and One Nights'
+result = cchardet.detect(msg.encode('sjis'))
+print(result)
+```
+
+# Test
+* $sudo pip install -U chardet nose (or $sudo easy_install -U chardet nose)
+
+* $nosetests --nocapture tests.py
 
 # Benchmark
-see tests.TestCchardetSpeed
+See [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/tests.py#L414)
 
 ### Sample(shift_jis):
-testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
+* [testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
 
 ### PC Spec.:
-CPU: Intel Core i7 860 2.8GHz
+* CPU: Intel Core i7 860 2.8GHz
 
-RAM: DDR3-1333 16GB
+* RAM: DDR3-1333 16GB
 
-Platform: Windows 7 HP x64, Python 2.7.3 32-bit
+* Platform: Windows 7 HP x64, Python 2.7.3 32-bit
 
 ### Result:
-chardet: 4.009999990463257s, shift_jis
+* chardet: 4.009999990463257s, shift_jis
 
-cchardet: 0.0009999275207519531s, shift_jis
+* cchardet: 0.0009999275207519531s, shift_jis
 
 # Contact
 [My blog](http://blog.remu.biz)
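The Benchmark figures in this patch come from tests.TestCchardetSpeed in the repository. To rerun a comparison like it on your own data, a minimal timing harness could look like the sketch below. It is not the project's test code: `time_detector` and `Detector` are hypothetical names, the sketch targets Python 3 (the README's numbers were measured on Python 2.7.3), and it assumes each library's `detect` takes bytes and returns a dict with an `"encoding"` key, as `chardet.detect` does.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# A detector takes raw bytes and returns a dict with (at least) an
# "encoding" key. chardet.detect follows this convention; that
# cchardet.detect does the same is an assumption of this sketch.
Detector = Callable[[bytes], Dict[str, object]]


def time_detector(detect: Detector, data: bytes,
                  repeat: int = 10) -> Tuple[float, Optional[str]]:
    """Hypothetical helper: call detect(data) `repeat` times and return
    (mean seconds per call, encoding reported by the last call)."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    result: Dict[str, object] = {}
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(repeat):
        result = detect(data)
    mean = (time.perf_counter() - start) / repeat
    encoding = result.get("encoding")
    # Detectors report None when they cannot decide on an encoding.
    return mean, encoding if isinstance(encoding, str) else None
```

With both libraries installed, calling `time_detector(chardet.detect, sample)` and `time_detector(cchardet.detect, sample)`, where `sample` is the content of testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt read in binary mode, gives per-call means that can be compared directly. Averaging over several calls is less sensitive to timer resolution than a single measurement.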