diff --git a/readme.md b/readme.md
index a262648..e8152e8 100644
--- a/readme.md
+++ b/readme.md
@@ -4,6 +4,7 @@ This library is high speed universal character encoding detector.
 - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
 This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
+
 # Support codecs
 * Big5
 * EUC-JP
@@ -37,12 +38,14 @@ This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
 * X-ISO-10646-UCS-4-2143
 * X-ISO-10646-UCS-4-3412
 * x-mac-cyrillic
+
 # Requires
 * Cython: [http://www.cython.org/](http://www.cython.org/)
 e.g.) Ubuntu 12.04
     $sudo apt-get install build-essential python-dev cython
+
 # Installation
     $cd /tmp
@@ -57,6 +60,7 @@ e.g.) Ubuntu 12.04
 or
     $sudo easy_install cchardet
+
 # Example
 ```python
@@ -68,22 +72,26 @@ print(result)
 result2 = cchardet.detect_with_confidence(msg)
 print(result2)
 ```
+
 # Test
     $sudo easy_install or pip install -U chardet nose
     $cd test
     $nosetests --nocapture tests.py
+
 # Benchmark
 code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
 sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
+
 ### Performance:
 CPU: Intel Core i7 860 2.8GHz
 RAM: DDR3-1333 16GB
 Platform: Windows 7 HP x64, Python 2.7.3 32-bit
+
 ### Result:
@@ -97,14 +105,17 @@ Platform: Windows 7 HP x64, Python 2.7.3 32-bit
 cchardet | 500.03 | shift_jis
+
 # License
 * This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
 * Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
+
 # Thanks
 * [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
 * [http://www.cython.org/](http://www.cython.org/)
+
 # Contact
 [My blog](http://blog.remu.biz)
diff --git a/readme.rst b/readme.rst
index 3cfabd3..83c519e 100644
--- a/readme.rst
+++ b/readme.rst
@@ -1,10 +1,16 @@
+.. raw:: html
+
+
+
 cChardet
 ========
 This library is high speed universal character encoding detector.
-
-binding to `charsetdetect`_.
+binding to
+`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
-This library is faster than `chardet`_.
+This library is faster than
+`chardet <http://pypi.python.org/pypi/chardet>`_.
 Support codecs
 ==============
@@ -32,12 +38,12 @@ Support codecs
 - UTF-16LE
 - UTF-32BE
 - UTF-32LE
-- windows-1250
-- windows-1251
-- windows-1252
-- windows-1253
-- windows-1255
-- x-euc-tw
+- WINDOWS-1250
+- WINDOWS-1251
+- WINDOWS-1252
+- WINDOWS-1253
+- WINDOWS-1255
+- EUC-TW
 - X-ISO-10646-UCS-4-2143
 - X-ISO-10646-UCS-4-3412
 - x-mac-cyrillic
@@ -45,80 +51,149 @@ Support codecs
 Requires
 ========
-- Cython: `http://www.cython.org/`_
+- Cython: `http://www.cython.org/ <http://www.cython.org/>`_
-Install
-=======
+e.g.) Ubuntu 12.04
-1. $cd /tmp
+::
-2. $git clone git://github.com/PyYoshi/cChardet.git
+    $sudo apt-get install build-essential python-dev cython
-3. $cd cChardet
+Installation
+============
-4. $python setup.py build
+::
-5. $sudo python setup.py install
+    $cd /tmp
+
+    $git clone git://github.com/PyYoshi/cChardet.git
+
+    $cd cChardet
+
+    $python setup.py build
+
+    $sudo python setup.py install
+
+or
+
+::
+
+    $sudo easy_install cchardet
 Test
 ====
-- $sudo easy\_install or pip install -U chardet nose
+::
-- $nosetests –nocapture tests.py
+    $sudo easy_install or pip install -U chardet nose
+
+    $cd test
+
+    $nosetests --nocapture tests.py
 Benchmark
 =========
-see `tests.TestCchardetSpeed`_
+code:
+`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
-Sample(shift\_jis):
-~~~~~~~~~~~~~~~~~~~
+sample:
+`test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt <https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt>`_
-- `test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt`_
+Performance:
+~~~~~~~~~~~~
-PC Spec.:
-~~~~~~~~~
+CPU: Intel Core i7 860 2.8GHz
-- CPU: Intel Core i7 860 2.8GHz
+RAM: DDR3-1333 16GB
-- RAM: DDR3-1333 16GB
-
-- Platform: Windows 7 HP x64, Python 2.7.3 32-bit
+Platform: Windows 7 HP x64, Python 2.7.3 32-bit
 Result:
 ~~~~~~~
-- chardet: 4.009999990463257s, shift\_jis
+.. raw:: html
-- cchardet: 0.0009999275207519531s, shift\_jis
+
+
+
+
+
+
+
+
+
+
+
+
+Request (call/s)
+
+.. raw:: html
+
+
+
+Result of encoding
+
+.. raw:: html
+
+
+
+chardet
+
+.. raw:: html
+
+
+
+0.25
+
+.. raw:: html
+
+
+
+shift\_jis
+
+.. raw:: html
+
+
+
+cchardet
+
+.. raw:: html
+
+
+
+500.03
+
+.. raw:: html
+
+
+
+shift\_jis
+
+.. raw:: html
+
+
 License
 =======
-- This library files(“cchardet.pyx”,“setup.py”,“tests.py”) are “The MIT
-  License”.
+- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT
+  License".
-- Other Library License: Please, look at the “ext” directory.
+- Other Libraries License: Please, look at the
+  `ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
+  directory.
 Thanks
 ======
-- `https://bitbucket.org/medoc/uchardet-enhanced/overview`_
+- `https://bitbucket.org/medoc/uchardet-enhanced/overview <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
-- `http://www.cython.org/`_
+- `http://www.cython.org/ <http://www.cython.org/>`_
 Contact
 =======
-`My blog`_
+`My blog <http://blog.remu.biz>`_
 Sorry for my poor English :)
-
-.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
-.. _chardet: http://pypi.python.org/pypi/chardet
-.. _`http://www.cython.org/`: http://www.cython.org/
-.. _tests.TestCchardetSpeed: https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415
-.. _test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt: https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
-.. _`https://bitbucket.org/medoc/uchardet-enhanced/overview`: https://bitbucket.org/medoc/uchardet-enhanced/overview
-.. _My blog: http://blog.remu.biz