diff --git a/README.markdown b/README.markdown new file mode 100644 index 0000000..fb7bd56 --- /dev/null +++ b/README.markdown @@ -0,0 +1,125 @@ + + +cChardet +======== +cChardet is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview). + +## Support codecs +* Big5 +* EUC-JP +* EUC-KR +* GB18030 +* HZ-GB-2312 +* IBM855 +* IBM866 +* ISO-2022-CN +* ISO-2022-JP +* ISO-2022-KR +* ISO-8859-2 +* ISO-8859-5 +* ISO-8859-7 +* ISO-8859-8 +* KOI8-R +* Shift_JIS +* TIS-620 +* UTF-8 +* UTF-16BE +* UTF-16LE +* UTF-32BE +* UTF-32LE +* WINDOWS-1250 +* WINDOWS-1251 +* WINDOWS-1252 +* WINDOWS-1253 +* WINDOWS-1255 +* EUC-TW +* X-ISO-10646-UCS-4-2143 +* X-ISO-10646-UCS-4-3412 +* x-mac-cyrillic + +## Requires +* Cython: [http://www.cython.org/](http://www.cython.org/) + +e.g.) Ubuntu 12.04 + +```bash +$ sudo apt-get install build-essential python-dev cython +``` + +## Installation + +```bash +$ cd /tmp +$ git clone git://github.com/PyYoshi/cChardet.git +$ cd cChardet +$ python setup.py build +$ sudo python setup.py install +``` + +or + +```bash +$ sudo easy_install cchardet +``` + +## Example + +```python +# -*- coding: utf-8 -*- +import cchardet as chardet +with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt") as f: + msg = f.read() +result = chardet.detect(msg) +print(result) +``` + +## Test + +```bash +$ sudo easy_install or pip install -U chardet nose +$ cd test +$ nosetests --nocapture tests.py +``` + +## Benchmark +code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415) + +sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt) + +### Performance: +CPU: Intel Core i7 860 2.8GHz + +RAM: DDR3-1333 16GB + +Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit + +### Result: + + + + + + + + + + + +
<table>
<tr><th></th><th>Request (call/s)</th></tr>
<tr><td>chardet</td><td>0.32</td></tr>
<tr><td>cchardet</td><td>1012.97</td></tr>
</table>
+ +### License +* The MIT License: [src/cchardet](https://github.com/PyYoshi/cChardet/tree/master/src/cchardet) + +* Other Libraries License: Please, look at the [src/ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory. + +### Thanks +* [uchardet-enhanced](https://bitbucket.org/medoc/uchardet-enhanced/overview) + +* [Cython](http://www.cython.org/) + +### Contact +[My blog](http://blog.remu.biz) + +[Issues](https://github.com/PyYoshi/cChardet/issues?page=1&state=open) + +Sorry for my poor English :) \ No newline at end of file diff --git a/readme.rst b/README.rst similarity index 50% rename from readme.rst rename to README.rst index 5f69439..8553537 100644 --- a/readme.rst +++ b/README.rst @@ -2,21 +2,17 @@ cChardet ======== -This library is high speed universal character encoding detector. - -binding to +cChardet is high speed universal character encoding detector. - binding +to `charsetdetect `_. -This library is faster than -`chardet `_. - Support codecs -============== +-------------- - Big5 - EUC-JP - EUC-KR - GB18030 -- gb18030 - HZ-GB-2312 - IBM855 - IBM866 @@ -46,7 +42,7 @@ Support codecs - x-mac-cyrillic Requires -======== +-------- - Cython: `http://www.cython.org/ `_ @@ -54,42 +50,48 @@ e.g.) 
Ubuntu 12.04 :: - $sudo apt-get install build-essential python-dev cython + $ sudo apt-get install build-essential python-dev cython Installation -============ +------------ :: - $cd /tmp - - $git clone git://github.com/PyYoshi/cChardet.git - - $cd cChardet - - $python setup.py build - - $sudo python setup.py install + $ cd /tmp + $ git clone git://github.com/PyYoshi/cChardet.git + $ cd cChardet + $ python setup.py build + $ sudo python setup.py install or :: - $sudo easy_install cchardet + $ sudo easy_install cchardet -Test -==== +Example +------- :: - $sudo easy_install or pip install -U chardet nose + # -*- coding: utf-8 -*- + import cchardet as chardet + with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt") as f: + msg = f.read() + result = chardet.detect(msg) + print(result) - $cd test +Test +---- - $nosetests --nocapture tests.py +:: + + $ sudo easy_install or pip install -U chardet nose + $ cd test + $ nosetests --nocapture tests.py Benchmark -========= +--------- code: `tests.TestCchardetSpeed `_ @@ -104,36 +106,39 @@ CPU: Intel Core i7 860 2.8GHz RAM: DDR3-1333 16GB -Platform: Windows 7 HP x64, Python 2.7.3 32-bit +Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit Result: ~~~~~~~ :: - chardet: 0.25 (call/s) + chardet: 0.32 (call/s) - cchardet: 500.03 (call/s) + cchardet: 1012.97 (call/s) License -======= +~~~~~~~ -- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License". +- The MIT License: + `src/cchardet `_ - Other Libraries License: Please, look at the - `ext `_ + `src/ext `_ directory. 
Thanks -====== +~~~~~~ -- `https://bitbucket.org/medoc/uchardet-enhanced/overview `_ +- `uchardet-enhanced `_ -- `http://www.cython.org/ `_ +- `Cython `_ Contact -======= +~~~~~~~ `My blog `_ +`Issues `_ + Sorry for my poor English :) diff --git a/readme.md b/readme.md deleted file mode 100644 index b7af5f2..0000000 --- a/readme.md +++ /dev/null @@ -1,120 +0,0 @@ - - -# cChardet -This library is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview). - -This library is faster than [chardet](http://pypi.python.org/pypi/chardet). - -# Support codecs -* Big5 -* EUC-JP -* EUC-KR -* GB18030 -* gb18030 -* HZ-GB-2312 -* IBM855 -* IBM866 -* ISO-2022-CN -* ISO-2022-JP -* ISO-2022-KR -* ISO-8859-2 -* ISO-8859-5 -* ISO-8859-7 -* ISO-8859-8 -* KOI8-R -* Shift_JIS -* TIS-620 -* UTF-8 -* UTF-16BE -* UTF-16LE -* UTF-32BE -* UTF-32LE -* WINDOWS-1250 -* WINDOWS-1251 -* WINDOWS-1252 -* WINDOWS-1253 -* WINDOWS-1255 -* EUC-TW -* X-ISO-10646-UCS-4-2143 -* X-ISO-10646-UCS-4-3412 -* x-mac-cyrillic - -# Requires -* Cython: [http://www.cython.org/](http://www.cython.org/) - -e.g.) 
Ubuntu 12.04 - - $sudo apt-get install build-essential python-dev cython - -# Installation - $cd /tmp - - $git clone git://github.com/PyYoshi/cChardet.git - - $cd cChardet - - $python setup.py build - - $sudo python setup.py install - -or - - $sudo easy_install cchardet - -# Example - -```python -# coding: utf8 -import cchardet -msg = file(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt").read() -result = cchardet.detect(msg) -print(result) -``` - -# Test - $sudo easy_install or pip install -U chardet nose - - $cd test - - $nosetests --nocapture tests.py - -# Benchmark -code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415) - -sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt) - -### Performance: -CPU: Intel Core i7 860 2.8GHz - -RAM: DDR3-1333 16GB - -Platform: Windows 7 HP x64, Python 2.7.3 32-bit - -### Result: - - - - - - - - - - - -
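<table>
<tr><th></th><th>Request (call/s)</th><th>Result of encoding</th></tr>
<tr><td>chardet</td><td>0.25</td><td>shift_jis</td></tr>
<tr><td>cchardet</td><td>500.03</td><td>shift_jis</td></tr>
</table>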
- -# License -* This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License". - -* Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory. - -# Thanks -* [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview) - -* [http://www.cython.org/](http://www.cython.org/) - -# Contact -[My blog](http://blog.remu.biz) - -Sorry for my poor English :) \ No newline at end of file
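
Note on the benchmark figures: the READMEs report throughput in call/s but do not show how that number is derived. The sketch below is one plausible way to measure it, not the code in `tests.TestCchardetSpeed`. It assumes Python 3 (`time.perf_counter`), and both `calls_per_second` and `stand_in_detect` are hypothetical names introduced here so the example runs without cchardet installed.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Call detect(data) `repeat` times and return the rate in calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")


def stand_in_detect(data):
    """Hypothetical placeholder detector: it only checks for valid UTF-8."""
    try:
        data.decode("utf-8")
        return {"encoding": "UTF-8"}
    except UnicodeDecodeError:
        return {"encoding": None}


# Shift_JIS bytes, loosely mirroring the kind of sample the README benchmarks.
sample = "千夜一夜物語".encode("shift_jis") * 1000

rate = calls_per_second(stand_in_detect, sample, repeat=50)
print("%.2f (call/s)" % rate)
```

To time the real library, pass `cchardet.detect` in place of `stand_in_detect` and read the sample file in binary mode (`open(path, "rb")`), since the detector works on bytes.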