2012-06-26 01:54:57 +00:00

cChardet
========
cChardet is a high-speed universal character encoding detector, implemented as
a Cython binding to `charsetdetect`_. It is faster than the pure-Python
`chardet`_.
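
A typical use is to detect the encoding of raw bytes and then decode them. The
sketch below is detector-agnostic. It assumes a ``detect(data)`` callable that
returns a dict with an ``"encoding"`` key (``None`` when nothing was detected),
which is the shape ``chardet.detect`` returns. ``decode_bytes`` and its
``fallback`` parameter are hypothetical names, not part of cChardet.

```python
from typing import Any, Callable, Mapping


def decode_bytes(
    data: bytes,
    detect: Callable[[bytes], Mapping[str, Any]],
    fallback: str = "utf-8",
) -> str:
    """Decode ``data`` with the encoding that ``detect`` reports.

    Hypothetical helper: ``detect`` is assumed to return a mapping with an
    "encoding" key that is None when nothing was detected.
    """
    encoding = detect(data).get("encoding") or fallback
    try:
        return data.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or a wrong guess: decode with the fallback codec,
        # replacing undecodable bytes instead of raising.
        return data.decode(fallback, errors="replace")


# Usage with a stand-in detector that always reports Shift_JIS.
sjis_bytes = "こんにちは".encode("shift_jis")
text = decode_bytes(sjis_bytes, lambda data: {"encoding": "shift_jis", "confidence": 0.99})
assert text == "こんにちは"
```

Falling back with ``errors="replace"`` keeps the call from raising on a wrong
guess, at the cost of replacement characters in the output.
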

Supported codecs
================
- Big5
- EUC-JP
- EUC-KR
- GB18030
- gb18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift\_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1255
- x-euc-tw
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic
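
The names above are the labels the detector reports, not necessarily Python
codec names. Before decoding, a label can be checked against Python's codec
registry with ``codecs.lookup``, which normalizes the spelling and resolves
aliases. Some labels, such as ``ISO-2022-CN`` and ``x-euc-tw``, have no codec in
the Python standard library. A minimal sketch, where ``python_codec_name`` is a
hypothetical helper:

```python
import codecs
from typing import Optional


def python_codec_name(label: str) -> Optional[str]:
    """Return Python's canonical codec name for a detector label, or None.

    Hypothetical helper: codecs.lookup normalizes the label and resolves
    aliases (e.g. "windows-1251" -> "cp1251"); labels with no codec in the
    standard library raise LookupError, which is mapped to None here.
    """
    try:
        return codecs.lookup(label).name
    except LookupError:
        return None


# Check a few labels from the list above.
for label in ("Shift_JIS", "windows-1251", "ISO-2022-CN", "x-euc-tw"):
    print(label, "->", python_codec_name(label))
```
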

Requirements
============
- Cython: `http://www.cython.org/`_

Install
=======
1. ``$ cd /tmp``
2. ``$ git clone git://github.com/PyYoshi/cChardet.git``
3. ``$ cd cChardet``
4. ``$ python setup.py build``
5. ``$ sudo python setup.py install``
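
Once installed, code that should also run where cChardet is missing can prefer
it and fall back to `chardet`_. The sketch below assumes the module is
importable as ``cchardet`` and that both modules provide a compatible
``detect`` function; ``load_detector`` is a hypothetical helper.

```python
import importlib
from types import ModuleType
from typing import Iterable, Optional


def load_detector(
    candidates: Iterable[str] = ("cchardet", "chardet"),
) -> Optional[ModuleType]:
    """Import and return the first available module from ``candidates``.

    Hypothetical helper: modules are tried in order, so the faster cchardet
    binding is preferred when it is installed. Returns None if none import.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


detector = load_detector()
if detector is None:
    print("neither cchardet nor chardet is installed")
else:
    print("using", detector.__name__)
```
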

Test
====
- Install the test dependencies: ``$ sudo pip install -U chardet nose``
  (or ``$ sudo easy_install -U chardet nose``)
- Run the tests: ``$ nosetests --nocapture tests.py``

Benchmark
=========
See `tests.TestCchardetSpeed`_.

Sample (Shift\_JIS)
~~~~~~~~~~~~~~~~~~~
- `test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt`_

PC specifications
~~~~~~~~~~~~~~~~~
- CPU: Intel Core i7 860, 2.8 GHz
- RAM: 16 GB DDR3-1333
- Platform: Windows 7 HP x64, Python 2.7.3 (32-bit)

Results
~~~~~~~
- chardet: 4.009999990463257 s (detected ``shift_jis``)
- cchardet: 0.0009999275207519531 s (detected ``shift_jis``)
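
The figures above come from the project's own speed test. A comparable
measurement can be sketched with the standard library's ``timeit`` module:
running the detector many times and keeping the fastest trial reduces the
effect of timer resolution and background noise. ``time_detector`` and its
parameters are hypothetical and are not the code in ``tests.TestCchardetSpeed``.

```python
import timeit
from typing import Callable


def time_detector(
    detect: Callable[[bytes], object],
    data: bytes,
    number: int = 10,
    repeat: int = 5,
) -> float:
    """Return the per-call time in seconds of the fastest trial.

    Hypothetical helper: timeit.repeat runs ``number`` calls per trial and
    ``repeat`` trials in total; the fastest trial is divided by ``number``.
    """
    trials = timeit.repeat(lambda: detect(data), number=number, repeat=repeat)
    return min(trials) / number


if __name__ == "__main__":
    # Example input: Shift_JIS-encoded Japanese text. ``len`` is only a
    # placeholder; pass a real detector function to compare detectors.
    sample = ("千夜一夜物語" * 1000).encode("shift_jis")
    print(f"{time_detector(len, sample):.9f} s per call")
```
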

License
=======
- The files of this library (``cchardet.pyx``, ``setup.py``, and ``tests.py``)
  are released under the MIT License.
- For the licenses of the other libraries, see the ``ext`` directory.

Thanks
======
- `https://bitbucket.org/medoc/uchardet-enhanced/overview`_
- `http://www.cython.org/`_

Contact
=======
`My blog`_

Sorry for my poor English :)

.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _chardet: http://pypi.python.org/pypi/chardet
.. _`http://www.cython.org/`: http://www.cython.org/
.. _tests.TestCchardetSpeed: https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415
.. _test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt: https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
.. _`https://bitbucket.org/medoc/uchardet-enhanced/overview`: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _My blog: http://blog.remu.biz