PyYoshi 2012-07-07 12:30:02 +09:00
parent ffb55ca55b
commit e36c244d37
2 changed files with 130 additions and 44 deletions

@ -4,6 +4,7 @@
This library is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
# Support codecs
* Big5
* EUC-JP
@ -37,12 +38,14 @@ This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
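
The names in the list above are the labels the detector reports. Python's own `codecs` module may not recognize every one of them under that exact name. As a minimal sketch, a reported label could be mapped to a Python codec name before decoding. The helper `python_codec_name` is hypothetical and is not part of cchardet.

```python
import codecs


def python_codec_name(label):
    """Return Python's canonical codec name for a detector label, or None.

    Hypothetical helper for illustration; it is not part of cchardet.
    """
    try:
        return codecs.lookup(label).name
    except LookupError:
        # Python has no codec registered under this label.
        return None


# Print how Python resolves a few labels from the supported list.
for label in ("Big5", "EUC-JP", "WINDOWS-1252", "X-ISO-10646-UCS-4-2143"):
    print(label, "->", python_codec_name(label))
```

A `None` result means the bytes cannot be decoded with `bytes.decode(label)` directly and would need a third-party codec or a conversion tool.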
# Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)
e.g.) Ubuntu 12.04
$sudo apt-get install build-essential python-dev cython
# Installation
$cd /tmp
@ -57,6 +60,7 @@ e.g.) Ubuntu 12.04
or
$sudo easy_install cchardet
# Example
```python
@ -68,22 +72,26 @@ print(result)
result2 = cchardet.detect_with_confidence(msg)
print(result2)
```
# Test
$sudo easy_install or pip install -U chardet nose
$cd test
$nosetests --nocapture tests.py
# Benchmark
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
### Performance:
CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
### Result:
<table>
@ -97,14 +105,17 @@ Platform: Windows 7 HP x64, Python 2.7.3 32-bit
<td>cchardet</td><td>500.03</td><td>shift_jis</td>
</tr>
</table>
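
The table reports throughput in calls per second. The sketch below shows one way such a figure could be measured. It is not the code in `tests.TestCchardetSpeed`; `calls_per_second` and the stand-in detector `fake_detect` are hypothetical names used only for illustration.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Call detect(data) `repeat` times and return the average calls per second."""
    # Prefer the high-resolution timer where it exists (Python 3.3+).
    timer = getattr(time, "perf_counter", time.time)
    start = timer()
    for _ in range(repeat):
        detect(data)
    elapsed = timer() - start
    # Guard against a zero reading on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")


def fake_detect(data):
    """Stand-in detector so the sketch runs without any third-party library."""
    return "ascii" if all(b < 0x80 for b in bytearray(data)) else "unknown"


rate = calls_per_second(fake_detect, b"hello world" * 100)
print("%.2f call/s" % rate)
```

To time a real detector, pass it in place of `fake_detect` (for example `cchardet.detect_with_confidence`) together with the bytes of the sample file.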
# License
* This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
* Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
# Thanks
* [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [http://www.cython.org/](http://www.cython.org/)
# Contact
[My blog](http://blog.remu.biz)

@ -1,10 +1,16 @@
.. raw:: html
<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
cChardet
========
This library is high speed universal character encoding detector. -
binding to `charsetdetect`_.
binding to
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
This library is faster than `chardet`_.
This library is faster than
`chardet <http://pypi.python.org/pypi/chardet>`_.
Support codecs
==============
@ -32,12 +38,12 @@ Support codecs
- UTF-16LE
- UTF-32BE
- UTF-32LE
- windows-1250
- windows-1251
- windows-1252
- windows-1253
- windows-1255
- x-euc-tw
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1255
- EUC-TW
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic
@ -45,80 +51,149 @@ Support codecs
Requires
========
- Cython: `http://www.cython.org/`_
- Cython: `http://www.cython.org/ <http://www.cython.org/>`_
Install
=======
e.g.) Ubuntu 12.04
1. $cd /tmp
::
2. $git clone git://github.com/PyYoshi/cChardet.git
$sudo apt-get install build-essential python-dev cython
3. $cd cChardet
Installation
============
4. $python setup.py build
::
5. $sudo python setup.py install
$cd /tmp
$git clone git://github.com/PyYoshi/cChardet.git
$cd cChardet
$python setup.py build
$sudo python setup.py install
or
::
$sudo easy_install cchardet
Test
====
- $sudo easy\_install or pip install -U chardet nose
::
- $nosetests nocapture tests.py
$sudo easy_install or pip install -U chardet nose
$cd test
$nosetests --nocapture tests.py
Benchmark
=========
see `tests.TestCchardetSpeed`_
code:
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
Sample(shift\_jis):
~~~~~~~~~~~~~~~~~~~
sample:
`test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt <https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt>`_
- `test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt`_
Performance:
~~~~~~~~~~~~
PC Spec.:
~~~~~~~~~
CPU: Intel Core i7 860 2.8GHz
- CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
- RAM: DDR3-1333 16GB
- Platform: Windows 7 HP x64, Python 2.7.3 32-bit
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
Result:
~~~~~~~
- chardet: 4.009999990463257s, shift\_jis
.. raw:: html
- cchardet: 0.0009999275207519531s, shift\_jis
<table>
<tr>
<th></th><th>
Request (call/s)
.. raw:: html
</th><th>
Result of encoding
.. raw:: html
</th>
</tr>
<tr>
<td>
chardet
.. raw:: html
</td><td>
0.25
.. raw:: html
</td><td>
shift\_jis
.. raw:: html
</td>
</tr>
<tr>
<td>
cchardet
.. raw:: html
</td><td>
500.03
.. raw:: html
</td><td>
shift\_jis
.. raw:: html
</td>
</tr>
</table>
License
=======
- This library files(“cchardet.pyx”,“setup.py”,“tests.py”) are “The MIT
License”.
- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT
License".
- Other Library License: Please, look at the “ext” directory.
- Other Libraries License: Please, look at the
`ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
directory.
Thanks
======
- `https://bitbucket.org/medoc/uchardet-enhanced/overview`_
- `https://bitbucket.org/medoc/uchardet-enhanced/overview <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
- `http://www.cython.org/`_
- `http://www.cython.org/ <http://www.cython.org/>`_
Contact
=======
`My blog`_
`My blog <http://blog.remu.biz>`_
Sorry for my poor English :)
.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _chardet: http://pypi.python.org/pypi/chardet
.. _`http://www.cython.org/`: http://www.cython.org/
.. _tests.TestCchardetSpeed: https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415
.. _test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt: https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
.. _`https://bitbucket.org/medoc/uchardet-enhanced/overview`: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _My blog: http://blog.remu.biz