update

parent ffb55ca55b
commit e36c244d37

2 changed files with 130 additions and 44 deletions

readme.md (11 changes)

@@ -4,6 +4,7 @@
 This library is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
 
 This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
+
 # Support codecs
 * Big5
 * EUC-JP
@@ -37,12 +38,14 @@ This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
 * X-ISO-10646-UCS-4-2143
 * X-ISO-10646-UCS-4-3412
 * x-mac-cyrillic
+
 # Requires
 * Cython: [http://www.cython.org/](http://www.cython.org/)
 
 e.g.) Ubuntu 12.04
 
 $sudo apt-get install build-essential python-dev cython
+
 # Installation
 $cd /tmp
 
@@ -57,6 +60,7 @@ e.g.) Ubuntu 12.04
 or
 
 $sudo easy_install cchardet
+
 # Example
 
 ```python
@@ -68,22 +72,26 @@ print(result)
 result2 = cchardet.detect_with_confidence(msg)
 print(result2)
 ```
+
 # Test
 $sudo easy_install or pip install -U chardet nose
 
 $cd test
 
 $nosetests --nocapture tests.py
+
 # Benchmark
 code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
 
 sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
+
 ### Performance:
 CPU: Intel Core i7 860 2.8GHz
 
 RAM: DDR3-1333 16GB
 
 Platform: Windows 7 HP x64, Python 2.7.3 32-bit
+
 ### Result:
 
 <table>
@@ -97,14 +105,17 @@ Platform: Windows 7 HP x64, Python 2.7.3 32-bit
 <td>cchardet</td><td>500.03</td><td>shift_jis</td>
 </tr>
 </table>
+
 # License
 * This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
 
 * Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
+
 # Thanks
 * [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
 
 * [http://www.cython.org/](http://www.cython.org/)
+
 # Contact
 [My blog](http://blog.remu.biz)
 
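
The "Support codecs" lists name encodings by the labels the detector reports, such as Big5, EUC-JP, X-ISO-10646-UCS-4-2143 and x-mac-cyrillic. Not every such label is a name that Python's own codec registry accepts. For example, Python ships no EUC-TW codec. Code that decodes bytes with a reported label may therefore want to check the label first. The following is a minimal sketch that uses only the standard library's `codecs.lookup`. The helper names `python_codec_for` and `decode_with_label` are hypothetical and not part of cchardet. The UTF-8 fallback is also an assumption.

```python
import codecs
from typing import Optional


def python_codec_for(label: str) -> Optional[str]:
    """Return the canonical name of the Python codec registered for
    `label` (e.g. a label reported by an encoding detector), or None
    if Python's codec registry does not recognise it."""
    try:
        # codecs.lookup is case-insensitive and tolerant of '-' vs '_',
        # so "WINDOWS-1250" resolves to the cp1250 codec.
        return codecs.lookup(label).name
    except LookupError:
        return None


def decode_with_label(data: bytes, label: str, fallback: str = "utf-8") -> str:
    """Decode `data` with the codec for `label` if Python has one,
    otherwise with `fallback` (an assumed default); undecodable bytes
    are replaced with U+FFFD instead of raising."""
    codec = python_codec_for(label) or fallback
    return data.decode(codec, errors="replace")


if __name__ == "__main__":
    # Labels taken from the README's codec lists; print what Python maps them to.
    for label in ("shift_jis", "EUC-JP", "Big5", "WINDOWS-1250",
                  "EUC-TW", "X-ISO-10646-UCS-4-2143"):
        print(f"{label!r:28} -> {python_codec_for(label)}")
```

Falling back to UTF-8 with `errors="replace"` means decoding never raises. A stricter caller might raise an error instead when the label is unknown.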

readme.rst (163 changes)

@@ -1,10 +1,16 @@
+.. raw:: html
+
+<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
+
 cChardet
 ========
 
 This library is high speed universal character encoding detector. -
-binding to `charsetdetect`_.
+binding to
+`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
 
-This library is faster than `chardet`_.
+This library is faster than
+`chardet <http://pypi.python.org/pypi/chardet>`_.
 
 Support codecs
 ==============
@@ -32,12 +38,12 @@ Support codecs
 - UTF-16LE
 - UTF-32BE
 - UTF-32LE
-- windows-1250
-- windows-1251
-- windows-1252
-- windows-1253
-- windows-1255
-- x-euc-tw
+- WINDOWS-1250
+- WINDOWS-1251
+- WINDOWS-1252
+- WINDOWS-1253
+- WINDOWS-1255
+- EUC-TW
 - X-ISO-10646-UCS-4-2143
 - X-ISO-10646-UCS-4-3412
 - x-mac-cyrillic
@@ -45,80 +51,149 @@ Support codecs
 Requires
 ========
 
-- Cython: `http://www.cython.org/`_
+- Cython: `http://www.cython.org/ <http://www.cython.org/>`_
 
-Install
-=======
+e.g.) Ubuntu 12.04
 
-1. $cd /tmp
+::
 
-2. $git clone git://github.com/PyYoshi/cChardet.git
+$sudo apt-get install build-essential python-dev cython
 
-3. $cd cChardet
+Installation
+============
 
-4. $python setup.py build
+::
 
-5. $sudo python setup.py install
+$cd /tmp
+
+$git clone git://github.com/PyYoshi/cChardet.git
+
+$cd cChardet
+
+$python setup.py build
+
+$sudo python setup.py install
+
+or
+
+::
+
+$sudo easy_install cchardet
 
 Test
 ====
 
-- $sudo easy\_install or pip install -U chardet nose
+::
 
-- $nosetests –nocapture tests.py
+$sudo easy_install or pip install -U chardet nose
+
+$cd test
+
+$nosetests --nocapture tests.py
 
 Benchmark
 =========
 
-see `tests.TestCchardetSpeed`_
+code:
+`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
 
-Sample(shift\_jis):
-~~~~~~~~~~~~~~~~~~~
+sample:
+`test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt <https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt>`_
 
-- `test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt`_
+Performance:
+~~~~~~~~~~~~
 
-PC Spec.:
-~~~~~~~~~
+CPU: Intel Core i7 860 2.8GHz
 
-- CPU: Intel Core i7 860 2.8GHz
+RAM: DDR3-1333 16GB
 
-- RAM: DDR3-1333 16GB
+Platform: Windows 7 HP x64, Python 2.7.3 32-bit
 
-- Platform: Windows 7 HP x64, Python 2.7.3 32-bit
-
 Result:
 ~~~~~~~
 
-- chardet: 4.009999990463257s, shift\_jis
+.. raw:: html
 
-- cchardet: 0.0009999275207519531s, shift\_jis
+<table>
+<tr>
+<th></th><th>
+
+Request (call/s)
+
+.. raw:: html
+
+</th><th>
+
+Result of encoding
+
+.. raw:: html
+
+</th>
+</tr>
+<tr>
+<td>
+
+chardet
+
+.. raw:: html
+
+</td><td>
+
+0.25
+
+.. raw:: html
+
+</td><td>
+
+shift\_jis
+
+.. raw:: html
+
+</td>
+</tr>
+<tr>
+<td>
+
+cchardet
+
+.. raw:: html
+
+</td><td>
+
+500.03
+
+.. raw:: html
+
+</td><td>
+
+shift\_jis
+
+.. raw:: html
+
+</td>
+</tr>
+</table>
 
 License
 =======
 
-- This library files(“cchardet.pyx”,“setup.py”,“tests.py”) are “The MIT
-License”.
+- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT
+License".
 
-- Other Library License: Please, look at the “ext” directory.
+- Other Libraries License: Please, look at the
+`ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
+directory.
 
 Thanks
 ======
 
-- `https://bitbucket.org/medoc/uchardet-enhanced/overview`_
+- `https://bitbucket.org/medoc/uchardet-enhanced/overview <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
 
-- `http://www.cython.org/`_
+- `http://www.cython.org/ <http://www.cython.org/>`_
 
 Contact
 =======
 
-`My blog`_
+`My blog <http://blog.remu.biz>`_
 
 Sorry for my poor English :)
-
-.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
-.. _chardet: http://pypi.python.org/pypi/chardet
-.. _`http://www.cython.org/`: http://www.cython.org/
-.. _tests.TestCchardetSpeed: https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415
-.. _test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt: https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
-.. _`https://bitbucket.org/medoc/uchardet-enhanced/overview`: https://bitbucket.org/medoc/uchardet-enhanced/overview
-.. _My blog: http://blog.remu.biz
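
The benchmark tables report throughput for chardet and cchardet as "Request (call/s)". The measuring code (tests.TestCchardetSpeed) is only linked, not reproduced here. To illustrate what such a figure means, the sketch below calls a detection function repeatedly on one byte string for a fixed time budget. It then divides the number of calls by the elapsed seconds. The function name `calls_per_second` and its `min_seconds` budget are assumptions for illustration, not the test suite's method. The sketch also targets Python 3, whereas the published numbers were measured on Python 2.7.3.

```python
import time
from typing import Callable


def calls_per_second(fn: Callable[[bytes], object], payload: bytes,
                     min_seconds: float = 1.0) -> float:
    """Call fn(payload) until at least `min_seconds` of wall-clock time
    has passed (always at least once), then return calls per second."""
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")
    calls = 0
    start = time.perf_counter()
    while True:
        fn(payload)
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= min_seconds:
            return calls / elapsed


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without chardet or cchardet.
    # To measure a real detector, pass the function you want to time
    # (e.g. the cchardet call from the README's example) and the bytes
    # of the Shift_JIS sample file instead.
    sample = "千夜一夜物語".encode("shift_jis")
    rate = calls_per_second(lambda b: b.decode("shift_jis"), sample, min_seconds=0.2)
    print(f"{rate:.2f} call/s")
```

The sketch uses a fixed time budget rather than a fixed call count. This keeps the measurement meaningful for both a slow detector (about 0.25 call/s in the table) and a fast one (about 500 call/s). With the default 1 s budget, a detector slower than 1 call/s is simply timed over a single call.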