update readme
parent 63bbdf95b0
commit a69598fe75
3 changed files with 166 additions and 156 deletions
125
README.markdown
Normal file
|
@ -0,0 +1,125 @@
|
||||||
|
<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
|
||||||
|
|
||||||
|
cChardet
|
||||||
|
========
|
||||||
|
cChardet is a high-speed universal character encoding detector. It is a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
|
||||||
|
|
||||||
|
## Support codecs
|
||||||
|
* Big5
|
||||||
|
* EUC-JP
|
||||||
|
* EUC-KR
|
||||||
|
* GB18030
|
||||||
|
* HZ-GB-2312
|
||||||
|
* IBM855
|
||||||
|
* IBM866
|
||||||
|
* ISO-2022-CN
|
||||||
|
* ISO-2022-JP
|
||||||
|
* ISO-2022-KR
|
||||||
|
* ISO-8859-2
|
||||||
|
* ISO-8859-5
|
||||||
|
* ISO-8859-7
|
||||||
|
* ISO-8859-8
|
||||||
|
* KOI8-R
|
||||||
|
* Shift_JIS
|
||||||
|
* TIS-620
|
||||||
|
* UTF-8
|
||||||
|
* UTF-16BE
|
||||||
|
* UTF-16LE
|
||||||
|
* UTF-32BE
|
||||||
|
* UTF-32LE
|
||||||
|
* WINDOWS-1250
|
||||||
|
* WINDOWS-1251
|
||||||
|
* WINDOWS-1252
|
||||||
|
* WINDOWS-1253
|
||||||
|
* WINDOWS-1255
|
||||||
|
* EUC-TW
|
||||||
|
* X-ISO-10646-UCS-4-2143
|
||||||
|
* X-ISO-10646-UCS-4-3412
|
||||||
|
* x-mac-cyrillic
|
||||||
|
|
||||||
|
## Requires
|
||||||
|
* Cython: [http://www.cython.org/](http://www.cython.org/)
|
||||||
|
|
||||||
|
For example, on Ubuntu 12.04:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ sudo apt-get install build-essential python-dev cython
|
||||||
|
```
|
||||||
|
|
||||||
|
## Installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ cd /tmp
|
||||||
|
$ git clone git://github.com/PyYoshi/cChardet.git
|
||||||
|
$ cd cChardet
|
||||||
|
$ python setup.py build
|
||||||
|
$ sudo python setup.py install
|
||||||
|
```
|
||||||
|
|
||||||
|
or
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ sudo easy_install cchardet
|
||||||
|
```
|
||||||
|
|
||||||
|
## Example
|
||||||
|
|
||||||
|
```python
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
import cchardet as chardet
|
||||||
|
# Open in binary mode so detect() receives the raw, undecoded bytes.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)
|
||||||
|
```
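
Once an encoding has been detected, a common next step is to decode the bytes with it. The following is a minimal sketch, not part of cChardet itself: `decode_with_detection` and the `fallback` parameter are hypothetical names, and it assumes that `detect()` returns a dict with an `"encoding"` key (which may be `None`), as in chardet. It is written for Python 3.

```python
def decode_with_detection(data, detect, fallback="utf-8"):
    """Decode raw bytes with the encoding a chardet-style detect() reports.

    `detect` is any callable returning a dict with an "encoding" key,
    which is the result shape assumed here for cchardet.detect().
    """
    encoding = detect(data).get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace"), encoding
    except LookupError:
        # The detector named an encoding that Python has no codec for.
        return data.decode(fallback, errors="replace"), fallback


try:
    import cchardet
    detector = cchardet.detect
except ImportError:
    # Stand-in so the sketch still runs without cchardet installed:
    # it reports no encoding, which triggers the fallback.
    def detector(data):
        return {"encoding": None, "confidence": None}


sample = "こんにちは、世界".encode("shift_jis")
text, encoding = decode_with_detection(sample, detector)
print(encoding)
```

Passing the detector in as an argument keeps the helper usable with either chardet or cchardet. Very short inputs may be detected incorrectly, so `errors="replace"` is used to avoid raising on a wrong guess.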
|
||||||
|
|
||||||
|
## Test
|
||||||
|
|
||||||
|
```bash
|
||||||
|
$ sudo pip install -U chardet nose  # or: sudo easy_install -U chardet nose
|
||||||
|
$ cd test
|
||||||
|
$ nosetests --nocapture tests.py
|
||||||
|
```
|
||||||
|
|
||||||
|
## Benchmark
|
||||||
|
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
|
||||||
|
|
||||||
|
sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
|
||||||
|
|
||||||
|
### Performance:
|
||||||
|
CPU: Intel Core i7 860 2.8GHz
|
||||||
|
|
||||||
|
RAM: DDR3-1333 16GB
|
||||||
|
|
||||||
|
Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
|
||||||
|
|
||||||
|
### Result:
|
||||||
|
|
||||||
|
<table>
<tr>
<th>Library</th><th>Throughput (calls/s)</th>
</tr>
<tr>
<td>chardet</td><td>0.32</td>
</tr>
<tr>
<td>cchardet</td><td>1012.97</td>
</tr>
</table>
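
A calls-per-second figure like the ones above can be measured with a simple timing loop. This is a rough sketch under stated assumptions, not the code in `tests.TestCchardetSpeed`: `calls_per_second` and `min_seconds` are hypothetical names, the payload is synthetic rather than the sample file, and Python 3 is assumed for `time.perf_counter`.

```python
import time


def calls_per_second(func, data, min_seconds=1.0):
    """Call func(data) repeatedly for at least min_seconds; return calls/s."""
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")
    calls = 0
    elapsed = 0.0
    start = time.perf_counter()
    while elapsed < min_seconds:
        func(data)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


if __name__ == "__main__":
    # Synthetic Shift_JIS payload; the real benchmark uses the sample file.
    payload = "こんにちは、世界。".encode("shift_jis") * 1000
    try:
        import cchardet
    except ImportError:
        print("cchardet is not installed; skipping measurement")
    else:
        rate = calls_per_second(cchardet.detect, payload, min_seconds=0.2)
        print("cchardet: %.2f (call/s)" % rate)
```

Running the loop for a fixed minimum duration, rather than a fixed number of calls, keeps the measurement practical for both a very slow and a very fast detector.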
|
||||||
|
|
||||||
|
### License
|
||||||
|
* The MIT License: [src/cchardet](https://github.com/PyYoshi/cChardet/tree/master/src/cchardet)
|
||||||
|
|
||||||
|
* Licenses of other bundled libraries: see the [src/ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
|
||||||
|
|
||||||
|
### Thanks
|
||||||
|
* [uchardet-enhanced](https://bitbucket.org/medoc/uchardet-enhanced/overview)
|
||||||
|
|
||||||
|
* [Cython](http://www.cython.org/)
|
||||||
|
|
||||||
|
### Contact
|
||||||
|
[My blog](http://blog.remu.biz)
|
||||||
|
|
||||||
|
[Issues](https://github.com/PyYoshi/cChardet/issues?page=1&state=open)
|
||||||
|
|
||||||
|
Sorry for my poor English :)
|
|
@ -2,21 +2,17 @@
|
||||||
cChardet
|
cChardet
|
||||||
========
|
========
|
||||||
|
|
||||||
This library is high speed universal character encoding detector. -
|
cChardet is a high-speed universal character encoding detector, a binding
|
||||||
binding to
|
to
|
||||||
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
|
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
|
||||||
|
|
||||||
This library is faster than
|
|
||||||
`chardet <http://pypi.python.org/pypi/chardet>`_.
|
|
||||||
|
|
||||||
Support codecs
|
Support codecs
|
||||||
==============
|
--------------
|
||||||
|
|
||||||
- Big5
|
- Big5
|
||||||
- EUC-JP
|
- EUC-JP
|
||||||
- EUC-KR
|
- EUC-KR
|
||||||
- GB18030
|
- GB18030
|
||||||
- gb18030
|
|
||||||
- HZ-GB-2312
|
- HZ-GB-2312
|
||||||
- IBM855
|
- IBM855
|
||||||
- IBM866
|
- IBM866
|
||||||
|
@ -46,7 +42,7 @@ Support codecs
|
||||||
- x-mac-cyrillic
|
- x-mac-cyrillic
|
||||||
|
|
||||||
Requires
|
Requires
|
||||||
========
|
--------
|
||||||
|
|
||||||
- Cython: `http://www.cython.org/ <http://www.cython.org/>`_
|
- Cython: `http://www.cython.org/ <http://www.cython.org/>`_
|
||||||
|
|
||||||
|
@ -57,18 +53,14 @@ e.g.) Ubuntu 12.04
|
||||||
$ sudo apt-get install build-essential python-dev cython
|
$ sudo apt-get install build-essential python-dev cython
|
||||||
|
|
||||||
Installation
|
Installation
|
||||||
============
|
------------
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
$ cd /tmp
|
$ cd /tmp
|
||||||
|
|
||||||
$ git clone git://github.com/PyYoshi/cChardet.git
|
$ git clone git://github.com/PyYoshi/cChardet.git
|
||||||
|
|
||||||
$ cd cChardet
|
$ cd cChardet
|
||||||
|
|
||||||
$ python setup.py build
|
$ python setup.py build
|
||||||
|
|
||||||
$ sudo python setup.py install
|
$ sudo python setup.py install
|
||||||
|
|
||||||
or
|
or
|
||||||
|
@ -77,19 +69,29 @@ or
|
||||||
|
|
||||||
$ sudo easy_install cchardet
|
$ sudo easy_install cchardet
|
||||||
|
|
||||||
|
Example
|
||||||
|
-------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
|
# -*- coding: utf-8 -*-
|
||||||
|
import cchardet as chardet
|
||||||
|
# Open in binary mode so detect() receives the raw, undecoded bytes.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
    result = chardet.detect(msg)
    print(result)
|
||||||
|
|
||||||
Test
|
Test
|
||||||
====
|
----
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
$ sudo pip install -U chardet nose  # or: sudo easy_install -U chardet nose
|
$ sudo pip install -U chardet nose  # or: sudo easy_install -U chardet nose
|
||||||
|
|
||||||
$ cd test
|
$ cd test
|
||||||
|
|
||||||
$ nosetests --nocapture tests.py
|
$ nosetests --nocapture tests.py
|
||||||
|
|
||||||
Benchmark
|
Benchmark
|
||||||
=========
|
---------
|
||||||
|
|
||||||
code:
|
code:
|
||||||
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
|
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
|
||||||
|
@ -104,36 +106,39 @@ CPU: Intel Core i7 860 2.8GHz
|
||||||
|
|
||||||
RAM: DDR3-1333 16GB
|
RAM: DDR3-1333 16GB
|
||||||
|
|
||||||
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
|
Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
|
||||||
|
|
||||||
Result:
|
Result:
|
||||||
~~~~~~~
|
~~~~~~~
|
||||||
|
|
||||||
::
|
::
|
||||||
|
|
||||||
chardet: 0.25 (call/s)
|
chardet: 0.32 (call/s)
|
||||||
|
|
||||||
cchardet: 500.03 (call/s)
|
cchardet: 1012.97 (call/s)
|
||||||
|
|
||||||
License
|
License
|
||||||
=======
|
~~~~~~~
|
||||||
|
|
||||||
- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
|
- The MIT License:
|
||||||
|
`src/cchardet <https://github.com/PyYoshi/cChardet/tree/master/src/cchardet>`_
|
||||||
|
|
||||||
- Other Libraries License: Please, look at the
|
- Other Libraries License: Please, look at the
|
||||||
`ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
|
`src/ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
|
||||||
directory.
|
directory.
|
||||||
|
|
||||||
Thanks
|
Thanks
|
||||||
======
|
~~~~~~
|
||||||
|
|
||||||
- `https://bitbucket.org/medoc/uchardet-enhanced/overview <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
|
- `uchardet-enhanced <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
|
||||||
|
|
||||||
- `http://www.cython.org/ <http://www.cython.org/>`_
|
- `Cython <http://www.cython.org/>`_
|
||||||
|
|
||||||
Contact
|
Contact
|
||||||
=======
|
~~~~~~~
|
||||||
|
|
||||||
`My blog <http://blog.remu.biz>`_
|
`My blog <http://blog.remu.biz>`_
|
||||||
|
|
||||||
|
`Issues <https://github.com/PyYoshi/cChardet/issues?page=1&state=open>`_
|
||||||
|
|
||||||
Sorry for my poor English :)
|
Sorry for my poor English :)
|
120
readme.md
|
@ -1,120 +0,0 @@
|
||||||
<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
|
|
||||||
|
|
||||||
# cChardet
|
|
||||||
This library is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
|
|
||||||
|
|
||||||
This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
|
|
||||||
|
|
||||||
# Support codecs
|
|
||||||
* Big5
|
|
||||||
* EUC-JP
|
|
||||||
* EUC-KR
|
|
||||||
* GB18030
|
|
||||||
* gb18030
|
|
||||||
* HZ-GB-2312
|
|
||||||
* IBM855
|
|
||||||
* IBM866
|
|
||||||
* ISO-2022-CN
|
|
||||||
* ISO-2022-JP
|
|
||||||
* ISO-2022-KR
|
|
||||||
* ISO-8859-2
|
|
||||||
* ISO-8859-5
|
|
||||||
* ISO-8859-7
|
|
||||||
* ISO-8859-8
|
|
||||||
* KOI8-R
|
|
||||||
* Shift_JIS
|
|
||||||
* TIS-620
|
|
||||||
* UTF-8
|
|
||||||
* UTF-16BE
|
|
||||||
* UTF-16LE
|
|
||||||
* UTF-32BE
|
|
||||||
* UTF-32LE
|
|
||||||
* WINDOWS-1250
|
|
||||||
* WINDOWS-1251
|
|
||||||
* WINDOWS-1252
|
|
||||||
* WINDOWS-1253
|
|
||||||
* WINDOWS-1255
|
|
||||||
* EUC-TW
|
|
||||||
* X-ISO-10646-UCS-4-2143
|
|
||||||
* X-ISO-10646-UCS-4-3412
|
|
||||||
* x-mac-cyrillic
|
|
||||||
|
|
||||||
# Requires
|
|
||||||
* Cython: [http://www.cython.org/](http://www.cython.org/)
|
|
||||||
|
|
||||||
e.g.) Ubuntu 12.04
|
|
||||||
|
|
||||||
$sudo apt-get install build-essential python-dev cython
|
|
||||||
|
|
||||||
# Installation
|
|
||||||
$cd /tmp
|
|
||||||
|
|
||||||
$git clone git://github.com/PyYoshi/cChardet.git
|
|
||||||
|
|
||||||
$cd cChardet
|
|
||||||
|
|
||||||
$python setup.py build
|
|
||||||
|
|
||||||
$sudo python setup.py install
|
|
||||||
|
|
||||||
or
|
|
||||||
|
|
||||||
$sudo easy_install cchardet
|
|
||||||
|
|
||||||
# Example
|
|
||||||
|
|
||||||
```python
|
|
||||||
# coding: utf8
|
|
||||||
import cchardet
|
|
||||||
msg = file(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt").read()
|
|
||||||
result = cchardet.detect(msg)
|
|
||||||
print(result)
|
|
||||||
```
|
|
||||||
|
|
||||||
# Test
|
|
||||||
$sudo easy_install or pip install -U chardet nose
|
|
||||||
|
|
||||||
$cd test
|
|
||||||
|
|
||||||
$nosetests --nocapture tests.py
|
|
||||||
|
|
||||||
# Benchmark
|
|
||||||
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
|
|
||||||
|
|
||||||
sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
|
|
||||||
|
|
||||||
### Performance:
|
|
||||||
CPU: Intel Core i7 860 2.8GHz
|
|
||||||
|
|
||||||
RAM: DDR3-1333 16GB
|
|
||||||
|
|
||||||
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
|
|
||||||
|
|
||||||
### Result:
|
|
||||||
|
|
||||||
<table>
|
|
||||||
<tr>
|
|
||||||
<th></th><th>Request (call/s)</th><th>Result of encoding</th>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td>chardet</td><td>0.25</td><td>shift_jis</td>
|
|
||||||
</tr>
|
|
||||||
<tr>
|
|
||||||
<td>cchardet</td><td>500.03</td><td>shift_jis</td>
|
|
||||||
</tr>
|
|
||||||
</table>
|
|
||||||
|
|
||||||
# License
|
|
||||||
* This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
|
|
||||||
|
|
||||||
* Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
|
|
||||||
|
|
||||||
# Thanks
|
|
||||||
* [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
|
|
||||||
|
|
||||||
* [http://www.cython.org/](http://www.cython.org/)
|
|
||||||
|
|
||||||
# Contact
|
|
||||||
[My blog](http://blog.remu.biz)
|
|
||||||
|
|
||||||
Sorry for my poor English :)
|
|