update readme

PyYoshi 2013-05-08 12:23:26 +09:00
parent 63bbdf95b0
commit a69598fe75
3 changed files with 166 additions and 156 deletions

125
README.markdown Normal file

@ -0,0 +1,125 @@
<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
cChardet
========
cChardet is a high-speed universal character encoding detector. It is a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
## Support codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* WINDOWS-1250
* WINDOWS-1251
* WINDOWS-1252
* WINDOWS-1253
* WINDOWS-1255
* EUC-TW
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
## Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)
For example, on Ubuntu 12.04:
```bash
$ sudo apt-get install build-essential python-dev cython
```
## Installation
```bash
$ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ sudo python setup.py install
```
or
```bash
$ sudo easy_install cchardet
```
## Example
```python
# -*- coding: utf-8 -*-
import cchardet as chardet
# Read the file as raw bytes; detection runs on undecoded data.
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
```
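Once an encoding has been detected, the usual next step is to decode the bytes with it. The following is a minimal sketch of that step, not part of cChardet itself. It assumes that `detect()` returns a dict with `encoding` and `confidence` keys; `decode_with_detection` and `fake_detect` are hypothetical names introduced only for this illustration, and `fake_detect` stands in for `cchardet.detect` so the snippet runs without the compiled extension.

```python
import codecs


def decode_with_detection(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a detect() callable.

    Assumption: detect(data) returns a dict with an "encoding" key that
    may be None when nothing could be detected.
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        # Make sure Python actually ships a codec under the reported name.
        codecs.lookup(encoding)
    except LookupError:
        encoding = fallback
    return data.decode(encoding, errors="replace")


def fake_detect(data):
    # Hypothetical stand-in for cchardet.detect, with a hard-coded answer.
    return {"encoding": "SHIFT_JIS", "confidence": 0.99}


sample = "千夜一夜物語".encode("shift_jis")
text = decode_with_detection(sample, fake_detect)
print(text)
```

In real use you would pass `cchardet.detect` (or `chardet.detect`) instead of `fake_detect`. The fallback matters because some detected names, such as the `X-ISO-10646-UCS-4-*` entries, may not map to a codec that Python knows.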
## Test
```bash
$ sudo pip install -U chardet nose   # or: sudo easy_install -U chardet nose
$ cd test
$ nosetests --nocapture tests.py
```
## Benchmark
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
### Performance:
CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
### Result:
<table>
<tr>
<th>Library</th><th>Throughput (calls/s)</th>
</tr>
<tr>
<td>chardet</td><td>0.32</td>
</tr>
<tr>
<td>cchardet</td><td>1012.97</td>
</tr>
</table>
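
The call/s figures come from `tests.TestCchardetSpeed`, which is not reproduced here. As a rough illustration only, a throughput number of this kind could be measured along the following lines. `calls_per_second` and `dummy_detect` are hypothetical names for this sketch; replace `dummy_detect` with `cchardet.detect` or `chardet.detect` and the data with the sample file's bytes to compare the two libraries.

```python
import time


def calls_per_second(func, data, repeat=100):
    """Call func(data) `repeat` times and return the average calls per second."""
    start = time.perf_counter()
    for _ in range(repeat):
        func(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")


def dummy_detect(data):
    # Hypothetical placeholder so the sketch runs without either library.
    return {"encoding": "ascii", "confidence": 1.0}


rate = calls_per_second(dummy_detect, b"hello world" * 1000)
print("%.2f call/s" % rate)
```

Absolute numbers depend heavily on the machine, the Python build, and the input size, so only the ratio between two detectors measured on the same setup is meaningful.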
### License
* The MIT License: [src/cchardet](https://github.com/PyYoshi/cChardet/tree/master/src/cchardet)
* Other libraries' licenses: see the [src/ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
### Thanks
* [uchardet-enhanced](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [Cython](http://www.cython.org/)
### Contact
[My blog](http://blog.remu.biz)
[Issues](https://github.com/PyYoshi/cChardet/issues?page=1&state=open)
Sorry for my poor English :)


@ -2,21 +2,17 @@
cChardet cChardet
======== ========
This library is high speed universal character encoding detector. - cChardet is high speed universal character encoding detector. - binding
binding to to
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_. `charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.
This library is faster than
`chardet <http://pypi.python.org/pypi/chardet>`_.
Support codecs Support codecs
============== --------------
- Big5 - Big5
- EUC-JP - EUC-JP
- EUC-KR - EUC-KR
- GB18030 - GB18030
- gb18030
- HZ-GB-2312 - HZ-GB-2312
- IBM855 - IBM855
- IBM866 - IBM866
@ -46,7 +42,7 @@ Support codecs
- x-mac-cyrillic - x-mac-cyrillic
Requires Requires
======== --------
- Cython: `http://www.cython.org/ <http://www.cython.org/>`_ - Cython: `http://www.cython.org/ <http://www.cython.org/>`_
@ -57,18 +53,14 @@ e.g.) Ubuntu 12.04
$ sudo apt-get install build-essential python-dev cython $ sudo apt-get install build-essential python-dev cython
Installation Installation
============ ------------
:: ::
$ cd /tmp $ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git $ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet $ cd cChardet
$ python setup.py build $ python setup.py build
$ sudo python setup.py install $ sudo python setup.py install
or or
@ -77,19 +69,29 @@ or
$ sudo easy_install cchardet $ sudo easy_install cchardet
Example
-------
::
# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
Test Test
==== ----
:: ::
$ sudo easy_install or pip install -U chardet nose $ sudo easy_install or pip install -U chardet nose
$ cd test $ cd test
$ nosetests --nocapture tests.py $ nosetests --nocapture tests.py
Benchmark Benchmark
========= ---------
code: code:
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_ `tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_
@ -104,36 +106,39 @@ CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB RAM: DDR3-1333 16GB
Platform: Windows 7 HP x64, Python 2.7.3 32-bit Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
Result: Result:
~~~~~~~ ~~~~~~~
:: ::
chardet: 0.25 (call/s) chardet: 0.32 (call/s)
cchardet: 500.03 (call/s) cchardet: 1012.97 (call/s)
License License
======= ~~~~~~~
- This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License". - The MIT License:
`src/cchardet <https://github.com/PyYoshi/cChardet/tree/master/src/cchardet>`_
- Other Libraries License: Please, look at the - Other Libraries License: Please, look at the
`ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_ `src/ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
directory. directory.
Thanks Thanks
====== ~~~~~~
- `https://bitbucket.org/medoc/uchardet-enhanced/overview <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_ - `uchardet-enhanced <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
- `http://www.cython.org/ <http://www.cython.org/>`_ - `Cython <http://www.cython.org/>`_
Contact Contact
======= ~~~~~~~
`My blog <http://blog.remu.biz>`_ `My blog <http://blog.remu.biz>`_
`Issues <https://github.com/PyYoshi/cChardet/issues?page=1&state=open>`_
Sorry for my poor English :) Sorry for my poor English :)

120
readme.md

@ -1,120 +0,0 @@
<!-- markdown to rst: http://johnmacfarlane.net/pandoc/try -->
# cChardet
This library is high speed universal character encoding detector. - binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
This library is faster than [chardet](http://pypi.python.org/pypi/chardet).
# Support codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* gb18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* WINDOWS-1250
* WINDOWS-1251
* WINDOWS-1252
* WINDOWS-1253
* WINDOWS-1255
* EUC-TW
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
# Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)
e.g.) Ubuntu 12.04
$sudo apt-get install build-essential python-dev cython
# Installation
$cd /tmp
$git clone git://github.com/PyYoshi/cChardet.git
$cd cChardet
$python setup.py build
$sudo python setup.py install
or
$sudo easy_install cchardet
# Example
```python
# coding: utf8
import cchardet
msg = file(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt").read()
result = cchardet.detect(msg)
print(result)
```
# Test
$sudo easy_install or pip install -U chardet nose
$cd test
$nosetests --nocapture tests.py
# Benchmark
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415)
sample: [test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
### Performance:
CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
### Result:
<table>
<tr>
<th></th><th>Request (call/s)</th><th>Result of encoding</th>
</tr>
<tr>
<td>chardet</td><td>0.25</td><td>shift_jis</td>
</tr>
<tr>
<td>cchardet</td><td>500.03</td><td>shift_jis</td>
</tr>
</table>
# License
* This library files("cchardet.pyx","setup.py","tests.py") are "The MIT License".
* Other Libraries License: Please, look at the [ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
# Thanks
* [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [http://www.cython.org/](http://www.cython.org/)
# Contact
[My blog](http://blog.remu.biz)
Sorry for my poor English :)