update readme
parent b9f5a14ef9
commit 696f9b3449
1 changed file with 63 additions and 50 deletions
113
README.markdown
@@ -3,46 +3,42 @@ cChardet
cChardet is a high-speed universal character encoding detector, a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).

## Support codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* WINDOWS-1250
* WINDOWS-1251
* WINDOWS-1252
* WINDOWS-1253
* WINDOWS-1255
* EUC-TW
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic

- Big5
- EUC-JP
- EUC-KR
- GB18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1255
- EUC-TW
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic

## Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)

e.g.) Ubuntu 12.04

```bash
$ sudo apt-get install build-essential python-dev cython
```
- Cython: [http://www.cython.org/](http://www.cython.org/)

## Installation

@@ -50,7 +46,6 @@ $ sudo apt-get install build-essential python-dev cython
$ cd /tmp
$ git clone git://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ python setup.py install
```

@@ -65,35 +60,53 @@ $ pip install -U cchardet
```python
# -*- coding: utf-8 -*-
import cchardet as chardet
with open(r"tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
result = chardet.detect(msg)
print(result)
```
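The detection result can then be used to decode the bytes. The following is a minimal sketch, assuming `detect` returns a dictionary with an `encoding` key that may be `None` when nothing is recognized. The helper `decode_detected`, its `fallback` parameter, and the `stub_detect` stand-in are hypothetical names introduced here for illustration; they are not part of cChardet.

```python
# -*- coding: utf-8 -*-
def decode_detected(data, detect, fallback="utf-8"):
    """Decode bytes with the encoding reported by a chardet-style detector.

    `detect` is any callable returning a dict with an "encoding" key
    (an assumption about the result shape, see the text above).
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        text = data.decode(encoding, errors="replace")
    except LookupError:
        # Python has no codec registered under the reported name,
        # so fall back to the default encoding instead.
        encoding = fallback
        text = data.decode(encoding, errors="replace")
    return text, encoding


def stub_detect(data):
    # Hypothetical stand-in so the sketch runs without cchardet installed.
    return {"encoding": "SHIFT_JIS", "confidence": 0.99}


sample = u"千夜一夜物語".encode("shift_jis")
text, encoding = decode_detected(sample, stub_detect)
print(encoding, text)
```

With cChardet installed, `chardet.detect` from the example above could be passed in place of `stub_detect`. The `LookupError` branch matters because some detector labels may have no matching Python codec.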

## Benchmark
code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/src/tests/bench.py)

sample: [tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
```bash
$ cd src/
$ pip install chardet
$ python tests/bench.py
```
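The benchmark reports throughput in calls per second. As a rough sketch of how such a rate could be measured for any detector function, the snippet below times repeated calls over a fixed window. It is not the code in `bench.py`; `calls_per_second`, `min_seconds`, and `stand_in_detect` are hypothetical names, and the stand-in workload is used only so the example runs without chardet or cchardet installed.

```python
import time


def calls_per_second(func, payload, min_seconds=0.2):
    """Call func(payload) repeatedly for at least min_seconds; return calls/s."""
    if min_seconds <= 0:
        raise ValueError("min_seconds must be positive")
    calls = 0
    start = time.perf_counter()
    elapsed = 0.0
    while elapsed < min_seconds:
        func(payload)
        calls += 1
        elapsed = time.perf_counter() - start
    return calls / elapsed


def stand_in_detect(data):
    # Hypothetical placeholder for chardet.detect / cchardet.detect:
    # it only checks whether the bytes are valid UTF-8.
    try:
        data.decode("utf-8")
        return {"encoding": "UTF-8", "confidence": 1.0}
    except UnicodeDecodeError:
        return {"encoding": None, "confidence": 0.0}


rate = calls_per_second(stand_in_detect, b"hello" * 1000)
print("%.2f call/s" % rate)
```

To compare the two libraries, the same helper could be called once with each library's `detect` function and the same sample bytes. Note that `time.perf_counter` requires Python 3.3 or later.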

### Performance:
CPU: Intel Core i7 860 2.8GHz
### Performance

RAM: DDR3-1333 16GB
CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
RAM: DDR3 1600MHz 16GB

### Result:
Platform: Ubuntu 16.04 amd64

#### Python 2.7.12

<table>
<tr>
<th></th><th>Request (call/s)</th>
</tr>
<tr>
<td>chardet</td><td>0.32</td>
<td>chardet</td><td>0.26</td>
</tr>
<tr>
<td>cchardet</td><td>975.46</td>
<td>cchardet</td><td>1408.73</td>
</tr>
</table>

#### Python 3.5.2

<table>
<tr>
<th></th><th>Request (call/s)</th>
</tr>
<tr>
<td>chardet</td><td>0.28</td>
</tr>
<tr>
<td>cchardet</td><td>1380.40</td>
</tr>
</table>