cChardet
========

cChardet is a high-speed universal character encoding detector, implemented as a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
## Supported codecs

- Big5
- EUC-JP
- EUC-KR
- GB18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1255
- EUC-TW
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic
## Requirements

- Cython: [http://www.cython.org/](http://www.cython.org/)
## Installation

```bash
$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py install
```

Or install the release from PyPI:

```bash
$ pip install -U cchardet
```
## Example

```python
# -*- coding: utf-8 -*-
import cchardet as chardet

# Open in binary mode: detection has to see the raw, undecoded bytes.
with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()

# detect() returns a dict with "encoding" and "confidence" keys.
result = chardet.detect(msg)
print(result)
```
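`detect()` only reports a guess; turning the bytes into text is left to the caller. The sketch below is one way to do that, assuming `result` is the dict returned by `chardet.detect()` and that `confidence` is a float between 0 and 1 (or `None` when nothing was detected). The helper name `decode_bytes`, the 0.5 threshold and the UTF-8 fallback are hypothetical choices, not part of cChardet. Python's standard library has no codec for some of the names listed above (for example ISO-2022-CN and EUC-TW), so the helper falls back instead of raising `LookupError`.

```python
def decode_bytes(data: bytes, result: dict, min_confidence: float = 0.5,
                 fallback: str = "utf-8") -> str:
    """Decode `data` using a detect()-style result dict.

    Hypothetical helper: uses the detected encoding when it is present,
    confident enough, known to Python's codec registry and able to decode
    the data; otherwise decodes with `fallback`, replacing bad bytes.
    """
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0  # treat None as 0.0
    if encoding and confidence >= min_confidence:
        try:
            # LookupError: Python has no codec by that name (e.g. "EUC-TW").
            # UnicodeDecodeError: the guess does not fit the data.
            return data.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            pass
    return data.decode(fallback, errors="replace")


# Usage with cChardet installed:
#   import cchardet as chardet
#   text = decode_bytes(msg, chardet.detect(msg))
```

Falling back with `errors="replace"` keeps the function total: it always returns a string, at the cost of possibly showing replacement characters when the guess was wrong.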
## Benchmark

The benchmark compares cChardet with the pure-Python chardet package, which has to be installed first:

```bash
$ cd src/
$ pip install chardet
$ python tests/bench.py
```
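The results below are reported in calls per second. As a rough illustration of how such a rate can be measured, here is a minimal sketch using the standard library's `timeit`; it is not the contents of `tests/bench.py`, and `calls_per_second` is a hypothetical helper. It times any detector callable on one fixed input and keeps the fastest of several batches to reduce noise.

```python
import timeit
from typing import Callable


def calls_per_second(detect: Callable[[bytes], object], data: bytes,
                     number: int = 100, repeat: int = 3) -> float:
    """Return the best observed rate of detect(data) calls per second.

    Runs `repeat` batches of `number` calls each and uses the fastest
    batch, since slower batches mostly reflect interference.
    """
    timer = timeit.Timer(lambda: detect(data))
    best = min(timer.repeat(repeat=repeat, number=number))
    return number / best if best > 0 else float("inf")


if __name__ == "__main__":
    sample = "テスト".encode("shift_jis") * 1000
    # Stand-in detector so the sketch runs without either package;
    # pass cchardet.detect or chardet.detect instead to compare them.
    print(calls_per_second(len, sample))
```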
### Performance

- CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
- RAM: DDR3 1600 MHz, 16 GB
- Platform: Ubuntu 16.04 amd64
#### Python 2.7.12

<table>
<tr>
<th>Library</th><th>Calls per second</th>
</tr>
<tr>
<td>chardet</td><td>0.26</td>
</tr>
<tr>
<td>cchardet</td><td>1408.73</td>
</tr>
</table>

#### Python 3.5.2
|
2013-05-08 11:23:26 +08:00
|
|
|
|
|
|
|
<table>
|
|
|
|
<tr>
|
|
|
|
<th></th><th>Request (call/s)</th>
|
|
|
|
</tr>
|
|
|
|
<tr>
|
2016-10-17 13:02:06 +08:00
|
|
|
<td>chardet</td><td>0.28</td>
|
2013-05-08 11:23:26 +08:00
|
|
|
</tr>
|
|
|
|
<tr>
|
2016-10-17 13:02:06 +08:00
|
|
|
<td>cchardet</td><td>1380.40</td>
|
2013-05-08 11:23:26 +08:00
|
|
|
</tr>
|
|
|
|
</table>
|
|
|
|
|
2013-05-08 11:31:40 +08:00
|
|
|
## License
* [src/cchardet](https://github.com/PyYoshi/cChardet/tree/master/src/cchardet): The MIT License
* Other libraries: see the [src/ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory for their licenses.

## Thanks
* [uchardet-enhanced](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [Cython](http://www.cython.org/)

## Contact
[Issues](https://github.com/PyYoshi/cChardet/issues?page=1&state=open)