cChardet
========
cChardet is a high-speed universal character encoding detector. It is a binding to [charsetdetect](https://bitbucket.org/medoc/uchardet-enhanced/overview).
## Supported codecs
* Big5
* EUC-JP
* EUC-KR
* GB18030
* HZ-GB-2312
* IBM855
* IBM866
* ISO-2022-CN
* ISO-2022-JP
* ISO-2022-KR
* ISO-8859-2
* ISO-8859-5
* ISO-8859-7
* ISO-8859-8
* KOI8-R
* Shift_JIS
* TIS-620
* UTF-8
* UTF-16BE
* UTF-16LE
* UTF-32BE
* UTF-32LE
* WINDOWS-1250
* WINDOWS-1251
* WINDOWS-1252
* WINDOWS-1253
* WINDOWS-1255
* EUC-TW
* X-ISO-10646-UCS-4-2143
* X-ISO-10646-UCS-4-3412
* x-mac-cyrillic
## Requires
* Cython: [http://www.cython.org/](http://www.cython.org/)

For example, on Ubuntu 12.04:
```bash
$ sudo apt-get install build-essential python-dev cython
```
## Installation
```bash
$ cd /tmp
$ git clone https://github.com/PyYoshi/cChardet.git
$ cd cChardet
$ python setup.py build
$ python setup.py install
```
or
```bash
$ pip install -U cchardet
```
## Example
```python
# -*- coding: utf-8 -*-
import cchardet as chardet
# Read the file as raw bytes; detect() works on bytes, not decoded text.
with open(r"tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
    msg = f.read()
result = chardet.detect(msg)
print(result)
```
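
The detection result is typically used to decode the bytes afterwards. The following is a minimal sketch of that step, not part of cChardet itself: `decode_detected` is a hypothetical helper, and it assumes that `detect()` returns a dict with an `"encoding"` key whose value may be `None` when nothing is recognized. The detector is passed in as an argument so the helper also works with any other function that returns the same shape.

```python
def decode_detected(data, detect, fallback="utf-8"):
    """Decode bytes using the encoding reported by a detect() callable.

    Hypothetical helper: `detect` is assumed to return a dict with an
    "encoding" key (possibly None), as cchardet.detect does.
    """
    result = detect(data)
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The detector reported a name that Python has no codec for.
        return data.decode(fallback, errors="replace")


try:
    import cchardet
except ImportError:
    cchardet = None  # the demo below is skipped if cchardet is not installed

if cchardet is not None:
    sample = "千夜一夜物語".encode("shift_jis")
    print(decode_detected(sample, cchardet.detect))
```

Very short inputs give the detector little to work with, so the reported encoding may differ from the real one; `errors="replace"` keeps the helper from raising in that case.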
## Benchmark
Code: [tests.TestCchardetSpeed](https://github.com/PyYoshi/cChardet/blob/master/src/tests/bench.py)

Sample: [tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt](https://github.com/PyYoshi/cChardet/blob/master/src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt)
### Performance:
* CPU: Intel Core i7 860 2.8GHz
* RAM: DDR3-1333 16GB
* Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit
### Result:
<table>
<tr>
<th></th><th>Requests (calls/s)</th>
</tr>
<tr>
<td>chardet</td><td>0.32</td>
</tr>
<tr>
<td>cchardet</td><td>975.46</td>
</tr>
</table>
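
A calls-per-second figure like the ones above can be obtained by timing repeated `detect()` calls on the same input. The sketch below only illustrates that idea and is not the linked benchmark script: `calls_per_second` and its `repeat` parameter are hypothetical names, and it uses `time.perf_counter`, which requires Python 3.

```python
import time


def calls_per_second(detect, data, repeat=100):
    """Return how many detect(data) calls complete per second.

    Hypothetical helper for illustration; `detect` can be any callable
    that accepts bytes, such as cchardet.detect or chardet.detect.
    """
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

For example, `calls_per_second(cchardet.detect, msg)` and `calls_per_second(chardet.detect, msg)` on the same `msg` would give comparable figures for the two libraries on a given machine.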
## License
* The MIT License: [src/cchardet](https://github.com/PyYoshi/cChardet/tree/master/src/cchardet)
* Licenses of other libraries: see the [src/ext](https://github.com/PyYoshi/cChardet/tree/master/src/ext) directory.
## Thanks
* [uchardet-enhanced](https://bitbucket.org/medoc/uchardet-enhanced/overview)
* [Cython](http://www.cython.org/)
## Contact
[Issues](https://github.com/PyYoshi/cChardet/issues?page=1&state=open)