cChardet
========

cChardet is a high-speed universal character encoding detector. It is a binding to
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.


Supported codecs
----------------

- Big5
- EUC-JP
- EUC-KR
- GB18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift\_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1255
- EUC-TW
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic

Requires
--------

- Cython: `http://www.cython.org/ <http://www.cython.org/>`_

For example, on Ubuntu 12.04:

::

    $ sudo apt-get install build-essential python-dev cython


Installation
------------

::

    $ cd /tmp
    $ git clone https://github.com/PyYoshi/cChardet.git
    $ cd cChardet
    $ python setup.py build
    $ sudo python setup.py install


or

::

    $ sudo easy_install cchardet

Example
-------

::

    # -*- coding: utf-8 -*-
    import cchardet as chardet

    # Open in binary mode: detect() works on raw, undecoded bytes.
    with open(r"test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()
    result = chardet.detect(msg)
    print(result)
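
Once an encoding has been detected, the usual next step is to decode the bytes with it. The following is a minimal sketch and not part of cChardet: ``decode_detected`` is a hypothetical helper, and it assumes that ``detect()`` returns a dict with an ``encoding`` key (possibly ``None``), as chardet does. The detector is passed in as an argument, so either ``cchardet.detect`` or ``chardet.detect`` can be supplied.

```python
def decode_detected(data, detect, fallback="utf-8"):
    """Decode ``data`` (bytes) with the encoding reported by ``detect``.

    ``detect`` is any callable returning a dict with an "encoding" key,
    e.g. cchardet.detect (assumed result shape, modeled on chardet).
    """
    result = detect(data)
    # The detector may report no encoding at all (e.g. for empty input).
    encoding = result.get("encoding") or fallback
    try:
        return data.decode(encoding, errors="replace")
    except LookupError:
        # The reported name is not a codec Python knows; use the fallback.
        return data.decode(fallback, errors="replace")
```

With the example above, ``decode_detected(msg, chardet.detect)`` would return the contents of the sample file as text. Some names in the supported codec list may have no matching Python codec, which is why the sketch falls back instead of raising.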

Test
----

Install chardet and nose with pip (or easy_install), then run the tests:

::

    $ sudo pip install -U chardet nose
    $ cd test
    $ nosetests --nocapture tests.py


Benchmark
---------

Code:
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L415>`_

Sample:
`test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt <https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt>`_


Performance:
~~~~~~~~~~~~

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Result:
~~~~~~~

::

    chardet: 0.32 (call/s)
    cchardet: 1012.97 (call/s)
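
A calls-per-second figure like the ones above can be measured roughly as follows. This is only a sketch, not the code in ``tests.TestCchardetSpeed``: ``calls_per_second`` and its ``repeat`` parameter are hypothetical, and it assumes Python 3.3 or later for ``time.perf_counter``.

```python
import time


def calls_per_second(detect, data, repeat=10):
    """Return how many times per second ``detect(data)`` can be called."""
    start = time.perf_counter()
    for _ in range(repeat):
        detect(data)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return repeat / elapsed if elapsed > 0 else float("inf")
```

Calling ``calls_per_second(cchardet.detect, msg)`` and ``calls_per_second(chardet.detect, msg)`` on the same sample bytes gives two rates that can be compared directly. The absolute values depend on the machine and the input.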

License
~~~~~~~

- The MIT License:
  `src/cchardet <https://github.com/PyYoshi/cChardet/tree/master/src/cchardet>`_
- Licenses of other libraries: please see the
  `src/ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
  directory.


Thanks
~~~~~~

- `uchardet-enhanced <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_
- `Cython <http://www.cython.org/>`_


Contact
~~~~~~~

`My blog <http://blog.remu.biz>`_

`Issues <https://github.com/PyYoshi/cChardet/issues?page=1&state=open>`_

Sorry for my poor English :)