universal character encoding detector
Find a file
2017-02-26 18:07:23 +09:00
dockerfiles Support Python 3.6 2017-01-08 12:06:33 +09:00
src version 1.1.3 2017-02-26 18:00:36 +09:00
.gitignore use tox 2016-10-17 12:19:24 +09:00
.travis.yml add TOXENV=py37 2017-01-08 12:32:54 +09:00
appveyor.yml Support Python 3.6 2017-01-08 12:06:33 +09:00
build.cmd add build.cmd 2016-10-17 14:55:31 +09:00
CHANGES.rst version 1.1.3 2017-02-26 18:07:23 +09:00
Makefile add build cmd 2017-01-08 12:17:58 +09:00
MANIFEST.in include tests 2016-10-18 10:02:51 +09:00
README.rst add rst documents 2016-10-17 19:39:55 +09:00
setup.cfg use tox 2016-10-17 12:19:24 +09:00
setup.py update setup.py 2017-01-08 12:48:59 +09:00
tox.ini add TOXENV=py37 2017-01-08 12:32:54 +09:00

cChardet
========

cChardet is high speed universal character encoding detector. - binding to `charsetdetect`_.

.. image:: https://badge.fury.io/py/cchardet.svg
   :target: https://badge.fury.io/py/cchardet
   :alt: PyPI version
.. image:: https://travis-ci.org/PyYoshi/cChardet.svg?branch=master
   :target: https://travis-ci.org/PyYoshi/cChardet
   :alt: Travis Ci build status
.. image:: https://ci.appveyor.com/api/projects/status/lwkc4rgf3gncb1ne/branch/master?svg=true
   :target: https://ci.appveyor.com/project/PyYoshi/cchardet/branch/master
   :alt: AppVeyor build status

Support codecs
--------------

- Big5
- EUC-JP
- EUC-KR
- GB18030
- HZ-GB-2312
- IBM855
- IBM866
- ISO-2022-CN
- ISO-2022-JP
- ISO-2022-KR
- ISO-8859-2
- ISO-8859-5
- ISO-8859-7
- ISO-8859-8
- KOI8-R
- Shift_JIS
- TIS-620
- UTF-8
- UTF-16BE
- UTF-16LE
- UTF-32BE
- UTF-32LE
- WINDOWS-1250
- WINDOWS-1251
- WINDOWS-1252
- WINDOWS-1253
- WINDOWS-1255
- EUC-TW
- X-ISO-10646-UCS-4-2143
- X-ISO-10646-UCS-4-3412
- x-mac-cyrillic

Requirements
------------

- `Cython`_

Example
-------

.. code-block:: python

    # -*- coding: utf-8 -*-
    import cchardet as chardet
    with open(r"src/tests/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()
        result = chardet.detect(msg)
        print(result)


Benchmark
---------

.. code-block:: bash

    $ cd src/
    $ pip install chardet
    $ python tests/bench.py


Results
~~~~~~~

CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz

RAM: DDR3 1600Mhz 16GB

Platform: Ubuntu 16.04 amd64

Python 2.7.12
^^^^^^^^^^^^^

+----------+------------------+
|          | Request (call/s) |
+==========+==================+
| chardet  | 0.26             |
+----------+------------------+
| cchardet | 1408.73          |
+----------+------------------+

Python 3.5.2
^^^^^^^^^^^^

+----------+------------------+
|          | Request (call/s) |
+==========+==================+
| chardet  | 0.28             |
+----------+------------------+
| cchardet | 1380.40          |
+----------+------------------+

License
-------

-  The MIT License: `src/cchardet`_
-  Other Libraries License: Please, look at the `src/ext`_ directory.

Thanks
------

-  `uchardet-enhanced`_
-  `Cython`_

Contact
-------

`Issues`_


.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _Cython: http://www.cython.org/
.. _src/cchardet: https://github.com/PyYoshi/cChardet/tree/master/src/cchardet
.. _src/ext: https://github.com/PyYoshi/cChardet/tree/master/src/ext
.. _uchardet-enhanced: https://bitbucket.org/medoc/uchardet-enhanced/overview
.. _Issues: https://github.com/PyYoshi/cChardet/issues?page=1&state=open