
cChardet
========

cChardet is a high-speed universal character encoding detector: a Python
binding to
`charsetdetect <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_.

Supported codecs
----------------

-  Big5
-  EUC-JP
-  EUC-KR
-  GB18030
-  HZ-GB-2312
-  IBM855
-  IBM866
-  ISO-2022-CN
-  ISO-2022-JP
-  ISO-2022-KR
-  ISO-8859-2
-  ISO-8859-5
-  ISO-8859-7
-  ISO-8859-8
-  KOI8-R
-  Shift\_JIS
-  TIS-620
-  UTF-8
-  UTF-16BE
-  UTF-16LE
-  UTF-32BE
-  UTF-32LE
-  WINDOWS-1250
-  WINDOWS-1251
-  WINDOWS-1252
-  WINDOWS-1253
-  WINDOWS-1255
-  EUC-TW
-  X-ISO-10646-UCS-4-2143
-  X-ISO-10646-UCS-4-3412
-  x-mac-cyrillic
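
The names above are the detector's own spellings, and not every one is a codec name that Python's standard library accepts. A minimal sketch, assuming you want to decode with the standard library: the hypothetical helper ``python_codec_for`` (not part of cChardet) resolves a reported name through ``codecs.lookup`` and returns ``None`` when Python has no matching codec.

```python
import codecs
from typing import Optional


def python_codec_for(detected_name: Optional[str]) -> Optional[str]:
    """Return Python's canonical codec name for a detector-reported name.

    Hypothetical helper, not part of cChardet. Returns None when the name is
    empty or no codec is registered under that name or one of its aliases.
    """
    if not detected_name:
        return None
    try:
        # codecs.lookup normalizes case and separators, so "WINDOWS-1252"
        # finds the same codec as "cp1252".
        return codecs.lookup(detected_name).name
    except LookupError:
        # Python ships no codec under this name.
        return None
```

For example, ``WINDOWS-1252`` resolves to Python's ``cp1252``, while ``X-ISO-10646-UCS-4-2143`` has no standard-library codec, so the helper returns ``None``.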

Requires
--------

-  Cython: `http://www.cython.org/ <http://www.cython.org/>`_

For example, on Ubuntu 12.04:

::

    $ sudo apt-get install build-essential python-dev cython

Installation
------------

::

    $ cd /tmp
    $ git clone https://github.com/PyYoshi/cChardet.git
    $ cd cChardet
    $ python setup.py build
    $ sudo python setup.py install

or

::

    $ sudo easy_install cchardet

Example
-------

::

    # -*- coding: utf-8 -*-
    import cchardet as chardet

    # Open in binary mode: detect() expects raw bytes, not decoded text.
    with open("test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()
    result = chardet.detect(msg)
    print(result)
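
The example prints the raw result. A common next step is decoding the bytes with the detected encoding. A minimal sketch, assuming the result is a dictionary whose ``"encoding"`` entry may be ``None`` when nothing is detected; ``decode_detected`` and its ``fallback`` parameter are hypothetical, and the detector is passed in as a function so the sketch runs without cChardet installed.

```python
import codecs
from typing import Callable, Mapping


def decode_detected(
    data: bytes,
    detect: Callable[[bytes], Mapping[str, object]],
    fallback: str = "utf-8",
) -> str:
    """Decode data with the encoding reported by detect.

    Hypothetical helper, not part of cChardet. detect is assumed to return
    a mapping with an "encoding" entry, like the result printed above.
    """
    encoding = detect(data).get("encoding")
    codec = fallback
    if isinstance(encoding, str) and encoding:
        try:
            # Resolve the detector's name (e.g. "SHIFT_JIS") via Python's registry.
            codec = codecs.lookup(encoding).name
        except LookupError:
            pass  # No Python codec by that name: keep the fallback.
    # errors="replace" substitutes U+FFFD instead of raising on bad bytes.
    return data.decode(codec, errors="replace")
```

With cChardet installed, ``decode_detected(msg, chardet.detect)`` decodes ``msg`` from the example above. Replacing undecodable bytes keeps a wrong guess from raising, at the cost of silently substituting U+FFFD characters.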

Test
----

::

    $ sudo pip install -U chardet nose
    $ cd test
    $ nosetests --nocapture tests.py

Benchmark
---------

Code:
`tests.TestCchardetSpeed <https://github.com/PyYoshi/cChardet/blob/master/test/tests.py#L461>`_

Sample:
`test/testdata/wikipediaJa\_One\_Thousand\_and\_One\_Nights\_SJIS.txt <https://github.com/PyYoshi/cChardet/blob/master/test/testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt>`_

Environment
~~~~~~~~~~~

CPU: Intel Core i7 860 2.8GHz

RAM: DDR3-1333 16GB

Platform: Kubuntu 12.04 amd64, Python 2.7.3 64-bit

Results
~~~~~~~

::

    chardet:      0.32 calls/s
    cchardet:  1012.97 calls/s
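
The results are throughput figures: calls completed divided by elapsed seconds. A minimal sketch of such a measurement with the standard ``timeit`` module; ``calls_per_second`` is a hypothetical helper, not the code of ``tests.TestCchardetSpeed``, and absolute numbers will vary with hardware, Python version, and input.

```python
import timeit
from typing import Callable


def calls_per_second(func: Callable[[bytes], object], data: bytes, number: int = 100) -> float:
    """Return how many times per second func(data) can be called.

    Hypothetical helper: times `number` consecutive calls with timeit's
    default timer and divides the call count by the total elapsed seconds.
    """
    if number <= 0:
        raise ValueError("number must be positive")
    elapsed = timeit.timeit(lambda: func(data), number=number)
    # Guard against a zero reading from a coarse timer.
    return number / elapsed if elapsed > 0 else float("inf")
```

With both libraries installed, ``calls_per_second(chardet.detect, msg)`` and ``calls_per_second(cchardet.detect, msg)`` produce figures of the same kind as above. At the reported rates, cchardet completed about 1012.97 / 0.32 ≈ 3,166 times as many calls per second as chardet on this sample.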

License
--------------

-  The MIT License:
   `src/cchardet <https://github.com/PyYoshi/cChardet/tree/master/src/cchardet>`_

-  Other libraries: see the
   `src/ext <https://github.com/PyYoshi/cChardet/tree/master/src/ext>`_
   directory for their licenses.

Thanks
--------------

-  `uchardet-enhanced <https://bitbucket.org/medoc/uchardet-enhanced/overview>`_

-  `Cython <http://www.cython.org/>`_

Contact
--------------

`My blog <http://blog.remu.biz>`_

`Issues <https://github.com/PyYoshi/cChardet/issues?page=1&state=open>`_

Sorry for my poor English :)