update README

PyYoshi 2017-03-28 00:54:50 +09:00
parent 91b30267b0
commit 32ec01e6aa


@@ -1,57 +1,209 @@
 cChardet
 ========

-cChardet is high speed universal character encoding detector. - binding to `charsetdetect`_.
+:exclamation: :exclamation: **Work In Progress Branch** :exclamation: :exclamation:
+
+cChardet is high speed universal character encoding detector. - binding to `uchardet`_.

 .. image:: https://badge.fury.io/py/cchardet.svg
    :target: https://badge.fury.io/py/cchardet
    :alt: PyPI version

-.. image:: https://travis-ci.org/PyYoshi/cChardet.svg?branch=master
+.. image:: https://travis-ci.org/PyYoshi/cChardet.svg?branch=v2
    :target: https://travis-ci.org/PyYoshi/cChardet
    :alt: Travis Ci build status

-.. image:: https://ci.appveyor.com/api/projects/status/lwkc4rgf3gncb1ne/branch/master?svg=true
-   :target: https://ci.appveyor.com/project/PyYoshi/cchardet/branch/master
+.. image:: https://ci.appveyor.com/api/projects/status/lwkc4rgf3gncb1ne/branch/v2?svg=true
+   :target: https://ci.appveyor.com/project/PyYoshi/cchardet/branch/v2
    :alt: AppVeyor build status

-Support codecs
---------------
+Supported Languages/Encodings
+-----------------------------

-- Big5
-- EUC-JP
-- EUC-KR
-- GB18030
-- HZ-GB-2312
-- IBM855
-- IBM866
-- ISO-2022-CN
-- ISO-2022-JP
-- ISO-2022-KR
-- ISO-8859-2
-- ISO-8859-5
-- ISO-8859-7
-- ISO-8859-8
-- KOI8-R
-- Shift_JIS
-- TIS-620
-- UTF-8
-- UTF-16BE
-- UTF-16LE
-- UTF-32BE
-- UTF-32LE
-- WINDOWS-1250
-- WINDOWS-1251
-- WINDOWS-1252
-- WINDOWS-1253
-- WINDOWS-1255
-- EUC-TW
-- X-ISO-10646-UCS-4-2143
-- X-ISO-10646-UCS-4-3412
-- x-mac-cyrillic
-
-Requirements
-------------
-
-- `Cython`_
+- International (Unicode)
+  - UTF-8
+  - UTF-16BE / UTF-16LE
+  - UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 /
+    X-ISO-10646-UCS-4-21431
+- Arabic
+  - ISO-8859-6
+  - WINDOWS-1256
+- Bulgarian
+  - ISO-8859-5
+  - WINDOWS-1251
+- Chinese
+  - ISO-2022-CN
+  - BIG5
+  - EUC-TW
+  - GB18030
+  - HZ-GB-2312
+- Croatian:
+  - ISO-8859-2
+  - ISO-8859-13
+  - ISO-8859-16
+  - Windows-1250
+  - IBM852
+  - MAC-CENTRALEUROPE
+- Czech
+  - Windows-1250
+  - ISO-8859-2
+  - IBM852
+  - MAC-CENTRALEUROPE
+- Danish
+  - ISO-8859-1
+  - ISO-8859-15
+  - WINDOWS-1252
+- English
+  - ASCII
+- Esperanto
+  - ISO-8859-3
+- Estonian
+  - ISO-8859-4
+  - ISO-8859-13
+  - ISO-8859-13
+  - Windows-1252
+  - Windows-1257
+- Finnish
+  - ISO-8859-1
+  - ISO-8859-4
+  - ISO-8859-9
+  - ISO-8859-13
+  - ISO-8859-15
+  - WINDOWS-1252
+- French
+  - ISO-8859-1
+  - ISO-8859-15
+  - WINDOWS-1252
+- German
+  - ISO-8859-1
+  - WINDOWS-1252
+- Greek
+  - ISO-8859-7
+  - WINDOWS-1253
+- Hebrew
+  - ISO-8859-8
+  - WINDOWS-1255
+- Hungarian:
+  - ISO-8859-2
+  - WINDOWS-1250
+- Irish Gaelic
+  - ISO-8859-1
+  - ISO-8859-9
+  - ISO-8859-15
+  - WINDOWS-1252
+- Italian
+  - ISO-8859-1
+  - ISO-8859-3
+  - ISO-8859-9
+  - ISO-8859-15
+  - WINDOWS-1252
+- Japanese
+  - ISO-2022-JP
+  - SHIFT\_JIS
+  - EUC-JP
+- Korean
+  - ISO-2022-KR
+  - EUC-KR / UHC
+- Lithuanian
+  - ISO-8859-4
+  - ISO-8859-10
+  - ISO-8859-13
+- Latvian
+  - ISO-8859-4
+  - ISO-8859-10
+  - ISO-8859-13
+- Maltese
+  - ISO-8859-3
+- Polish:
+  - ISO-8859-2
+  - ISO-8859-13
+  - ISO-8859-16
+  - Windows-1250
+  - IBM852
+  - MAC-CENTRALEUROPE
+- Portuguese
+  - ISO-8859-1
+  - ISO-8859-9
+  - ISO-8859-15
+  - WINDOWS-1252
+- Romanian:
+  - ISO-8859-2
+  - ISO-8859-16
+  - Windows-1250
+  - IBM852
+- Russian
+  - ISO-8859-5
+  - KOI8-R
+  - WINDOWS-1251
+  - MAC-CYRILLIC
+  - IBM866
+  - IBM855
+- Slovak
+  - Windows-1250
+  - ISO-8859-2
+  - IBM852
+  - MAC-CENTRALEUROPE
+- Slovene
+  - ISO-8859-2
+  - ISO-8859-16
+  - Windows-1250
+  - IBM852
+- M

 Example
 -------
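When a file starts with a byte-order mark (BOM), the Unicode encodings in the new "International (Unicode)" group can be told apart from the first two to four bytes alone. The sketch below illustrates only that BOM check, using Python's standard `codecs` BOM constants. It is not cChardet's or uchardet's implementation (BOM-less input still needs statistical detection), and the helper name `sniff_bom` is hypothetical. For the two unusual-byte-order UCS-4 variants it uses the spellings from the removed list, `X-ISO-10646-UCS-4-3412` and `X-ISO-10646-UCS-4-2143`; the added list's `34121`/`21431` appear to carry a stray trailing digit.

```python
import codecs
from typing import Optional

# Hypothetical helper table: BOM prefix -> encoding name.
# Four-byte signatures are listed before the three- and two-byte ones so that,
# for example, FF FE 00 00 (UTF-32LE) is not misreported as FF FE (UTF-16LE).
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),                # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "UTF-32LE"),                # FF FE 00 00
    (b"\xfe\xff\x00\x00", "X-ISO-10646-UCS-4-3412"),  # byte order 3-4-1-2
    (b"\x00\x00\xff\xfe", "X-ISO-10646-UCS-4-2143"),  # byte order 2-1-4-3
    (codecs.BOM_UTF8, "UTF-8"),                       # EF BB BF
    (codecs.BOM_UTF16_BE, "UTF-16BE"),                # FE FF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),                # FF FE
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding name implied by a leading BOM, or None if there is none."""
    for bom, name in BOM_SIGNATURES:
        if data.startswith(bom):
            return name
    return None


if __name__ == "__main__":
    print(sniff_bom(codecs.BOM_UTF8 + "héllo".encode("utf-8")))       # UTF-8
    print(sniff_bom(codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")))  # UTF-16LE
    print(sniff_bom(codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")))  # UTF-32LE
    print(sniff_bom(b"plain ASCII text"))                             # None
```

Checking longer signatures first matters because the UTF-32LE BOM begins with the UTF-16LE BOM; a detector that tested `FF FE` first would never report UTF-32LE.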
@@ -65,69 +217,16 @@ Example
 result = chardet.detect(msg)
 print(result)

-Benchmark
----------
-
-.. code-block:: bash
-
-    $ cd src/
-    $ pip install chardet
-    $ python tests/bench.py
-
-Results
-~~~~~~~
-
-CPU: Intel(R) Core(TM) i3-4170 CPU @ 3.70GHz
-RAM: DDR3 1600Mhz 16GB
-Platform: Ubuntu 16.04 amd64
-
-Python 2.7.12
-^^^^^^^^^^^^^
-
-+----------+------------------+
-|          | Request (call/s) |
-+==========+==================+
-| chardet  | 0.26             |
-+----------+------------------+
-| cchardet | 1408.73          |
-+----------+------------------+
-
-Python 3.5.2
-^^^^^^^^^^^^
-
-+----------+------------------+
-|          | Request (call/s) |
-+==========+==================+
-| chardet  | 0.28             |
-+----------+------------------+
-| cchardet | 1380.40          |
-+----------+------------------+
-
-LICENSE
+License
 -------

-- The MIT License: `src/cchardet`_
-- Other Libraries License: Please, look at the `src/ext`_ directory.
-
-Thanks
-------
-
-- `uchardet-enhanced`_
-- `Cython`_
+See **COPYING** file.

 Contact
 -------

-`Issues`_
+- `Issues`_

-.. _charsetdetect: https://bitbucket.org/medoc/uchardet-enhanced/overview
-.. _Cython: http://www.cython.org/
-.. _src/cchardet: https://github.com/PyYoshi/cChardet/tree/master/src/cchardet
-.. _src/ext: https://github.com/PyYoshi/cChardet/tree/master/src/ext
-.. _uchardet-enhanced: https://bitbucket.org/medoc/uchardet-enhanced/overview
+.. _uchardet: https://github.com/PyYoshi/uchardet
 .. _Issues: https://github.com/PyYoshi/cChardet/issues?page=1&state=open
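The removed Benchmark section reported throughput in calls per second, produced by `tests/bench.py`, whose contents this diff does not show. The sketch below is one possible way to take that kind of measurement with only the standard library. It also computes the ratio implied by the removed tables: about 5418x on Python 2.7.12 (1408.73 / 0.26) and about 4930x on Python 3.5.2 (1380.40 / 0.28). The helper names `calls_per_second` and `speedup` are hypothetical, and the timed workload is a stand-in, not cChardet.

```python
import time
from typing import Callable


def calls_per_second(fn: Callable[[], object], duration: float = 0.5) -> float:
    """Call fn repeatedly until at least `duration` seconds pass; return calls/s."""
    calls = 0
    start = time.perf_counter()
    while True:
        fn()
        calls += 1
        elapsed = time.perf_counter() - start
        if elapsed >= duration:
            return calls / elapsed


def speedup(fast_calls_per_s: float, slow_calls_per_s: float) -> float:
    """How many times more calls per second the faster library completes."""
    return fast_calls_per_s / slow_calls_per_s


if __name__ == "__main__":
    # Figures copied from the removed Benchmark tables (cchardet vs chardet).
    print(f"Python 2.7.12: {speedup(1408.73, 0.26):.0f}x")  # 5418x
    print(f"Python 3.5.2:  {speedup(1380.40, 0.28):.0f}x")  # 4930x

    # Stand-in workload; a real run would time a detection call such as the
    # README example's chardet.detect(msg) on a fixed input instead.
    data = "sample text ".encode("utf-8") * 100
    rate = calls_per_second(lambda: data.decode("utf-8"), duration=0.2)
    print(f"{rate:.0f} calls/s")
```

Absolute calls/s figures depend heavily on the machine, which is why the removed Results section listed the CPU, RAM and platform alongside each table; the ratio between two libraries timed on the same input and host is the more portable number.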