cChardet
========

cChardet is a high-speed universal character encoding detector, implemented as a binding to `uchardet`_.

.. image:: https://badge.fury.io/py/cchardet.svg
   :target: https://badge.fury.io/py/cchardet
   :alt: PyPI version

.. image:: https://travis-ci.org/PyYoshi/cChardet.svg?branch=master
   :target: https://travis-ci.org/PyYoshi/cChardet
   :alt: Travis CI build status

.. image:: https://ci.appveyor.com/api/projects/status/lwkc4rgf3gncb1ne/branch/master?svg=true
   :target: https://ci.appveyor.com/project/PyYoshi/cchardet/branch/master
   :alt: AppVeyor build status

Supported Languages/Encodings
-----------------------------

- International (Unicode)

  - UTF-8
  - UTF-16BE / UTF-16LE
  - UTF-32BE / UTF-32LE / X-ISO-10646-UCS-4-34121 /
    X-ISO-10646-UCS-4-21431

- Arabic

  - ISO-8859-6
  - WINDOWS-1256

- Bulgarian

  - ISO-8859-5
  - WINDOWS-1251

- Chinese

  - ISO-2022-CN
  - BIG5
  - EUC-TW
  - GB18030
  - HZ-GB-2312

- Croatian

  - ISO-8859-2
  - ISO-8859-13
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - MAC-CENTRALEUROPE

- Czech

  - Windows-1250
  - ISO-8859-2
  - IBM852
  - MAC-CENTRALEUROPE

- Danish

  - ISO-8859-1
  - ISO-8859-15
  - WINDOWS-1252

- English

  - ASCII

- Esperanto

  - ISO-8859-3

- Estonian

  - ISO-8859-4
  - ISO-8859-13
  - Windows-1252
  - Windows-1257

- Finnish

  - ISO-8859-1
  - ISO-8859-4
  - ISO-8859-9
  - ISO-8859-13
  - ISO-8859-15
  - WINDOWS-1252

- French

  - ISO-8859-1
  - ISO-8859-15
  - WINDOWS-1252

- German

  - ISO-8859-1
  - WINDOWS-1252

- Greek

  - ISO-8859-7
  - WINDOWS-1253

- Hebrew

  - ISO-8859-8
  - WINDOWS-1255

- Hungarian

  - ISO-8859-2
  - WINDOWS-1250

- Irish Gaelic

  - ISO-8859-1
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Italian

  - ISO-8859-1
  - ISO-8859-3
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Japanese

  - ISO-2022-JP
  - SHIFT\_JIS
  - EUC-JP

- Korean

  - ISO-2022-KR
  - EUC-KR / UHC

- Lithuanian

  - ISO-8859-4
  - ISO-8859-10
  - ISO-8859-13

- Latvian

  - ISO-8859-4
  - ISO-8859-10
  - ISO-8859-13

- Maltese

  - ISO-8859-3

- Polish

  - ISO-8859-2
  - ISO-8859-13
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - MAC-CENTRALEUROPE

- Portuguese

  - ISO-8859-1
  - ISO-8859-9
  - ISO-8859-15
  - WINDOWS-1252

- Romanian

  - ISO-8859-2
  - ISO-8859-16
  - Windows-1250
  - IBM852

- Russian

  - ISO-8859-5
  - KOI8-R
  - WINDOWS-1251
  - MAC-CYRILLIC
  - IBM866
  - IBM855

- Slovak

  - Windows-1250
  - ISO-8859-2
  - IBM852
  - MAC-CENTRALEUROPE

- Slovene

  - ISO-8859-2
  - ISO-8859-16
  - Windows-1250
  - IBM852
  - M

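The labels above follow uchardet's naming, and some of them (for example the ``X-ISO-10646-UCS-4-*`` variants) have no counterpart in Python's codec registry. A minimal sketch, assuming you want to check a label before passing it to ``bytes.decode``: ``python_codec_name`` is a hypothetical helper, not part of cChardet, and relies only on the standard ``codecs.lookup``.

.. code-block:: python

    import codecs
    from typing import Optional


    def python_codec_name(label: Optional[str]) -> Optional[str]:
        """Return Python's canonical codec name for an encoding label, or None.

        Hypothetical helper: labels Python does not recognise make
        codecs.lookup raise LookupError, which is mapped to None here.
        """
        if not label:
            return None
        try:
            return codecs.lookup(label).name
        except LookupError:
            return None


    # Check a few labels taken from the list of supported encodings.
    for label in ("UTF-8", "SHIFT_JIS", "WINDOWS-1252", "X-ISO-10646-UCS-4-34121"):
        print(f"{label} -> {python_codec_name(label)}")

Aliases such as ``WINDOWS-1252`` resolve to Python's canonical name (``cp1252``), so comparing the returned names is more reliable than comparing raw labels.
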
Example
-------

.. code-block:: python

    # -*- coding: utf-8 -*-
    import cchardet as chardet

    with open(r"src/tests/samples/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt", "rb") as f:
        msg = f.read()
        result = chardet.detect(msg)
        print(result)

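The ``result`` printed by the example is a dictionary with ``encoding`` and ``confidence`` keys. A minimal sketch of turning such a result into text, assuming ``encoding`` may be ``None`` when nothing is detected; ``decode_detected``, its UTF-8 fallback, and its ``min_confidence`` threshold of 0.5 are hypothetical choices for illustration, not cChardet API.

.. code-block:: python

    import codecs
    from typing import Mapping


    def decode_detected(
        data: bytes,
        result: Mapping[str, object],
        fallback: str = "utf-8",
        min_confidence: float = 0.5,  # hypothetical threshold, tune for your data
    ) -> str:
        """Decode `data` using a result shaped like {"encoding": ..., "confidence": ...}.

        Uses `fallback` (with replacement characters) when no encoding was
        detected, the confidence is below `min_confidence`, or Python has no
        codec for the detected label.
        """
        encoding = result.get("encoding")
        confidence = result.get("confidence")
        if not isinstance(confidence, (int, float)):
            confidence = 0.0  # treat a missing confidence as "not confident"
        if isinstance(encoding, str) and confidence >= min_confidence:
            try:
                codecs.lookup(encoding)
            except LookupError:
                pass  # label unknown to Python: fall through to the fallback
            else:
                return data.decode(encoding, errors="replace")
        return data.decode(fallback, errors="replace")

With cChardet installed it would be called as ``decode_detected(msg, chardet.detect(msg))``. Decoding with ``errors="replace"`` never raises, at the cost of replacement characters in misdetected text.
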
Benchmark
---------

.. code-block:: bash

    $ cd src/
    $ pip install chardet
    $ python tests/bench.py

Results
~~~~~~~

- CPU: Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz
- RAM: DDR3 1600 MHz, 16 GB
- Platform: Ubuntu 16.04 amd64

Python 2.7.13
^^^^^^^^^^^^^

+-----------------+----------------------+
|                 | Throughput (calls/s) |
+=================+======================+
| chardet v3.0.2  |                 0.36 |
+-----------------+----------------------+
| cchardet v2.0.1 |              1396.42 |
+-----------------+----------------------+

Python 3.6.1
^^^^^^^^^^^^

+-----------------+----------------------+
|                 | Throughput (calls/s) |
+=================+======================+
| chardet v3.0.2  |                 0.35 |
+-----------------+----------------------+
| cchardet v2.0.1 |              1467.77 |
+-----------------+----------------------+

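The figures above are throughput in detection calls per second. ``tests/bench.py`` is not reproduced here; as a rough sketch only, a number of this kind can be obtained by calling a detector repeatedly for a fixed interval and dividing the call count by the elapsed time. ``calls_per_second`` is a hypothetical helper, and its results will not match the tables exactly.

.. code-block:: python

    import time
    from typing import Callable


    def calls_per_second(func: Callable[[bytes], object], payload: bytes, duration: float = 1.0) -> float:
        """Call func(payload) repeatedly for at least `duration` seconds; return calls per second."""
        if duration <= 0:
            raise ValueError("duration must be positive")
        calls = 0
        start = time.perf_counter()
        elapsed = 0.0
        # Keep calling until the wall-clock budget is used up.
        while elapsed < duration:
            func(payload)
            calls += 1
            elapsed = time.perf_counter() - start
        return calls / elapsed

For example, ``calls_per_second(cchardet.detect, msg)`` and ``calls_per_second(chardet.detect, msg)`` on the same sample bytes give a side-by-side comparison on your own hardware.
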
LICENSE
-------

See the **COPYING** file.

Contact
-------

- `Issues`_

.. _uchardet: https://github.com/PyYoshi/uchardet
.. _Issues: https://github.com/PyYoshi/cChardet/issues?page=1&state=open