2012-06-20 01:41:36 +00:00
# cChardet
cChardet is a high-speed universal character encoding detector. It is a Python binding to libcharsetdetect.
# Requires
Cython: [http://www.cython.org/](http://www.cython.org/)

uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
# Install
### Build uchardet-enhanced
    $ cd /tmp
    $ hg clone https://bitbucket.org/medoc/uchardet-enhanced
    $ cd uchardet-enhanced/libcharsetdetect
    $ ./configure
    $ make
    $ sudo make install
    $ ls -la /usr/local/lib
    $ ls -la /usr/local/include
### Build cChardet
    $ cd /tmp
    $ git clone git://github.com/PyYoshi/cChardet.git
    $ cd cChardet
    $ sudo pip install -U cython
    # or: sudo easy_install -U cython
    # on Ubuntu, this is recommended instead: sudo apt-get install python-dev cython
    $ python setup.py build
    $ sudo python setup.py install
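After the install step, a quick way to confirm that the extension is visible to the interpreter is to ask the import system for it. The following is only a minimal sketch in Python 3; the helper name `is_installed` is hypothetical, and the module name `cchardet` is an assumption based on the benchmark output in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "cchardet" is the assumed import name of this package.
print("cchardet installed:", is_installed("cchardet"))
```

If this prints `False` even though the build succeeded, the package was probably installed for a different Python interpreter than the one running the check.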
# Benchmark
See `tests.TestCchardetSpeed` for the benchmark code.
### Sample (shift_jis):
testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
### PC Spec.:
CPU: Intel Core i7 860 2.8 GHz

RAM: DDR3-1333 16 GB

Platform: Windows 7 HP x64, Python 2.7.3 32-bit
### Result:
chardet: 4.009999990463257 s, detected shift_jis

cchardet: 0.0009999275207519531 s, detected shift_jis
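A comparison like the one above could be reproduced with a small timing harness. This is only a sketch and not the code in `tests.TestCchardetSpeed`: the helper names `time_detector`, `load_detectors` and `benchmark_file` are hypothetical, it assumes both libraries expose a `detect(bytes)` function, and it targets Python 3 (`time.perf_counter`) rather than the Python 2.7 used for the figures above.

```python
import importlib
import time


def time_detector(detect, data):
    """Time a single detect(data) call; return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = detect(data)
    return time.perf_counter() - start, result


def load_detectors(names=("chardet", "cchardet")):
    """Map module name -> detect callable for each module that is importable."""
    detectors = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # skip libraries that are not installed
        # Assumption: each library exposes a module-level detect() function.
        detectors[name] = module.detect
    return detectors


def benchmark_file(path, detectors=None):
    """Run every detector on the raw bytes of `path` and collect timings."""
    if detectors is None:
        detectors = load_detectors()
    with open(path, "rb") as f:
        data = f.read()
    return {name: time_detector(detect, data) for name, detect in detectors.items()}
```

Calling `benchmark_file("testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt")` from the repository root would return one `(elapsed_seconds, result)` pair per installed library. A single call is timed, so results for very fast detectors will be noisy; repeating the call and averaging would give steadier numbers.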
# Contact
[My blog](http://blog.remu.biz)

Sorry for my poor English :)