# cChardet
cChardet is a high-speed universal character encoding detector, implemented as a binding to libcharsetdetect.
# Requires
Cython: [http://www.cython.org/](http://www.cython.org/)
uchardet-enhanced: [https://bitbucket.org/medoc/uchardet-enhanced/overview](https://bitbucket.org/medoc/uchardet-enhanced/overview)
# Install
### Build uchardet-enhanced
    $ cd /tmp
    $ hg clone https://bitbucket.org/medoc/uchardet-enhanced
    $ cd uchardet-enhanced/libcharsetdetect
    $ ./configure
    $ make
    $ sudo make install
    $ ls -la /usr/local/lib
    $ ls -la /usr/local/include
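
The two `ls` commands above are a manual check that the library and headers were installed. A rough Python equivalent for the library part is sketched below. It assumes the shared library is installed under the name `libcharsetdetect` (so the linker-level name is `charsetdetect`); the helper names `find_charsetdetect` and `describe` are hypothetical and not part of cChardet.

```python
from ctypes.util import find_library


def find_charsetdetect(name="charsetdetect"):
    """Return the file name or path the system reports for lib<name>, or None."""
    # Assumption: the library built above is installed as libcharsetdetect.
    return find_library(name)


def describe(name="charsetdetect"):
    """Return a one-line, human-readable status for the library lookup."""
    found = find_charsetdetect(name)
    if found is None:
        return "lib%s not found; check /usr/local/lib and run ldconfig if needed" % name
    return "lib%s found: %s" % (name, found)
```

Calling `print(describe())` after `sudo make install` gives a quick hint whether the linker can see the library. A `None` result does not always mean the build failed: on some systems `/usr/local/lib` is not on the default search path until `ldconfig` has been run.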
### Build cChardet
Install Cython with pip (or `easy_install -U cython`) before building. On Ubuntu, `sudo apt-get install python-dev cython` is recommended instead.

    $ cd /tmp
    $ git clone git://github.com/PyYoshi/cChardet.git
    $ cd cChardet
    $ sudo pip install -U cython
    $ python setup.py build
    $ sudo python setup.py install
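
Once the install step has finished, detection can be tried from Python. The following is a minimal sketch, not taken from this README: it assumes the module is importable as `cchardet` and that `cchardet.detect()` takes a bytes object and returns a dict with `encoding` and `confidence` keys, in the style of chardet. The wrapper names `detect_encoding` and `detect_file` are hypothetical.

```python
def detect_encoding(data, detector=None):
    """Return (encoding, confidence) for a bytes object."""
    if detector is None:
        # Assumption: the installed module is importable as `cchardet`
        # and exposes detect(bytes) -> dict, like chardet does.
        import cchardet
        detector = cchardet.detect
    result = detector(data)
    return result.get("encoding"), result.get("confidence")


def detect_file(path, detector=None):
    """Detect the encoding of a file on disk."""
    # Read in binary mode: the detector needs raw bytes, not decoded text.
    with open(path, "rb") as f:
        return detect_encoding(f.read(), detector)
```

For example, `detect_file("testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt")` would run the detector on the benchmark sample. The optional `detector` argument makes it easy to swap in `chardet.detect` for comparison.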
# Benchmark
See `tests.TestCchardetSpeed`.
### Sample (shift_jis):
testdata/wikipediaJa_One_Thousand_and_One_Nights_SJIS.txt
### PC Spec.:
CPU: Intel Core i7 860 2.8GHz
RAM: DDR3-1333 16GB
Platform: Windows 7 HP x64, Python 2.7.3 32-bit
### Result:
chardet: 4.009999990463257s, shift_jis
cchardet: 0.0009999275207519531s, shift_jis
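
A comparison like the one above can be reproduced with a small timing harness. The sketch below is not the code in `tests.TestCchardetSpeed`; it is an illustration under stated assumptions: Python 3 (for `time.perf_counter`), and detectors that take bytes and return a dict with an `encoding` key. The function names `time_detector` and `compare` are hypothetical. Repeating the call and keeping the best time helps when a single run is very short.

```python
import time


def time_detector(detect, data, repeat=5):
    """Call detect(data) `repeat` times; return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = detect(data)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best, result


def compare(detectors, data, repeat=5):
    """Time each detector in a {name: callable} dict on the same data.

    Returns {name: (best_seconds, detected_encoding)}.
    """
    report = {}
    for name, detect in detectors.items():
        seconds, result = time_detector(detect, data, repeat)
        report[name] = (seconds, result.get("encoding"))
    return report
```

With both libraries installed, the sample file can be read in binary mode and passed as `compare({"chardet": chardet.detect, "cchardet": cchardet.detect}, data)`. Absolute numbers depend on the machine and Python version, so they will not match the figures listed above.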
# Contact
[My blog](http://blog.remu.biz)
Sorry for my poor English :)