Encoding detection in Ruby
I’ve just discovered Universal Encoding Detector. You give it a string, and it returns its best guess at the string’s encoding along with a confidence score for that guess. It is very useful, and in my tests so far it has worked very well.
To install it:
gem install chardet
Example:
require 'rubygems'
require 'UniversalDetector'
require 'net/http'
Net::HTTP.version_1_2
# Fetch the page and ask the detector to guess the encoding of the raw body.
Net::HTTP.start('yahoo.co.jp') do |http|
  data = http.get('/').body
  p UniversalDetector::chardet(data)
  #=> {"encoding"=>"EUC-JP", "confidence"=>0.99}
end
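Once you have a result like the hash above, a natural next step is to convert the data to UTF-8. Here is a minimal sketch of that idea. It assumes Ruby 1.9 or later (for String#encode) and a result hash shaped like the one printed above. The helper name to_utf8 and the 0.5 confidence threshold are my own choices for illustration, not part of the gem. To keep the sketch runnable without a network call, it feeds in a hand-written result instead of calling the detector.

```ruby
# Hypothetical helper: convert raw bytes to UTF-8 using a detection result
# shaped like {"encoding"=>"EUC-JP", "confidence"=>0.99}.
# The 0.5 default threshold is an arbitrary assumption, not a library default.
def to_utf8(data, detection, min_confidence = 0.5)
  name = detection && detection["encoding"]
  confidence = detection ? detection["confidence"].to_f : 0.0
  # Give up when there is no guess or the guess is too uncertain.
  return nil if name.nil? || confidence < min_confidence

  # Look up the Ruby Encoding object; unknown names raise ArgumentError.
  source = Encoding.find(name)
  # dup leaves the caller's string untouched; force_encoding relabels the
  # bytes, and encode performs the actual conversion.
  data.dup.force_encoding(source).encode(Encoding::UTF_8)
rescue ArgumentError, EncodingError
  # Unknown encoding name, or bytes that are not valid in that encoding.
  nil
end

# Simulated input: EUC-JP bytes plus a hand-written detection result.
raw = "\u65E5\u672C\u8A9E".encode("EUC-JP").force_encoding("BINARY")
result = { "encoding" => "EUC-JP", "confidence" => 0.99 }
puts to_utf8(raw, result)
```

In real use you would pass http.get("/").body and the hash returned by the detector. Returning nil on low confidence leaves it to the caller to decide on a fallback rather than silently producing garbled text.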