Encoding detection in Ruby

I’ve just discovered Universal Encoding Detector. You give it a string and it returns its best guess at the string’s encoding, along with a confidence score for that guess. It is very useful, and in my tests so far it has worked very well.

To install it:

gem install chardet

Example:


require 'rubygems' # only needed on Ruby 1.8
require 'UniversalDetector'
require 'net/http'

# Fetch a page and ask the detector to guess the encoding of the raw body.
Net::HTTP.start('yahoo.co.jp') do |http|
  data = http.get('/').body
  p UniversalDetector::chardet(data)
  #=> {"encoding"=>"EUC-JP", "confidence"=>0.99}
end
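
Once you have a guess, the usual next step is to convert the data to UTF-8. Here is a rough sketch of how that could look using only Ruby's built-in String#force_encoding and String#encode (Ruby 1.9 or later). The to_utf8 helper and its min_confidence threshold are my own hypothetical additions, not part of the chardet gem, and the sketch assumes the detector returns a hash shaped like the result shown above. The sample bytes are built by hand so the snippet runs without a network connection.

```ruby
# Hypothetical helper, not part of the chardet gem.
# `detection` is assumed to be a hash shaped like the detector result above:
#   {"encoding"=>"EUC-JP", "confidence"=>0.99}
def to_utf8(data, detection, min_confidence = 0.5)
  name = detection["encoding"]
  # Give up when there is no guess or the detector is not confident enough.
  return nil if name.nil? || detection["confidence"].to_f < min_confidence

  # Label the raw bytes with the detected encoding, then transcode to UTF-8,
  # replacing any bytes that cannot be converted.
  data.dup.force_encoding(name).encode("UTF-8", invalid: :replace, undef: :replace)
rescue ArgumentError
  # Ruby does not know an encoding by that name.
  nil
end

# Sample input: "日本語" encoded as EUC-JP and tagged as binary,
# standing in for raw bytes read from somewhere else.
raw = "日本語".encode("EUC-JP").force_encoding("ASCII-8BIT")
puts to_utf8(raw, { "encoding" => "EUC-JP", "confidence" => 0.99 })
```

Returning nil on a low-confidence or unknown guess leaves it to the caller to decide on a fallback, rather than silently producing garbled text.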
