Converting gsub() pattern from ruby 1.8 to 2.0 - ruby, regex, unicode, gsub

I have a ruby program that I'm trying to upgrade from ruby 1.8 to ruby 2.0.0-p247.

This works just fine in 1.8.7:

ARGF.each do |line|
  begin
    # a collection of peculiarities, appended as they appear in data
    line.gsub!("\x92", "'")
    line.gsub!("\x96", "-")
    puts line
  rescue => e
    $stderr << "exception on line #{$.}:\n"
    $stderr << "#{e.message}:\n"
    $stderr << line
  end
end

But under ruby 2.0, this results in an exception when it encounters a \x96 or \x92 byte in a data file that otherwise appears to contain only ASCII:

 invalid byte sequence in UTF-8

I have tried all manner of things: double backslashes, using a regex object instead of the string, force_encoding(), etc. and am stumped.

Can anybody fill in the missing puzzle piece for me?


=============== additions: 2013-09-25 ============

Changing \x92 to \u2019 did not fix the problem.

The program does not error until it actually hits a \x92 or \x96 in the input file, so I'm confused as to how the character pattern in the string is the problem when there are hundreds of thousands of lines of input data that are matched against the patterns without incident.


Answer #1 (2 votes)

It's not the regex that's throwing the exception, it's the Ruby compiler. \x92 and \x96 are how you would represent ’ and – in the Windows-1252 encoding, but Ruby expects the string to be UTF-8 encoded. You need to get out of the habit of putting raw byte values like \x92 in your string literals. Non-ASCII characters should be specified by Unicode escape sequences (in this case, \u2019 and \u2013).
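Here is a minimal sketch of that advice, assuming the input file really is Windows-1252 encoded (the question does not confirm this). The helper name `normalize_line` is hypothetical and not part of the original program. It relabels the raw bytes as Windows-1252, transcodes them to UTF-8, and only then matches on the Unicode escapes:

```ruby
# Hypothetical helper; assumes the raw input bytes are Windows-1252.
def normalize_line(raw)
  # Copy the string, relabel its bytes as Windows-1252, then transcode to UTF-8.
  text = raw.dup.force_encoding(Encoding::Windows_1252).encode(Encoding::UTF_8)
  # Match on characters (Unicode escapes), not on raw byte values.
  text.gsub("\u2019", "'").gsub("\u2013", "-")
end

# Simulated input line holding the raw bytes 0x92 and 0x96.
sample = "it\x92s 1\x962".b
puts normalize_line(sample)  # prints: it's 1-2
```

Inside the `ARGF.each` loop from the question, this would probably mean calling `puts normalize_line(line)` in place of the two `gsub!` calls. This may also explain why changing only the pattern to \u2019 did not help: the input line itself is still read as UTF-8 and still contains an invalid byte.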

It's a Unicode world now. Stop thinking of text in terms of bytes and think in terms of characters instead.