Description
Chinese characters encoded in GB18030 cannot be converted to UTF-8. If the source encoding is declared as GB2312, everything works properly.
Here is an example:
require 'base64'
chinese = Base64.decode64('yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K')
puts chinese.force_encoding('GB2312').encode('UTF-8', invalid: :replace)
chinese = Base64.decode64('yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K')
puts chinese.force_encoding('GB18030').encode('UTF-8', invalid: :replace)
Linux system:
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
When I run these lines on ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-linux], I get the following results:
省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网
省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网
As you can see, both lines return the same content.
But if I run these lines on jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [linux-amd64] I get a different result:
省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网
������� ��360��������-�����
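As you can see, the second line (the GB18030 conversion) does not produce the Chinese text in UTF-8; every Chinese character is replaced by the replacement character, and only the ASCII characters survive.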
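To help narrow this down, here is a minimal diagnostic sketch that runs the same payload through both source encodings and reports whether Ruby considers the bytes valid and how many replacement characters end up in the UTF-8 output. The helper name `convert_report` and the constants `RAW` and `REPORTS` are my own (hypothetical, not part of any library), and I added `undef: :replace` so undefined conversions are also counted; it only uses `String#force_encoding`, `#valid_encoding?`, `#encode` and `#count`.

```ruby
require 'base64'

# Same payload as in the example above.
RAW = Base64.decode64('yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K')

# Hypothetical helper: tags a copy of the bytes with `source_encoding`,
# converts it to UTF-8, and collects a few facts about the result.
def convert_report(bytes, source_encoding)
  tagged = bytes.dup.force_encoding(source_encoding)
  utf8 = tagged.encode('UTF-8', invalid: :replace, undef: :replace)
  {
    source: source_encoding,
    valid: tagged.valid_encoding?,   # does Ruby accept the bytes as this encoding?
    replaced: utf8.count("\uFFFD"),  # number of U+FFFD replacement characters
    utf8: utf8
  }
end

REPORTS = %w[GB2312 GB18030].map { |name| convert_report(RAW, name) }
REPORTS.each { |report| puts report.inspect }
```

If `valid` is true for GB18030 but `replaced` is greater than zero, that would suggest the problem is in the GB18030 to UTF-8 conversion itself rather than in how the bytes are validated.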
I would be very thankful for any help on this issue.