Convert chinese encoding GB18030 to UTF-8 doesn't work #3411

@tpuddi

Chinese text encoded in GB18030 cannot be converted to UTF-8. If the source encoding is declared as GB2312, everything works properly.

Here is an example:

require 'base64'

chinese = Base64.decode64('yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K')
puts chinese.force_encoding('GB2312').encode('UTF-8', invalid: :replace)

chinese = Base64.decode64('yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K')
puts chinese.force_encoding('GB18030').encode('UTF-8', invalid: :replace)

Linux system:

Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty

When I run these lines on ruby 2.1.6p336 (2015-04-13 revision 50298) [x86_64-linux] I get the following results:

省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网
省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网

As you can see, both lines return the same content.

But if I run these lines on jruby 9.0.1.0 (2.2.2) 2015-09-02 583f336 Java HotSpot(TM) 64-Bit Server VM 25.60-b23 on 1.8.0_60-b27 +jit [linux-amd64] I get a different result:

省钱省油更省心 风光360舒适版满足你所爱-奥杰汽车网
������� ��360��������-�����

As you can see, the second line is not converted to UTF-8 correctly: every Chinese character is replaced, and only the ASCII characters survive.
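To help narrow this down, here is a rough diagnostic sketch. It drops `invalid: :replace` so that any conversion error surfaces as an exception instead of being hidden behind replacement characters, and it also reports whether the bytes are considered valid in each source encoding. The `diagnose` helper is a hypothetical name introduced only for this illustration, and `unpack('m')` is used in place of `Base64.decode64` just to keep the snippet free of the `base64` dependency.

```ruby
# Same Base64 payload as in the example above; unpack('m') decodes Base64.
PAYLOAD = 'yqHHrsqh0824/Mqh0MQgt+e54jM2MMrmysqw5sL61+PE48v5sK4tsMK93Mb7s7XN+A0K'
raw = PAYLOAD.unpack('m').first

# Hypothetical helper (not part of any library): tags a copy of the bytes
# with the given encoding, then attempts a strict conversion to UTF-8.
def diagnose(bytes, source_encoding)
  tagged = bytes.dup.force_encoding(source_encoding)
  result = begin
    tagged.encode('UTF-8') # no :replace options, so failures raise
  rescue EncodingError => e
    e                      # keep the exception so it can be inspected
  end
  { valid: tagged.valid_encoding?, result: result }
end

gb2312  = diagnose(raw, 'GB2312')
gb18030 = diagnose(raw, 'GB18030')

puts "GB2312  valid? #{gb2312[:valid]}  -> #{gb2312[:result].inspect}"
puts "GB18030 valid? #{gb18030[:valid]}  -> #{gb18030[:result].inspect}"
puts "same result? #{gb2312[:result] == gb18030[:result]}"
```

Going by the MRI results above, both conversions should agree there. On JRuby, the `inspect` output for the GB18030 case should show whether the bytes are rejected as invalid or whether the conversion itself fails.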

I would be very thankful for any help on this issue.
