
Normalization differences between IDNA::Native and IDNA::Pure #408

@brasic


Hello and thanks for your work building and maintaining this useful library!

We use Addressable and IDNA::Pure at GitHub for a number of URL parsing and generation tasks. The pure-Ruby IDN implementation is a bottleneck in some areas (see #407), so we are currently evaluating a switch to libidn via IDNA::Native. Our test suite found a few interesting differences between the two implementations in the path normalization of percent-encoded NUL bytes. Here's an example:

Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Pure
# => "/foo/bar/.%00./lol"

Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Native
# => "/foo/bar/lol"

The behavior change is ultimately due to the following lower-level difference:

irb(main):004:0> Addressable::IDNA.unicode_normalize_kc(".\u0000.") # libidn
=> "."
irb(main):006:0> Addressable::IDNA.unicode_normalize_kc(".\u0000.") # pure
=> ".\u0000."
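
To make the chain from the low-level difference to the path difference concrete, here is a minimal sketch that does not use Addressable at all. It assumes (this is not confirmed above) that the native backend loses everything after the first NUL because the string is handled as a NUL-terminated C string. `truncate_at_nul` and `remove_dot_segments` are hypothetical helpers written for illustration, and Ruby's built-in `String#unicode_normalize` stands in for the pure backend.

```ruby
# Hypothetical stand-in for a binding that treats its input as a
# NUL-terminated C string: everything from the first NUL onward is dropped.
def truncate_at_nul(str)
  str.split("\0", 2).first.to_s
end

# Simplified dot-segment removal for absolute paths. It drops "." segments
# and resolves ".." segments; it does not cover every edge case of
# RFC 3986 section 5.2.4 (for example, a trailing "." segment).
def remove_dot_segments(path)
  out = []
  path.split("/", -1).each do |seg|
    case seg
    when "."  then next
    when ".." then out.pop if out.length > 1
    else out << seg
    end
  end
  out.join("/")
end

segment = ".\u0000."  # the ".%00." path segment after percent-decoding

# NFKC leaves U+0000 untouched, so the segment survives as-is.
pure_like = segment.unicode_normalize(:nfkc)

# Assumed native behavior: the segment collapses to a single ".".
native_like = truncate_at_nul(segment).unicode_normalize(:nfkc)

# A lone "." segment is then removed as a dot segment.
native_path = remove_dot_segments("/foo/bar/#{native_like}/lol")
```

Under that assumption, `native_like` is a single dot, which dot-segment removal then strips from the path, matching the `/foo/bar/lol` result reported for IDNA::Native.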

Unfortunately, in our testing, browsers appear to be split on the right way to handle NUL bytes. RFC 3986 discusses %00 but leaves the handling up to the application (emphasis mine):

Percent-encoded octets must be decoded at some point during the dereference process. Applications must split the URI into its components and subcomponents prior to decoding the octets, as otherwise the decoded octets might be mistaken for delimiters. Security checks of the data within a URI should be applied after decoding the octets. Note, however, that the "%00" percent-encoding (NUL) may require special handling and should be rejected if the application is not expecting to receive raw data within a component.
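
One way an application could follow the RFC's suggestion, independent of which IDNA backend is loaded, is to reject a percent-encoded NUL before any normalization happens. The sketch below is only an illustration: `encoded_nul?` and `reject_encoded_nul!` are hypothetical helpers, not part of Addressable, and they only look at the still-encoded component.

```ruby
# Hypothetical helper: true if the still-encoded component contains "%00".
def encoded_nul?(component)
  component.include?("%00")
end

# Hypothetical guard: raise before normalization if a NUL is present,
# otherwise return the component unchanged.
def reject_encoded_nul!(component)
  raise ArgumentError, "percent-encoded NUL in #{component.inspect}" if encoded_nul?(component)
  component
end

ok_path  = reject_encoded_nul!("/foo/bar/lol")
rejected = begin
  reject_encoded_nul!("/foo/bar/.%00./lol")
  false
rescue ArgumentError
  true
end
```

With a guard like this in front of normalization, both backends would see the same set of inputs, so the difference described above would never be reached for rejected URLs.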

Are you interested in harmonizing this normalization difference between the two IDNA backends? If so, which behavior do you think is appropriate?
