Hello and thanks for your work building and maintaining this useful library!
We use Addressable and `IDNA::Pure` at GitHub for a number of URL parsing and generation tasks. The pure-Ruby IDN implementation is a bottleneck in some areas (see #407), so we are currently evaluating a switch to libidn via `IDNA::Native`. Our test suite found a few interesting differences between the two implementations in how they normalize paths containing percent-encoded `NUL` bytes. Here's an example:
```ruby
Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Pure
# => "/foo/bar/.%00./lol"
Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Native
# => "/foo/bar/lol"
```
The behavior change is ultimately due to the following lower-level difference:
```ruby
Addressable::IDNA.unicode_normalize_kc(".\u0000.") # libidn
# => "."
Addressable::IDNA.unicode_normalize_kc(".\u0000.") # pure
# => ".\u0000."
```
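One way to see how this lower-level difference could surface as the path difference above: if the decoded `.%00.` segment comes back from libidn as `"."`, the path effectively becomes `/foo/bar/./lol`, and RFC 3986 dot-segment removal then drops the `.` segment. With the pure backend the NUL survives, so `.%00.` is not a dot segment and stays put. The sketch below is a plain-Ruby rendering of the `remove_dot_segments` algorithm from RFC 3986, Section 5.2.4. It is a standalone stand-in for illustration, not Addressable's code, and the order of operations inside `normalized_path` is assumed rather than verified.

```ruby
# Standalone sketch of RFC 3986, Section 5.2.4 (remove_dot_segments).
# Not Addressable's implementation; used only to illustrate the effect.
def remove_dot_segments(path)
  input = path.dup
  output = +""
  until input.empty?
    if input.start_with?("../")        # A: drop leading "../"
      input = input[3..-1]
    elsif input.start_with?("./")      # A: drop leading "./"
      input = input[2..-1]
    elsif input.start_with?("/./")     # B: replace "/./" with "/"
      input = input[2..-1]
    elsif input == "/."                # B: replace trailing "/." with "/"
      input = "/"
    elsif input.start_with?("/../")    # C: replace "/../" with "/" and pop a segment
      input = input[3..-1]
      output = output.sub(%r{/?[^/]*\z}, "")
    elsif input == "/.."               # C: trailing "/.." behaves the same way
      input = "/"
      output = output.sub(%r{/?[^/]*\z}, "")
    elsif input == "." || input == ".." # D: a lone dot segment is discarded
      input = ""
    else                               # E: move the first segment to the output
      segment = input[%r{\A/?[^/]*}]
      output << segment
      input = input[segment.length..-1]
    end
  end
  output
end

puts remove_dot_segments("/foo/bar/./lol")     # libidn-style result: "." segment removed
puts remove_dot_segments("/foo/bar/.%00./lol") # pure-style result: not a dot segment, kept
```

Under these assumptions the two calls print `/foo/bar/lol` and `/foo/bar/.%00./lol`, matching the two `normalized_path` results in the issue.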
Unfortunately, in our testing browsers seem to be split on the right way to handle `NUL` bytes. RFC 3986 discusses `%00` but leaves the decision to the application (emphasis mine):
> Percent-encoded octets must be decoded at some point during the dereference process. Applications must split the URI into its components and subcomponents prior to decoding the octets, as otherwise the decoded octets might be mistaken for delimiters. Security checks of the data within a URI should be applied after decoding the octets. **Note, however, that the "%00" percent-encoding (NUL) may require special handling and should be rejected if the application is not expecting to receive raw data within a component.**
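If an application decides, as the RFC suggests, that it never expects raw data in a path, it could refuse `%00` before normalization runs at all, which sidesteps the backend difference for such inputs. A minimal sketch under that assumption; `reject_encoded_nul!` and `EncodedNulError` are hypothetical names, not part of Addressable:

```ruby
# Hypothetical application-level guard (not an Addressable API): refuse any
# URI component that carries a NUL byte, encoded as "%00" or already decoded.
class EncodedNulError < ArgumentError; end

def reject_encoded_nul!(component)
  if component.include?("%00") || component.include?("\u0000")
    raise EncodedNulError, "NUL byte in URI component: #{component.inspect}"
  end
  component
end

reject_encoded_nul!("/foo/bar/lol")           # returns the path unchanged
begin
  reject_encoded_nul!("/foo/bar/.%00./lol")   # raises before any normalization
rescue EncodedNulError => e
  warn e.message
end
```

A substring check is enough here because `%` can never appear among the hex digits of another escape, so `"%00"` always marks an encoded NUL (for example, `"%2500"` is an encoded `%` followed by `00` and is not flagged).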
Are you interested in harmonizing this difference in normalization between the two IDNA backends, and which behavior do you think is appropriate?