Normalization differences between IDNA::Native and IDNA::Pure #408

Hello and thanks for your work building and maintaining this useful library!

We use Addressable with `IDNA::Pure` at GitHub for a number of URL parsing and generation tasks. The pure-Ruby IDN implementation is a bottleneck in some areas (see https://github.com/sporkmonger/addressable/pull/407), so we are evaluating a switch to libidn via `IDNA::Native`. Our test suite surfaced a few differences between the two implementations in how they normalize paths containing percent-encoded `NUL` bytes. Here's an example:

```ruby
Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Pure
# => "/foo/bar/.%00./lol"

Addressable::URI.parse("http://github.com/foo/bar/.%00./lol").normalized_path # Addressable::IDNA::Native
# => "/foo/bar/lol"
```

The behavior change is ultimately due to the following lower-level difference:

```ruby
irb(main):004:0> Addressable::IDNA.unicode_normalize_kc(".\u0000.") # libidn
=> "."
irb(main):006:0> Addressable::IDNA.unicode_normalize_kc(".\u0000.") # pure
=> ".\u0000."
```
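As a third reference point, the same input can be run through Ruby's built-in `String#unicode_normalize`. This is only a sketch using the standard library; it exercises neither Addressable backend, and the variable names are ours.

```ruby
# Sketch: NFKC-normalize the same input with Ruby's standard library
# (String#unicode_normalize), independent of either Addressable backend.
input = ".\u0000."                        # dot, U+0000, dot
normalized = input.unicode_normalize(:nfkc)

# Print code points so the NUL is visible instead of an invisible byte.
puts normalized.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

Unicode defines no decomposition for U+0000, so we would expect NFKC to leave this string unchanged, which matches the pure backend's result above.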

Unfortunately, in our testing browsers appear to be split on how to handle NUL bytes. [RFC 3986](https://tools.ietf.org/html/rfc3986#section-7.3) discusses `%00` but leaves the handling up to the application (emphasis mine):
>    Percent-encoded octets must be decoded at some point during the dereference process.  Applications must split the URI into its components and subcomponents prior to decoding the octets, as otherwise the decoded octets might be mistaken for delimiters. Security checks of the data within a URI should be applied after decoding the octets.  **Note, however, that the "%00" percent-encoding (NUL) may require special handling and should be rejected if the application is not expecting to receive raw data within a component.**
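
For illustration, here is a minimal sketch of the "reject" option the RFC mentions, applied before any normalization happens. The helper `nul_in_path?` is hypothetical and not part of Addressable; an application that does expect raw data in a component would not want this check.

```ruby
# Hypothetical helper (not part of Addressable): true if the path contains
# a NUL, either percent-encoded ("%00") or as a raw U+0000 character.
# "%00" contains no hex letters, so no case folding is needed.
def nul_in_path?(path)
  path.include?("%00") || path.include?("\u0000")
end

# Check the path from the example above and a path without a NUL.
["/foo/bar/.%00./lol", "/foo/bar/lol"].each do |path|
  verdict = nul_in_path?(path) ? "reject" : "accept"
  puts "#{verdict}: #{path}"
end
```

Rejecting up front sidesteps the question of which backend's normalization is correct, at the cost of refusing URLs that some browsers accept.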

Are you interested in harmonizing this normalization difference between the two IDNA backends? If so, which behavior do you think is appropriate?
