Check URI Scheme validity according to RFC3986 #8090
Open
GwynethLlewelyn wants to merge 3 commits into gogs:main
Nova is a Mac-only natively-compiled, sophisticated code editor.
Hey @GwynethLlewelyn, thanks for the PR! Could you:
Add Nova and IDEA editor configurations to .gitignore
Issue description
XMPP URI Schemes were defined and standardised by RFC5122, duly registered with IANA since 2008 (with permanent status), and fully conforming with the generic URI Scheme as defined by RFC3986, Section 3.1.
However, the current code base only checks for a subset of possible valid URIs using the regular expression:
This ad hoc implementation unfortunately does not conform with the URI Schemes specification.
Most notably, it explicitly excludes all URIs without an explicit authority element (i.e., what follows `//`). But such schemes are perfectly valid; in fact, the above regular expression requires an additional element to be checked, namely `mailto:`, which, by convention, skips the authority element, like many others, including XMPP.

Furthermore, by using `\w`, the regex as implemented allows not only mixing lowercase and uppercase letters (URI Schemes are not case sensitive, although all RFCs strongly recommend that canonicalised URIs should be lowercase only), but also allows the underscore character `_`, which is not allowed by the RFCs specifying the URI Schemes. This means that (imaginary) URI Schemes such as `fake_scheme://this.blows.up.everything` are accepted by the regular expression in the code, although such schemes are invalid according to the RFCs; on the other hand, perfectly valid URI schemes such as `tel:+1-555-1234` or `xmpp:[email protected]` are rejected (both of which, incidentally, have been fully registered with IANA for a long time).

As such, this simple PR proposes to correct the regular expression so that it conforms to the actual rules specified by the RFCs, accepting all IANA-registered schemes, while at the same time guaranteeing that invalid schemes are properly rejected.
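The behaviour described above can be reproduced with a small Go program. This is only a sketch: `legacyLike` is a hypothetical stand-in with the shape the description gives (a scheme built with `\w`, a mandatory `://`, and a special case for `mailto:`), not a copy of the expression in the code base, and the `example.org` addresses are placeholders.

```go
package main

import (
	"fmt"
	"regexp"
)

// legacyLike is a hypothetical stand-in for the kind of pattern described
// above: a scheme built from \w, a mandatory "://", and a special case for
// "mailto:". It is an assumption for illustration, not the project's code.
var legacyLike = regexp.MustCompile(`^[a-z][\w-]+://|^mailto:`)

// acceptedByLegacyLike reports whether link passes the stand-in pattern.
func acceptedByLegacyLike(link string) bool {
	return legacyLike.MatchString(link)
}

func main() {
	links := []string{
		"https://example.org",                    // has an authority element
		"mailto:user@example.org",                // only passes via the special case
		"fake_scheme://this.blows.up.everything", // "_" is not allowed in a scheme
		"tel:+1-555-1234",                        // valid scheme, no authority element
		"xmpp:user@example.org",                  // valid scheme, no authority element
	}
	for _, link := range links {
		fmt.Printf("%-42s accepted=%v\n", link, acceptedByLegacyLike(link))
	}
}
```

Under these assumptions the underscore scheme is accepted because `\w` includes `_`, while `tel:` and `xmpp:` are rejected because nothing after the colon matches `//`.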
This PR closes #6732.
(Special thanks to @Neustradamus for first reporting this issue exactly four years ago and having patiently waited for a solution since then!)
Checklist
Proposed solution and test plan
The ABNF specifying all valid URI Schemes is:
References:
This corresponds to the regular expression [1]:
which was tested on Regex101: https://regex101.com/r/HEpzIM/1
and the Go Playground: https://go.dev/play/p/vtYEugsNAfo
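For readers who want to try the idea locally, here is a minimal sketch of a scheme check derived directly from the RFC 3986, Section 3.1 rule `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`, restricted to lowercase letters as this description assumes. The names `schemePattern` and `hasValidScheme` and the sample links are assumptions made for illustration; the pattern is one possible translation of the rule and is not necessarily the exact expression proposed in this PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// RFC 3986, Section 3.1:
//
//	scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
//
// schemePattern is an illustrative, lowercase-only translation of that rule,
// followed by the ":" that separates the scheme from the rest of the URI.
var schemePattern = regexp.MustCompile(`^[a-z][a-z0-9+.-]*:`)

// hasValidScheme reports whether link starts with a syntactically valid
// (lowercase) URI scheme. It does not validate the rest of the URI.
func hasValidScheme(link string) bool {
	return schemePattern.MatchString(link)
}

func main() {
	links := []string{
		"https://example.org",
		"mailto:user@example.org",
		"tel:+1-555-1234",
		"xmpp:user@example.org",
		"fake_scheme://this.blows.up.everything", // "_" is outside the rule
		"1http://example.org",                    // must start with a letter
	}
	for _, link := range links {
		fmt.Printf("%-42s valid scheme=%v\n", link, hasValidScheme(link))
	}
}
```

Because the check only looks at the scheme and the colon, schemes with and without an authority element are treated alike, and no special case for `mailto:` is needed.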
No further changes beyond changing the existing regular expression in `markdown.go` are necessary to fully validate URI Schemes according to RFC3986.

Footnotes

1. Note that, although the specification allows a mix of lowercase and uppercase US-ASCII letters, it also strongly recommends that URIs be canonicalised in lowercase, which is assumed here to be the case.