gh-69589, gh-84774: Fix path normalization in urllib.parse.urljoin() #126679
…e.urljoin()
* Preserve double slashes in path.
* Fix the case when the base path is relative and the relative reference path starts with '..'.
659a910 to 26f0b9e
self.checkJoin('http://a', scheme + './//', 'http://a//')

self.checkJoin('b/c', '', 'b/c')
self.checkJoin('b/c', '//', 'b/c')
RFC 3986 carefully distinguishes between undefined and empty, and `//` has an empty authority, not an undefined one, so we should hit the `if defined(R.authority)` branch in §5.2.2. The result should be `//`.
(This is independent of the discussion of #96015, I think.)
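To make the undefined-versus-empty distinction concrete, here is a minimal sketch that splits a reference with the regular expression given in RFC 3986, Appendix B. The helper name `split_reference` is hypothetical, and this is not how `urllib.parse` is implemented. In Python's `re`, an optional group that did not take part in the match is reported as `None` (undefined), while a group that matched nothing is reported as `''` (empty).

```python
import re

# Regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def split_reference(ref):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    A component is None when it is undefined and '' when it is
    present but empty.
    """
    m = _URI_RE.match(ref)  # every part is optional, so this always matches
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)

print(split_reference('//'))    # (None, '', '', None, None)
print(split_reference('b/c'))   # (None, None, 'b/c', None, None)
```

Under this reading, `//` carries a defined (empty) authority, whereas `b/c` has no authority at all.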
Yes, I know. I left them non-distinguished for compatibility. We will likely change this in a separate issue.
self.checkJoin('b/c', '//v', '//v')
self.checkJoin('b/c', '//v/w', '//v/w')
self.checkJoin('b/c', '/w', '/w')
self.checkJoin('b/c', '///w', '/w')
Same; the result should be `///w`.
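As an illustration of the result the reviewer argues for (not of what this PR does, since the PR keeps the existing behavior for compatibility), the following sketch applies only the `defined(R.authority)` branch of RFC 3986 §5.2.2 and the recomposition step of §5.3. The names `recompose` and `resolve_network_path` are hypothetical, and `remove_dot_segments` is left out on the assumption that the reference path contains no dot segments.

```python
import re

# Regular expression from RFC 3986, Appendix B.
_URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')

def recompose(scheme, authority, path, query, fragment):
    """Recompose components as in RFC 3986 section 5.3 (None means undefined)."""
    result = ''
    if scheme is not None:
        result += scheme + ':'
    if authority is not None:
        # An empty but defined authority still contributes the '//' prefix.
        result += '//' + authority
    result += path
    if query is not None:
        result += '?' + query
    if fragment is not None:
        result += '#' + fragment
    return result

def resolve_network_path(base, ref):
    """Hypothetical helper covering only the defined(R.authority) branch."""
    b = _URI_RE.match(base)
    r = _URI_RE.match(ref)
    if r.group(2) is not None or r.group(4) is None:
        raise ValueError('sketch only handles scheme-less references '
                         'with a defined authority')
    # T.scheme comes from the base; authority, path, query and fragment
    # come from the reference. Dot-segment removal is omitted (assumption).
    return recompose(b.group(2), r.group(4), r.group(5), r.group(7), r.group(9))

print(resolve_network_path('b/c', '//'))     # //
print(resolve_network_path('b/c', '///w'))   # ///w
```

Because the empty authority is treated as defined, the leading `//` survives recomposition instead of collapsing into the path.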
self.checkJoin('b/c', '../../w', 'w')
self.checkJoin('b/c', '../../../w', 'w')
self.checkJoin('b/c', 'w/.', 'b/w/')
self.checkJoin('b/c', '../w/.', 'w/')
self.checkJoin('b/c', '../../w/.', 'w/')
self.checkJoin('b/c', '../../../w/.', 'w/')
self.checkJoin('b/c', '..', '')
self.checkJoin('b/c', '../..', '')
self.checkJoin('b/c', '../../..', '')
Although these fall outside the direct scope of the pseudocode defined in RFC 3986 because `b/c` is not an absolute base URI, they violate the obvious expectation that `urljoin` should be associative.

Given non–RFC 3986 input where the base URI is path-relative (undefined scheme, undefined authority, and path not beginning with `/`), we should preserve extra initial `..` components in the output:
-self.checkJoin('b/c', '../../w', 'w')
-self.checkJoin('b/c', '../../../w', 'w')
-self.checkJoin('b/c', 'w/.', 'b/w/')
-self.checkJoin('b/c', '../w/.', 'w/')
-self.checkJoin('b/c', '../../w/.', 'w/')
-self.checkJoin('b/c', '../../../w/.', 'w/')
-self.checkJoin('b/c', '..', '')
-self.checkJoin('b/c', '../..', '')
-self.checkJoin('b/c', '../../..', '')
+self.checkJoin('b/c', '../../w', '../w')
+self.checkJoin('b/c', '../../../w', '../../w')
+self.checkJoin('b/c', 'w/.', 'b/w/')
+self.checkJoin('b/c', '../w/.', 'w/')
+self.checkJoin('b/c', '../../w/.', '../w/')
+self.checkJoin('b/c', '../../../w/.', '../../w/')
+self.checkJoin('b/c', '..', '')
+self.checkJoin('b/c', '../..', '..')
+self.checkJoin('b/c', '../../..', '../..')
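The suggested expectations can be reproduced with a small standalone sketch. The function `join_relative_paths` below is hypothetical and handles only path-relative inputs with a non-empty reference path; it is not the code in this PR. It merges the paths as in RFC 3986 §5.2.3, then removes dot segments while keeping any `..` that has nothing left to cancel.

```python
def join_relative_paths(base_path, ref_path):
    """Hypothetical helper: resolve ref_path against a path-relative base_path,
    keeping leading '..' segments that cannot be resolved."""
    # Merge: drop the last segment of the base, then append the reference.
    merged = base_path[:base_path.rfind('/') + 1] + ref_path
    segments = merged.split('/')
    out = []
    trailing_slash = False
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == '.':
            # A final '.' leaves its parent directory, so keep the slash.
            trailing_slash = is_last
        elif seg == '..':
            if out and out[-1] != '..':
                out.pop()
                trailing_slash = is_last
            else:
                # Nothing left to cancel: preserve the '..' segment.
                out.append('..')
        else:
            out.append(seg)
    result = '/'.join(out)
    if trailing_slash and out:
        result += '/'
    return result

print(join_relative_paths('b/c', '../../w'))    # ../w
print(join_relative_paths('b/c', '../..'))      # ..
```

Preserving the extra `..` is what keeps joins associative in this sketch: joining `x/y/` with the result of joining `b/c` and `../../../w` gives `w`, the same as joining `x/y/b/c` with `../../../w`. If the inner join were clamped to `w`, the outer join would yield `x/y/w` instead.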