Conversation

Member

@serhiy-storchaka serhiy-storchaka commented Nov 11, 2024

  • Preserve double slashes in path.
  • Fix the case when the base path is relative and the relative reference path starts with '..'.

…e.urljoin()

* Preserve double slashes in path.
* Fix the case when the base path is relative and the relative reference
  path starts with '..'.
@serhiy-storchaka serhiy-storchaka force-pushed the urllib-urljoin-remove-dot-segments branch from 659a910 to 26f0b9e Compare November 11, 2024 11:35
@serhiy-storchaka serhiy-storchaka changed the title gh-69589: Fix path normalization in urllib.parse.urljoin() gh-69589, gh-84774: Fix path normalization in urllib.parse.urljoin() Nov 11, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-69589, gh-84774: Fix path normalization in urllib.parse.urljoin() gh-84774: Fix path normalization in urllib.parse.urljoin() Nov 11, 2024
@serhiy-storchaka serhiy-storchaka changed the title gh-84774: Fix path normalization in urllib.parse.urljoin() gh-69589, gh-84774: Fix path normalization in urllib.parse.urljoin() Nov 11, 2024
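For readers following along, here is a minimal sketch of the dot-segment removal procedure from RFC 3986 §5.2.4, which the branch name refers to. It is written from the RFC text, not taken from this PR's diff, and the function name `remove_dot_segments` is chosen here for illustration. Note that empty segments (double slashes) pass through unchanged.

```python
def remove_dot_segments(path: str) -> str:
    """Sketch of RFC 3986 section 5.2.4; not the code from this PR."""
    output = []
    while path:
        # Step A: strip a leading "../" or "./".
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        # Step B: "/./" or a final "/." collapses to "/".
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        # Step C: "/../" or a final "/.." drops the last output segment.
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        # Step D: a lone "." or ".." is discarded.
        elif path in (".", ".."):
            path = ""
        # Step E: move the first segment (with its leading "/") to the output.
        else:
            end = path.find("/", 1)
            if end == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:end])
                path = path[end:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))
print(remove_dot_segments("/a//b"))
```

Step A is what discards leading `..` segments of a relative path, which is the behavior discussed in the review comments further down.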
self.checkJoin('http://a', scheme + './//', 'http://a//')

self.checkJoin('b/c', '', 'b/c')
self.checkJoin('b/c', '//', 'b/c')
Contributor

@andersk andersk Nov 11, 2024

RFC 3986 carefully distinguishes between undefined and empty, and // has an empty authority, not undefined, so we should hit the if defined(R.authority) branch in §5.2.2. The result should be //.

(This is independent of the discussion of #96015, I think.)
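A small check of the compatibility constraint being discussed, assuming `urlsplit` is called with its default arguments: it reports the same empty `netloc` for a reference with an empty authority (`//`) and for one with no authority at all, so the two cases cannot be told apart from the split result alone.

```python
from urllib.parse import urlsplit

empty_authority = urlsplit("//")   # RFC 3986: authority defined but empty
no_authority = urlsplit("")        # RFC 3986: authority undefined
triple_slash = urlsplit("///w")    # empty authority followed by path "/w"

# Each case reports netloc == ''; the information about "//" is lost.
for parts in (empty_authority, no_authority, triple_slash):
    print(repr(parts.netloc), repr(parts.path))
```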

Member Author

Yes, I know. I left them non-distinguished for compatibility. We will likely change this in a separate issue.

self.checkJoin('b/c', '//v', '//v')
self.checkJoin('b/c', '//v/w', '//v/w')
self.checkJoin('b/c', '/w', '/w')
self.checkJoin('b/c', '///w', '/w')
Contributor

Same; the result should be ///w.

Comment on lines +760 to +768
self.checkJoin('b/c', '../../w', 'w')
self.checkJoin('b/c', '../../../w', 'w')
self.checkJoin('b/c', 'w/.', 'b/w/')
self.checkJoin('b/c', '../w/.', 'w/')
self.checkJoin('b/c', '../../w/.', 'w/')
self.checkJoin('b/c', '../../../w/.', 'w/')
self.checkJoin('b/c', '..', '')
self.checkJoin('b/c', '../..', '')
self.checkJoin('b/c', '../../..', '')
Contributor

Although these fall outside the direct scope of the pseudocode defined in RFC 3986 because b/c is not an absolute base URI, they violate the obvious expectation that urljoin should be associative. See

Given non–RFC 3986 input where the base URI is path-relative (undefined scheme, undefined authority, and path not beginning with /), we should preserve extra initial .. components in the output:

Suggested change
- self.checkJoin('b/c', '../../w', 'w')
- self.checkJoin('b/c', '../../../w', 'w')
- self.checkJoin('b/c', 'w/.', 'b/w/')
- self.checkJoin('b/c', '../w/.', 'w/')
- self.checkJoin('b/c', '../../w/.', 'w/')
- self.checkJoin('b/c', '../../../w/.', 'w/')
- self.checkJoin('b/c', '..', '')
- self.checkJoin('b/c', '../..', '')
- self.checkJoin('b/c', '../../..', '')
+ self.checkJoin('b/c', '../../w', '../w')
+ self.checkJoin('b/c', '../../../w', '../../w')
+ self.checkJoin('b/c', 'w/.', 'b/w/')
+ self.checkJoin('b/c', '../w/.', 'w/')
+ self.checkJoin('b/c', '../../w/.', '../w/')
+ self.checkJoin('b/c', '../../../w/.', '../../w/')
+ self.checkJoin('b/c', '..', '')
+ self.checkJoin('b/c', '../..', '..')
+ self.checkJoin('b/c', '../../..', '../..')
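To make the associativity argument concrete, here is a rough sketch using a hypothetical helper, `join_relative`, that handles only path-relative references (no scheme, authority, query, fragment, or leading `/`). It is not `urljoin` and not code from this PR; the `keep_dotdot` flag is an assumption introduced here to contrast the two sets of expectations in the review.

```python
def join_relative(base: str, ref: str, keep_dotdot: bool = True) -> str:
    """Hypothetical helper: join two path-relative references."""
    if not ref:
        return base
    # Merge: drop the last segment of the base, then append the reference.
    merged = base[: base.rfind("/") + 1] + ref
    segments = merged.split("/")
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == ".":
            if last:
                out.append("")      # trailing "." leaves a trailing slash
        elif seg == "..":
            if out and out[-1] != "..":
                out.pop()
                if last:
                    out.append("")  # trailing ".." names a directory
            elif keep_dotdot:
                out.append("..")    # nothing left to pop: preserve it
            # otherwise the extra ".." is silently dropped
        else:
            out.append(seg)
    return "/".join(out)


a, b, c = "a/b", "../../c/d", "../e"

# Preserving extra ".." segments: both groupings agree.
left = join_relative(join_relative(a, b), c)
right = join_relative(a, join_relative(b, c))
print(left, right)

# Dropping extra ".." segments: the groupings disagree.
left_drop = join_relative(join_relative(a, b, False), c, False)
right_drop = join_relative(a, join_relative(b, c, False), False)
print(left_drop, right_drop)
```

With `keep_dotdot=True` the helper reproduces the suggested expectations (for example, `'b/c'` joined with `'../../w'` gives `'../w'`). With `keep_dotdot=False` it reproduces the original ones (`'w'`), and the result then depends on the order in which the joins are grouped.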

2 participants