gh-76960: Fix urljoining with an empty query string. #5645
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. Thanks again for your contribution and we look forward to looking at it!
Bump: I’ve signed the CLA. How do I get that check rerun? @the-knights-who-say-ni, if I @ you, will it do the thing?
Force-pushed from 82793e8 to 3ffb4b4.
Previously, urllib.parse.urljoin() with a relative URL of the form '?' would leave the URL unchanged, even though it should clear the query string. This solves that case and variations on it.
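A minimal sketch of the behavior this PR aims for, written as a standalone wrapper rather than as a change to urllib.parse itself. The name join_clearing_empty_query is hypothetical, and the sketch only covers references that begin with a bare '?' (not the 'http:?' form questioned later in review). It follows this PR's expectation that the result carries no '?' at all:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_clearing_empty_query(base: str, url: str) -> str:
    """Hypothetical wrapper around urljoin(): a reference that starts with
    '?' but carries an empty query clears the base URL's query string."""
    if url.startswith('?'):
        # Everything after '?' up to an optional '#' is the query.
        query, _, fragment = url[1:].partition('#')
        if not query:
            parts = urlsplit(base)
            # Keep scheme, netloc and path from the base; drop its query and
            # take the fragment (if any) from the reference.
            return urlunsplit((parts.scheme, parts.netloc, parts.path, '', fragment))
    # Every other reference is resolved by urljoin() as usual.
    return urljoin(base, url)


print(join_clearing_empty_query('http://a/b/c?d=e', '?'))     # http://a/b/c
print(join_clearing_empty_query('http://a/b/c?d=e', '?#z'))   # http://a/b/c#z
print(join_clearing_empty_query('http://a/b/c?d=e', '?x=1'))  # http://a/b/c?x=1
```

Checking the raw reference string is necessary here because, once split, an empty query and a missing query look the same.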
Force-pushed from 3ffb4b4 to c6d2bfa.
Removing reviewers who were added on account of a bad rebase. Sorry!
Force-pushing to do an update is almost never the right thing to do, and too often it ruins a PR by bringing in other commits. Removing the extraneous commits did not remove the reviewer requests, so I removed those as well. With the PR back to your commits, I checked the box to run the GHA tests so Senthil can see them.
This PR is stale because it has been open for 30 days with no activity.
The following commit authors need to sign the Contributor License Agreement:
if not query:
    # since urlparse doesn't leave any evidence of whether there was a bare
    # '?' with an empty query string, we need to check whether it was there.
    has_empty_query = url[0] == '?' or url.startswith(scheme + ':?')
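The comment in this excerpt is easy to confirm: urlsplit() (like urlparse()) reports an empty query both when the '?' is absent and when it is present with nothing after it, so the split result alone cannot tell the two cases apart. A small check, using only the standard library:

```python
from urllib.parse import urlsplit

with_bare_q = urlsplit('http://a/b/c?')
without_q = urlsplit('http://a/b/c')

# Both report an empty query string; the bare '?' leaves no trace.
print(repr(with_bare_q.query), repr(without_q.query))  # '' ''
# Every field matches, so the two results compare equal.
print(with_bare_q == without_q)  # True
```

This is why the patch inspects the raw url string rather than the parsed components.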
What is this url.startswith(scheme + ':?') special case required for, and where is it referenced?
Ruby and Go behave differently from this patch in this scenario, and consistently with each other:
require 'uri'
base_url = 'https://www.example.com/?a=b'
relative_url = 'https:?'
url = URI.join(base_url, relative_url).to_s
puts url
Output: https:?
And in Go (https://go.dev/play/p/lui16M9pFyo):
package main

import (
	"fmt"
	"net/url"
)

func main() {
	base, _ := url.Parse("https://example.com/?a=b")
	u, _ := url.Parse("http:?")
	fmt.Println(base.ResolveReference(u))
}

Output: http:?
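For comparison, a minimal Python sketch of the strict rule that the Ruby and Go results follow: when the reference carries its own scheme, RFC 3986 section 5.2.2 takes the target from the reference instead of merging it with the base. The function name strict_urljoin is hypothetical, and the sketch skips the remove_dot_segments() step the RFC also applies to such references:

```python
from urllib.parse import urljoin, urlsplit


def strict_urljoin(base: str, url: str) -> str:
    """Hypothetical strict variant: a reference with a scheme is returned as-is."""
    if urlsplit(url).scheme:
        # Strict RFC 3986 behavior: T.scheme = R.scheme, so the base is ignored.
        # (The RFC's remove_dot_segments() step is omitted in this sketch.)
        return url
    # Scheme-less references fall back to urllib's normal resolution.
    return urljoin(base, url)


print(strict_urljoin('https://www.example.com/?a=b', 'https:?'))  # https:?
print(strict_urljoin('https://example.com/?a=b', 'http:?'))       # http:?
print(strict_urljoin('http://a/b/c', 'd'))                        # http://a/b/d
```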
If the condition or url.startswith(scheme + ':?') is removed from this patch, we can consider this PR, since it would then bring the expected behavior seen in other languages' libraries.
From https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.2:
-- A non-strict parser may ignore a scheme in the reference
-- if it is identical to the base URI's scheme.
Currently, urllib.parse behaves as a non-strict parser (there is a dedicated test for this). We could add an option to switch this behavior, but that would be a separate feature.
But testing only url.startswith(scheme + ':?') is not enough, because http:?#z should clear the query as well. And there are similar cases with other empty components. I am working on a larger and more general PR.
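The non-strict behavior mentioned in this comment can be seen directly with urljoin(): a reference such as 'http:g', whose scheme matches the base's, is resolved as if it were the relative reference 'g', whereas a strict parser would return 'http:g' unchanged (RFC 3986, section 5.4.2 lists both results). The base URL here is the example base from RFC 3986, section 5.4:

```python
from urllib.parse import urljoin

base = 'http://a/b/c/d;p?q'  # example base URI from RFC 3986, section 5.4

# Same scheme as the base: the scheme is ignored and 'g' is resolved relatively.
print(urljoin(base, 'http:g'))   # http://a/b/c/g
# A different scheme is not relative to this base, so the reference is returned unchanged.
print(urljoin(base, 'https:g'))  # https:g
```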
The addition of url.startswith(scheme + ':?') doesn't look correct to me.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
@@ -437,6 +437,12 @@ def test_urljoins(self):
        # issue 23703: don't duplicate filename
        self.checkJoin('a', 'b', 'b')

        # issue 32779: clear the query string when joining with '?'
        self.checkJoin('http://a/b/c?d=e', '?', 'http://a/b/c')
        self.checkJoin('http://a/b/c?d=e', 'http:?', 'http://a/b/c')
This is the test case that @orsenthil says is wrong: it doesn't match other languages, which produce http:? as the output.
Yes, that's correct. We need to remove this. Perhaps I will modify this and bring this PR to a close.
This PR is stale because it has been open for 30 days with no activity.
Thank you for your PR, @thetorpedodog. However, this issue was fixed in a more general way by #123273.
https://bugs.python.org/issue32779