gh-76960: Fix urljoining with an empty query string. #5645

Closed
wants to merge 3 commits into from

thetorpedodog commented Feb 12, 2018

Previously, urllib.parse.urljoin with a relative URL of the form '?' would
result in no change to the URL, even though it should clear the query
string. This solves that case and variations on it.

https://bugs.python.org/issue32779

@the-knights-who-say-ni
Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA (this might be simply due to a missing "GitHub Name" entry in your b.p.o account settings). This is necessary for legal reasons before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

Thanks again for your contribution and we look forward to looking at it!

@thetorpedodog
Author

Bump—I’ve signed the CLA. How do I get that check rerun?

@the-knights-who-say-ni if i @ you will it do the thing

Paul Fisher added 2 commits May 28, 2021 11:03
@thetorpedodog
Author

Removing reviewers who were added on account of a bad rebase. Sorry!

thetorpedodog marked this pull request as draft May 28, 2021 15:06
thetorpedodog marked this pull request as ready for review May 28, 2021 15:07
terryjreedy removed request for a team, gpshead, 1st1, tiran, abalkin and pganssle May 28, 2021 15:12
@terryjreedy
Member

Force pushing to do an update is almost never the right thing to do, and too often it ruins a PR by bringing in other commits.
Running git merge upstream/main on the PR branch seems never to do this.

Removing the extraneous commits did not remove the reviewer requests, so I removed them myself. With the PR back to just your commits, I checked the box to run the GHA tests so Senthil can see them.

@github-actions
This PR is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Aug 15, 2022
@ghost
ghost commented Feb 9, 2023

The following commit authors need to sign the Contributor License Agreement:

Click the button to sign:
CLA not signed

if not query:
    # since urlparse doesn't leave any evidence of whether there was a bare
    # '?' with an empty query string, we need to check whether it was there.
    has_empty_query = url[0] == '?' or url.startswith(scheme + ':?')
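The premise of the code comment above can be checked directly. With its default arguments, urlparse() returns exactly the same result for a bare '?' as for an empty string, so the parsed query alone cannot show that a '?' was present:

```python
from urllib.parse import urlparse

# With default arguments, urlparse() reports an empty query string both for
# a bare '?' and for a reference that contains no '?' at all.
bare_question_mark = urlparse('?')
no_question_mark = urlparse('')

print(bare_question_mark.query == no_question_mark.query)  # True
print(bare_question_mark == no_question_mark)              # True
```

That is why the patch inspects the raw url string (url[0] == '?' and the scheme-prefixed form) rather than the parsed query.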
Member

What is this url.startswith(scheme + ':?') special case for, and where is this requirement referenced?

The behavior in Ruby and Golang is different for this scenario (but consistent between the two):

require 'uri'

base_url = 'https://www.example.com/?a=b'
relative_url = 'https:?'

url = URI.join(base_url, relative_url).to_s
puts url

https:?

And with Golang (https://go.dev/play/p/lui16M9pFyo):

package main

import (
	"fmt"
	"net/url"
)

func main() {
	base, _ := url.Parse("https://example.com/?a=b")
	u, _ := url.Parse("http:?")
	fmt.Println(base.ResolveReference(u))
}

http:?

Member

If the url.startswith(scheme + ':?') condition is removed from this patch, we can consider this PR, as it brings the expected behavior seen across other language libraries.

Member

From https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.2:

      -- A non-strict parser may ignore a scheme in the reference
      -- if it is identical to the base URI's scheme.

For now, urllib.parse behaves as a non-strict parser (there is a special test for this). We could add an option to switch this behavior, but that would be a different feature.
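A short illustration of that relaxed behavior, using the base URI from the RFC 3986 examples. It shows the kind of case such a test covers; results may depend on the Python version:

```python
from urllib.parse import urljoin

base = 'http://a/b/c/d;p?q'

# Same scheme as the base: the scheme in the reference is ignored and 'g'
# is resolved as a relative path (non-strict behavior).
print(urljoin(base, 'http:g'))   # http://a/b/c/g

# Different scheme: the reference is returned unchanged.
print(urljoin(base, 'https:g'))  # https:g
```

A strict RFC 3986 resolver would return http:g for the first call.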

But testing only url.startswith(scheme + ':?') is not enough, because http:?#z should clear the query as well. And there are similar cases with other empty components. I am working on a larger, more general PR.
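The point about other empty components can be made concrete: a reference can carry a present but empty authority ('//'), query ('?') or fragment ('#'), alone or combined with other components. The sketch below uses the RFC 3986 Appendix B regex to report every such component. empty_components() is a hypothetical helper that only illustrates the idea; it is not the change that eventually landed.

```python
import re

# RFC 3986, Appendix B regex; an absent component is None, an empty one is ''.
_URI_RE = re.compile(
    r'^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?'
)


def empty_components(ref):
    """Hypothetical helper: names of the optional components of ``ref``
    that are present but empty."""
    _scheme, authority, _path, query, fragment = _URI_RE.match(ref).groups()
    optional = {'authority': authority, 'query': query, 'fragment': fragment}
    return {name for name, value in optional.items() if value == ''}


for ref in ['?', 'http:?', 'http:?#z', 'http:?#', '//', '?d=e#', 'g']:
    print(repr(ref), sorted(empty_components(ref)))
```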

Member

orsenthil left a comment

The addition of url.startswith(scheme + ':?') doesn't look correct to me.

@bedevere-bot
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

github-actions bot removed the stale label May 2, 2023
gpshead changed the title from "bpo-32779: Fix urljoining with an empty query string." to "gh-76960: Fix urljoining with an empty query string." May 9, 2023
@@ -437,6 +437,12 @@ def test_urljoins(self):
        # issue 23703: don't duplicate filename
        self.checkJoin('a', 'b', 'b')

        # issue 32779: clear the query string when joining with '?'
        self.checkJoin('http://a/b/c?d=e', '?', 'http://a/b/c')
        self.checkJoin('http://a/b/c?d=e', 'http:?', 'http://a/b/c')
Member

This is the test case that @orsenthil suggests is wrong, as it doesn't match other languages, which give http:? as the output.

Member

Yes, that's correct. We need to remove this. Perhaps I will modify this and bring this PR to a close.


github-actions bot commented Aug 9, 2024

This PR is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Aug 9, 2024
github-actions bot removed the stale label Aug 24, 2024

This PR is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label Sep 23, 2024
@serhiy-storchaka
Member

Thank you for your PR @thetorpedodog. But this issue was fixed in a more general way by #123273.

Labels
awaiting review, stale
10 participants