
Wrong formatting of url in urlunsplit() function when used with _replace function to change scheme #99901

Closed
MKCompu opened this issue Nov 30, 2022 · 6 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@MKCompu

MKCompu commented Nov 30, 2022

Bug report

from urllib.parse import urlparse
malformed = urlparse("8.8.8.8:1337")._replace(scheme='http').geturl()
print(malformed)

This prints "http:///8.8.8.8:1337", but it should print "http://8.8.8.8:1337".
Note the three slashes.
The reason is that urlunsplit() in Lib/urllib/parse.py checks whether the path component starts with a slash. Since it does not, it prepends one, and then it prepends two more slashes for the (empty) netloc.
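The two prepending steps can be sketched as follows. This is a simplified model of the netloc branch of urlunsplit() as it looked in CPython 3.10 (query and fragment handling omitted). The function name legacy_unsplit and the trimmed USES_NETLOC set are hypothetical stand-ins, not the stdlib's own names.

```python
# Hypothetical, trimmed stand-in for urllib.parse.uses_netloc (the real list is longer).
USES_NETLOC = {"http", "https", "ftp"}


def legacy_unsplit(scheme: str, netloc: str, path: str) -> str:
    """Simplified model of the CPython 3.10 urlunsplit() netloc logic (no query/fragment)."""
    url = path
    # A "//" prefix is emitted when there is a netloc, or when the scheme
    # normally uses one and the path does not already start with "//".
    if netloc or (scheme and scheme in USES_NETLOC and url[:2] != "//"):
        # Step 1: a non-empty path without a leading "/" gets one.
        if url and url[:1] != "/":
            url = "/" + url
        # Step 2: "//" plus the (possibly empty) netloc is prepended.
        url = "//" + (netloc or "") + url
    if scheme:
        url = scheme + ":" + url
    return url


# Components from the report: scheme set, netloc empty, host:port left in the path.
print(legacy_unsplit("http", "", "8.8.8.8:1337"))  # http:///8.8.8.8:1337
# With host:port in the netloc instead, only two slashes appear.
print(legacy_unsplit("http", "8.8.8.8:1337", ""))  # http://8.8.8.8:1337
```

This models the 3.10 code path only; how an empty netloc with a relative path should be handled is the subject of #85110, so other releases may behave differently.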

Your environment

  • CPython versions tested on: 3.10.5


@MKCompu added the type-bug label on Nov 30, 2022
@WolframAlph
Contributor

WolframAlph commented Nov 30, 2022

urlparse("8.8.8.8:1337", scheme='http').geturl()

will also produce http:///8.8.8.8:1337.
Tested with the latest main.
Can I work on a patch?

@serhiy-storchaka
Member

I think that the issue is not in urlunparse(), but in urlparse().

>>> urllib.parse.urlparse('8.8.8.8:1337', scheme='http')
ParseResult(scheme='http', netloc='', path='8.8.8.8:1337', params='', query='', fragment='')

'8.8.8.8:1337' is expected to be a netloc, not a path.

Or maybe it is being used incorrectly.

@WolframAlph
Contributor

Actually, it is explicitly mentioned in the docs.

Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

And indeed, it works if you prepend //:

>>> urlparse('//8.8.8.8:1337', scheme='http')
ParseResult(scheme='http', netloc='8.8.8.8:1337', path='', params='', query='', fragment='')
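Following the documented rule, a caller can add the missing '//' before parsing so that the host:port ends up in netloc. The helper name with_default_scheme and its test for a missing scheme (no '://' and no leading '//') are assumptions for illustration; urllib.parse provides no such function.

```python
from urllib.parse import urlparse


def with_default_scheme(url: str, scheme: str = "http") -> str:
    """Hypothetical helper: treat a bare 'host[:port][/path]' string as a network location."""
    # Assumption: input without "://" and without a leading "//" is a bare host,
    # never a relative path, so prefixing "//" makes urlparse() fill in netloc.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    # The scheme argument is only a default; an explicit scheme in the input wins.
    return urlparse(url, scheme=scheme).geturl()


print(repr(urlparse("8.8.8.8:1337", scheme="http").netloc))  # ''
print(urlparse("//8.8.8.8:1337", scheme="http").netloc)      # 8.8.8.8:1337
print(with_default_scheme("8.8.8.8:1337"))                   # http://8.8.8.8:1337
print(with_default_scheme("https://example.com/a"))          # https://example.com/a
```

The helper deliberately refuses to guess for genuinely relative inputs: anything it is given is assumed to start with a host, which is exactly the assumption urlparse() declines to make on its own.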

So the question is whether we want to handle the case where netloc is empty when urlunparse() is used to reconstruct the URI.

@serhiy-storchaka
Member

The question is how to handle URLs with empty netloc and relative path.

  1. scheme:path -- omit netloc.
  2. scheme:///path -- make the path absolute.
  3. Error.

Option 1 is used for schemes that do not use netloc; option 2 is currently used for schemes that do use netloc (e.g. http). An error is never raised. In any case, scheme://path is not a correct result.
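The three candidate outputs for an empty netloc plus a relative path can be compared side by side. The function unsplit_relative and its policy names are hypothetical; the sketch models only the scheme/netloc/path part of the decision, not the stdlib's actual implementation.

```python
def unsplit_relative(scheme: str, path: str, policy: str) -> str:
    """Hypothetical: rebuild a URL from a scheme and a relative path when netloc is empty."""
    if path.startswith("/"):
        raise ValueError("this sketch only covers relative paths")
    if policy == "omit_netloc":    # option 1: scheme:path
        return f"{scheme}:{path}"
    if policy == "absolute_path":  # option 2: make the path absolute -> scheme:///path
        return f"{scheme}:///{path}"
    if policy == "error":          # option 3: refuse to guess
        raise ValueError(f"empty netloc with relative path {path!r}")
    raise ValueError(f"unknown policy {policy!r}")


for policy in ("omit_netloc", "absolute_path"):
    print(policy, unsplit_relative("http", "8.8.8.8:1337", policy))
# omit_netloc http:8.8.8.8:1337
# absolute_path http:///8.8.8.8:1337

try:
    unsplit_relative("http", "8.8.8.8:1337", "error")
except ValueError as exc:
    print("error:", exc)
```

None of the policies yields http://8.8.8.8:1337: producing that string would silently reinterpret the path as a host, which is why scheme://path is ruled out.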

So I think that this issue should be closed as "not a bug". The question about empty netloc and relative path is discussed in #85110.

@serhiy-storchaka closed this as not planned on Dec 29, 2023
@VrindavanSanap

When is this going to get fixed?

@serhiy-storchaka
Member

See #85110.


5 participants