Path.from_uri() doesn't work if the URI contains host component #123599


Closed
pekkaklarck opened this issue Sep 2, 2024 · 14 comments
Labels
3.13 bugs and security fixes 3.14 bugs and security fixes topic-pathlib type-bug An unexpected behavior, bug, or error

Comments

@pekkaklarck

pekkaklarck commented Sep 2, 2024

Bug report

Bug description:

Path.from_uri(), introduced in Python 3.13, doesn't work properly if the URI contains a host component other than localhost. The following examples were run with Python 3.13 rc 1 on Linux, on a machine with the host name kone:

>>> import socket
>>> from pathlib import Path
>>> print(Path.from_uri('file:///home/peke/test'))
/home/peke/test
>>> print(Path.from_uri('file://localhost/home/peke/test'))
/home/peke/test
>>> print(Path.from_uri(f'file://{socket.getfqdn()}/home/peke/test'))
//kone/home/peke/test

According to RFC 8089, including the host component as a fully qualified name is fine, so this looks like a bug to me.
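To see where the host ends up, it can help to split the URIs from the report into their authority and path parts. The following is a minimal sketch using only urllib.parse; the helper name split_file_uri is hypothetical and not part of pathlib or urllib.

```python
from urllib.parse import urlsplit


def split_file_uri(uri):
    """Return (authority, path) for a file: URI (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    return parts.netloc, parts.path


# The three URIs from the report, with "kone" standing in for the FQDN.
print(split_file_uri("file:///home/peke/test"))           # ('', '/home/peke/test')
print(split_file_uri("file://localhost/home/peke/test"))  # ('localhost', '/home/peke/test')
print(split_file_uri("file://kone/home/peke/test"))       # ('kone', '/home/peke/test')
```

In the third case the authority is "kone", and the reported result //kone/home/peke/test suggests that it was kept as the first path component rather than being recognized as the local machine.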

CPython versions tested on:

3.13

Operating systems tested on:

Linux


@pekkaklarck pekkaklarck added the type-bug An unexpected behavior, bug, or error label Sep 2, 2024
@pekkaklarck
Author

Accepting a host component other than localhost raises a question about the validity of the host name used. For example, browsers seem to ignore the host component entirely and accept paths like file://whatever/home/peke/test in my case. I believe Python should be stricter, though, and raise a ValueError if the host component doesn't match the system where the code is run. Although the RFC mandates the host component to be fully qualified, I believe accepting only the host name should be fine too. If someone wants to parse file URIs with different host names, they can use urllib.parse.urlparse instead.

UNC paths on Windows might complicate this even further. I tested this on Windows, and there this usage makes sense:

>>> p = Path(r'\\host\path')
>>> print(p.as_uri())
file://host/path/
>>> p == Path.from_uri(p.as_uri())
True

Perhaps from_uri behavior should depend on the operating system.

@barneygale
Contributor

All URIs with non-empty, non-localhost authorities parse as Path objects that start with a double slash, so it should be straightforward to reject these paths:

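from pathlib import Path

# A non-empty, non-localhost authority shows up as a leading double slash.
path = Path.from_uri('file://server/share/foo.txt')
if path.as_posix().startswith('//'):
    raise ValueError('Non-local file URI')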

We could add a local_authorities argument so that users can override the ['', 'localhost'] defaults.
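As a rough illustration of the local_authorities idea, here is a sketch of a standalone helper. The function name posix_path_from_file_uri and its exact signature are assumptions made for this example; no such parameter exists on Path.from_uri() in the thread, and the helper only covers POSIX-style paths.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Defaults proposed in the comment above.
DEFAULT_LOCAL_AUTHORITIES = ("", "localhost")


def posix_path_from_file_uri(uri, local_authorities=DEFAULT_LOCAL_AUTHORITIES):
    """Hypothetical helper: drop a local authority, reject any other."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    allowed = {name.lower() for name in local_authorities}
    if parts.netloc.lower() not in allowed:
        raise ValueError(f"Non-local file URI: {uri!r}")
    # Percent-decode the path component before building the path object.
    return PurePosixPath(unquote(parts.path))


print(posix_path_from_file_uri("file://localhost/home/peke/test"))
print(posix_path_from_file_uri("file://kone/home/peke/test",
                               local_authorities=("", "localhost", "kone")))
```

A caller who wants the machine's own name accepted could pass something like ("", "localhost", socket.gethostname()) explicitly, which keeps the helper itself free of any hostname lookup.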

@pekkaklarck
Author

Explicitly rejecting paths would certainly be better than returning invalid paths. Including socket.getfqdn() and possibly also socket.gethostname() in the list of local authorities could be more convenient than requiring users to pass them explicitly, though. If the list is made configurable, it probably should contain localhost and the empty string by default.

@pekkaklarck
Author

This probably would anyway require special handling on Windows. On POSIX something like Path.from_uri('file://host/path') should yield Path('/path'), assuming that host is detected to be a local authority, but on Windows throwing the host part away would break UNC paths.

@barneygale
Contributor

We should try to be consistent across OSs unless we really must diverge IMO.

> Explicitly rejecting paths would certainly be better than returning invalid paths.

Paths starting with two slashes are valid on both Windows and POSIX. On Windows they're UNC paths, whereas on POSIX they're implementation-defined (ref).

@pekkaklarck
Author

It can be hard to be totally consistent across OSes. On Windows Path(r'\\host\path').as_uri() yields file://host/path/, and it makes sense that round-trip works and Path.from_uri('file://host/path/') yields Path(r'\\host\path'). In other words, the host component is preserved on Windows. On the other hand, when Path.from_uri('file://host/path/') is used on POSIX, the host component can be validated but there's, AFAIK, no way to preserve it and the return value can only be Path('/path').

There's already other functionality in pathlib that's operating system dependent and I don't see why from_uri couldn't be as well. Someone needing, for example, Windows semantics on POSIX could then explicitly use PureWindowsPath.

I should have used "incorrect" instead of "invalid" in my earlier comment. Although Path('//hello/world') is valid, it certainly isn't the correct return value of Path.from_uri('file://hello/world') on POSIX.
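The difference between the two flavours can be observed without running on Windows, because the pure path classes are available on every platform. The sketch below uses an illustrative //host/share/file.txt path (not one taken from the thread) to show how each flavour interprets a leading double slash.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Windows flavour: the host and share become the UNC "drive".
unc = PureWindowsPath("//host/share/file.txt")
print(unc.drive)   # \\host\share
print(unc.name)    # file.txt

# POSIX flavour: there is no drive; the double slash is kept as the root
# and "host" is just the first ordinary path component.
posix = PurePosixPath("//host/share/file.txt")
print(repr(posix.drive))  # ''
print(posix.root)         # //
print(posix.parts)        # ('//', 'host', 'share', 'file.txt')
```

This is why the host can round-trip on Windows, while on POSIX it has nowhere meaningful to go.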

@barneygale barneygale added 3.13 bugs and security fixes 3.14 bugs and security fixes labels Sep 3, 2024
barneygale added a commit to barneygale/cpython that referenced this issue Sep 3, 2024
…()` on POSIX

Raise `ValueError` in `pathlib.Path.from_uri()` if the given `file:` URI
specifies a non-empty, non-`localhost` authority, and we're running on a
platform without support for UNC paths.
barneygale added a commit to barneygale/cpython that referenced this issue Oct 20, 2024
@barneygale
Contributor

Upon further consideration, I've adjusted my PR so that:

  • If the authority is empty or 'localhost', we suppress it.
  • Otherwise, if we're on Windows, we return a UNC path.
  • Otherwise, if the authority resolves to a loopback address, we suppress it.
  • Otherwise, we raise ValueError.

This is basically the same logic as urllib.request.FileHandler.file_open(), and so in a future patch I'm planning to slim down the urllib code.
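The order of the four checks listed above can be sketched as a small decision function. This is an illustration under stated assumptions, not the code from the PR: the name classify_authority, the on_windows flag, and the injectable resolve callable are all hypothetical and exist only so the logic can be exercised without real DNS.

```python
import ipaddress
import socket


def classify_authority(authority, *, on_windows, resolve=socket.gethostbyname):
    """Return 'suppress' or 'unc', or raise ValueError (hypothetical sketch)."""
    # 1. An empty or 'localhost' authority is dropped.
    if authority in ("", "localhost"):
        return "suppress"
    # 2. On Windows any other authority becomes the host of a UNC path.
    if on_windows:
        return "unc"
    # 3. Elsewhere, an authority that resolves to a loopback address is dropped.
    try:
        address = resolve(authority)
    except OSError:  # socket.gaierror is a subclass of OSError
        address = None
    if address is not None and ipaddress.ip_address(address).is_loopback:
        return "suppress"
    # 4. Anything else is a non-local file URI.
    raise ValueError(f"non-local authority: {authority!r}")


# Fake resolvers keep the example independent of the network.
print(classify_authority("localhost", on_windows=False))
print(classify_authority("server", on_windows=True))
print(classify_authority("kone", on_windows=False, resolve=lambda name: "127.0.0.1"))
```

Passing the resolver in as an argument is a choice made for this sketch only, so that the branch that depends on name resolution can be tried with fixed addresses.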

barneygale added a commit to barneygale/cpython that referenced this issue Nov 24, 2024
Call `urllib.request.url2pathname()` from `pathlib.Path.from_uri()` rather
than re-implementing it. This paves the way for solving the main issue
(ignoring local authorities and rejecting non-local ones) in urllib, not
pathlib.
barneygale added a commit to barneygale/cpython that referenced this issue Nov 29, 2024
Call `urllib.request.pathname2url()` from `pathlib.Path.as_uri()`, and
deprecate the duplicate implementation in `PurePath`.
barneygale added a commit to barneygale/cpython that referenced this issue Mar 18, 2025
barneygale added a commit to barneygale/cpython that referenced this issue Mar 18, 2025
barneygale added a commit to barneygale/cpython that referenced this issue Mar 18, 2025
barneygale added a commit to barneygale/cpython that referenced this issue Mar 19, 2025
barneygale added a commit that referenced this issue Mar 19, 2025
Call `urllib.request.url2pathname()` from `pathlib.Path.from_uri()` rather
than re-implementing it. This paves the way for solving the main issue
(ignoring local authorities and rejecting non-local ones) in urllib, not
pathlib.
barneygale added a commit that referenced this issue Mar 20, 2025
Call `urllib.request.pathname2url()` from `pathlib.Path.as_uri()`, and
deprecate the duplicate implementation in `PurePath`.

Co-authored-by: Adam Turner <[email protected]>
barneygale added a commit that referenced this issue Apr 10, 2025
…26844)

In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
@barneygale
Contributor

Fixed in 3.14

@serhiy-storchaka
Member

A new test added in #126844 fails on FreeBSD:

https://buildbot.python.org/#/builders/1255/builds/457/steps/6/logs/stdio

======================================================================
ERROR: test_url2pathname_posix (test.test_urllib.Pathname_Tests.test_url2pathname_posix)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/buildbot/buildarea/pull_request.opsec-fbsd14/build/Lib/test/test_urllib.py", line 1562, in test_url2pathname_posix
    self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar'), '/foo/bar')
                     ~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/buildbot/buildarea/pull_request.opsec-fbsd14/build/Lib/urllib/request.py", line 1658, in url2pathname
    raise URLError("file:// scheme is supported only on localhost")
urllib.error.URLError: <urlopen error file:// scheme is supported only on localhost>
----------------------------------------------------------------------

Can reproduce locally in a VM.

@serhiy-storchaka
Member

serhiy-storchaka commented Apr 14, 2025

socket.gethostbyname(socket.gethostname()) raises gaierror(8, 'Name does not resolve').

>>> socket.gethostbyname(socket.gethostname())
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    socket.gethostbyname(socket.gethostname())
    ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno 8] Name does not resolve

barneygale added a commit to barneygale/cpython that referenced this issue Apr 14, 2025
…in urllib

In `_is_local_authority()`, return early if the authority matches the
machine hostname from `socket.gethostname()`, rather than resolving the
names and matching IP addresses.
barneygale added a commit that referenced this issue Apr 15, 2025
…lib (#132523)

In `_is_local_authority()`, return early if the authority matches the
machine hostname from `socket.gethostname()`, rather than resolving the
names and matching IP addresses.
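A sketch of the early-return idea from this commit message follows. It is not the real _is_local_authority(); the function name, the injectable gethostname and resolve callables, and the simplified loopback check used as the fallback are assumptions made for illustration.

```python
import ipaddress
import socket


def is_local_authority(authority, *, gethostname=socket.gethostname,
                       resolve=socket.gethostbyname):
    """Hypothetical sketch: compare names first, resolve only as a fallback."""
    if authority in ("", "localhost"):
        return True
    # Early return: a plain string comparison with the machine hostname,
    # which works even when that hostname cannot be resolved.
    if authority == gethostname():
        return True
    try:
        address = resolve(authority)
    except OSError:  # includes socket.gaierror
        return False
    # Simplified fallback for this sketch: treat loopback addresses as local.
    return ipaddress.ip_address(address).is_loopback


def failing_resolver(name):
    # Mimics the FreeBSD buildbot, where the hostname does not resolve.
    raise socket.gaierror(8, "Name does not resolve")


print(is_local_authority("fbsd-box", gethostname=lambda: "fbsd-box",
                         resolve=failing_resolver))
```

With the comparison done first, the failing resolver is never reached for the machine's own hostname, which is the situation the failing test exercised.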
@barneygale
Contributor

The above test failure was fixed in #132523

seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…thon#127237)

Call `urllib.request.url2pathname()` from `pathlib.Path.from_uri()` rather
than re-implementing it. This paves the way for solving the main issue
(ignoring local authorities and rejecting non-local ones) in urllib, not
pathlib.
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…ython#127380)

Call `urllib.request.pathname2url()` from `pathlib.Path.as_uri()`, and
deprecate the duplicate implementation in `PurePath`.

Co-authored-by: Adam Turner <[email protected]>
seehwan pushed a commit to seehwan/cpython that referenced this issue Apr 16, 2025
…RL (python#126844)

In `urllib.request.url2pathname()`, if the authority resolves to the
current host, discard it. If an authority is present but resolves somewhere
else, then on Windows we return a UNC path (as before), and on other
platforms we raise `URLError`.

Affects `pathlib.Path.from_uri()` in the same way.

Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
@barneygale
Contributor

Re-opening to discuss this point from Serhiy on #126844:

> url2pathname() now performs network requests (and hang for a time).

Having given it more thought, I reckon we should call gethostbyname() only when a new resolve_netloc keyword-only argument is set to true (default false). Existing users of this function probably don't expect it to perform network access, and it could have larger consequences in environments where DNS resolution is borked.

Thoughts / 👍 / 👎?

@barneygale barneygale reopened this Apr 16, 2025
@serhiy-storchaka
Member

You just read my mind. I was thinking about suggesting a keyword argument for resolving the hostname, but then I thought that url2pathname() may not be used much in third-party code, so it may not be worth the hassle.

But adding an option is safer. In the future we can even change its default value, and users will still be able to request the old behavior.

barneygale added a commit to barneygale/cpython that referenced this issue Apr 16, 2025
…fault

Follow-up to 0879ebc.

Add *resolve_netloc* keyword-only argument to `url2pathname()`, defaulting
to false. When set to true, we call `socket.gethostbyname()` to resolve
the URL authority (netloc).
barneygale added a commit to barneygale/cpython that referenced this issue Apr 16, 2025
…fault

Follow-up to 0879ebc.

Add *resolve_netloc* keyword-only argument to `url2pathname()`, defaulting
to false. When set to true, we call `socket.gethostbyname()` to resolve
the URL authority (netloc).
@zooba
Member

zooba commented Apr 16, 2025

Also FYI, I'm seeing idna encoding errors on one of my private test machines from the address = socket.gethostbyname(authority) line. I'm not entirely sure what path is being passed in to cause it, but I suspect we ought to swallow UnicodeEncodeError in _is_local_authority anyway?

barneygale added a commit to barneygale/cpython that referenced this issue May 5, 2025
barneygale added a commit that referenced this issue May 5, 2025
…132610)

Follow-up to 66cdb2b.

Add *resolve_host* keyword-only argument to `url2pathname()`, defaulting to
false. When set to true, we call `socket.gethostbyname()` to resolve the
URL hostname.

Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
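
The opt-in behaviour described in the last commit message can be sketched as follows. This is a stand-in written for illustration, not the url2pathname() implementation: the function authority_is_local and its resolve parameter are hypothetical, and only the keyword name resolve_host and its false default are taken from the commit message.

```python
import ipaddress
import socket


def authority_is_local(authority, *, resolve_host=False,
                       resolve=socket.gethostbyname):
    """Hypothetical sketch of an opt-in hostname lookup."""
    if authority in ("", "localhost"):
        return True
    if not resolve_host:
        # Default: no name resolution, so no network access and no hang.
        return False
    try:
        return ipaddress.ip_address(resolve(authority)).is_loopback
    except (OSError, ValueError):
        # OSError covers socket.gaierror; ValueError covers UnicodeError
        # (for example IDNA encoding failures) and unparsable addresses.
        return False


calls = []


def recording_resolver(name):
    # Records each lookup so the example shows when resolution happens.
    calls.append(name)
    return "127.0.0.1"


print(authority_is_local("kone", resolve=recording_resolver), calls)
print(authority_is_local("kone", resolve_host=True, resolve=recording_resolver), calls)
```

The first call returns False without touching the resolver; only the second call, which opts in with resolve_host=True, performs a lookup.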