Thanks to visit codestin.com
Credit goes to github.com

Skip to content

gh-118441: Limit posixpath.realpath(..., strict=True) symlinks #119172

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

nineteendo
Copy link
Contributor

@nineteendo nineteendo commented May 19, 2024

Benchmark

Show script
# limit-posixpath.realpath-symlinks.sh

echo 40 relative-symlinks
for i in {0..40};
    do ln -nsfw . tmp/${i};
done
main/python.exe                              -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"
speedup-symlink-lookup/python.exe            -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"
limit-posixpath.realpath-symlinks/python.exe -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"

echo 40 symlinks
for i in {0..40};
    do ln -nsfw `realpath tmp` tmp/${i};
done
main/python.exe                              -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"
speedup-symlink-lookup/python.exe            -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"
limit-posixpath.realpath-symlinks/python.exe -m timeit -s "import os; path = os.getcwd() + '/tmp/' + '/'.join(map(str, range(40)))" "try: os.path.realpath(path, strict=True)" "except: pass"

echo relative-symlink
ln -nsfw . tmp/0
speedup-symlink-lookup/python.exe            -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"
main/python.exe                              -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"
limit-posixpath.realpath-symlinks/python.exe -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"

echo symlink
ln -nsfw `realpath tmp` tmp/0
speedup-symlink-lookup/python.exe            -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"
main/python.exe                              -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"
limit-posixpath.realpath-symlinks/python.exe -m timeit -s "import os; path = os.getcwd() + '/tmp' + '/0' * 40" "os.path.realpath(path, strict=True)"
40 relative-symlinks
500 loops, best of 5: 398 usec per loop # speedup-symlink-lookup
500 loops, best of 5: 401 usec per loop # main
1000 loops, best of 5: 376 usec per loop # limit-posixpath.realpath-symlinks
# -> 1.07x faster
40 symlinks
200 loops, best of 5: 1.37 msec per loop # speedup-symlink-lookup
200 loops, best of 5: 1.45 msec per loop # main
200 loops, best of 5: 1.35 msec per loop # limit-posixpath.realpath-symlinks
# -> 1.07x faster
relative-symlink
5000 loops, best of 5: 44.2 usec per loop # speedup-symlink-lookup
1000 loops, best of 5: 227 usec per loop # main
500 loops, best of 5: 409 usec per loop # limit-posixpath.realpath-symlinks
# -> 1.80x slower
symlink
2000 loops, best of 5: 69.4 usec per loop # speedup-symlink-lookup
1000 loops, best of 5: 260 usec per loop # main
200 loops, best of 5: 1.38 msec per loop # limit-posixpath.realpath-symlinks
# -> 5.31x slower

@nineteendo nineteendo marked this pull request as ready for review May 19, 2024 10:44
@nineteendo
Copy link
Contributor Author

nineteendo commented May 19, 2024

@barneygale, I decided to exclude cached symlinks from the count as they don't matter. But 40 -> 39 -> ... -> 1 -> . is an error.

@barneygale
Copy link
Contributor

Hah, we seem to be working on almost the same thing :-)

I've opened another PR (#119178) that adds internal support for limiting the number of symlink traversals, but doesn't (yet) enable or expose it in posixpath.realpath() as you've done here. Maybe we can land that first, and then worry about how to expose this to users in posixpath.realpath()?

@barneygale
Copy link
Contributor

@barneygale, I decided to exclude cached symlinks from the count as they don't matter. But 40 -> 39 -> ... -> 1 -> . is an error.

I think they probably do matter, otherwise, when given a symlink that references a parent directory, realpath("link/link/link/[...]/link") will succeed no matter how many link/ tokens are in the path, whereas I think you'll find that Linux/macOS/coreutils realpath will bail out with "Too many symbolic links in path". If you try making the same change in #119178, you'll find that some pathlib ABC tests hang from infinitely expanding such a symlink tree.

@barneygale
Copy link
Contributor

On my Linux system, man path_resolution says:

In order to protect the kernel against stack overflow, and also to protect against denial of service, there are limits on the maximum recursion depth, and on the maximum number of symbolic links followed. An ELOOP error is returned when the maximum is exceeded ("Too many levels of symbolic links").

Note that it doesn't say anything about the number of unique symbolic links followed, and I doubt Linux has something resembling our seen dict anyway - it's not strictly necessary if you limit the number of traversals.

@nineteendo
Copy link
Contributor Author

Maybe we can land that first, and then worry about how to expose this to users in posixpath.realpath()?

Fine by me.

I think they probably do matter, otherwise, when given a symlink that references a parent directory, realpath("link/link/link/[...]/link") will succeed no matter how many link/ tokens are in the path, whereas I think you'll find that Linux/macOS/coreutils realpath will bail out with "Too many symbolic links in path".

Yeah, I'm very aware of that, that's the reason I created my bug report. But like I already outlined previously, we can't properly track the number of symlinks traversed. Take this symlink chain: 10->9->...->1->. 10/10/10/10 contains 40 symlinks, but we encounter only 13 of which we read 10.

I don't think there's much cost to looking up the already resolved symlinks compared to reading and resolving them.
Therefore, I think it would be silly to count cached symlinks.

If you try making the same change in #119178, you'll find that some pathlib ABC tests hang from infinitely expanding such a symlink tree.

Which tests hang from doing dictionary lookups? We have mostly the same cost from calling lstat() on regular files.

@nineteendo
Copy link
Contributor Author

We can also re-use seen to count the number of unique symlinks. :)

@barneygale
Copy link
Contributor

barneygale commented May 19, 2024

But like I already outlined previously, we can't properly track the number of symlinks traversed.

You're equating a "symlink traversal" with a readlink() call, but that's not correct. A lookup in seen is also a symlink traversal. A seen dict is not necessary to implement realpath() - it's an optimization that should not be user-visible.

Which tests hang from doing dictionary lookups?

test.test_pathlib.test_pathlib_abc.DummyPathWithSymlinksTest.test_glob_recurse_symlinks_common for example.

They hang from attempting to recursively walk an infinite-depth virtual filesystem, which is the denial-of-service issue mentioned in the Linux docs.

@barneygale
Copy link
Contributor

We can also re-use seen to count the number of unique symlinks. :)

Which doesn't help you, because you need to guard against repeatedly traversing the same symlink.

@nineteendo
Copy link
Contributor Author

You're equating a "symlink traversal" with a readlink() call, but that's not correct.

I never said that. On macOS the symlink limit is set to 32. If I call realpath 10/10/10/10 I get an error, but your implementation won't raise an error as it only encounters 13 symlinks.

A lookup in seen is also a symlink traversal. A seen dict is not necessary to implement realpath() - it's an optimization that should not be user-visible.

Sadly, the implementation is going to get a lot uglier if we want to make it an implementation detail as we need to store how many symlinks would be traversed without caching. Unless you want to disable caching in strict mode? Please don't.

@barneygale
Copy link
Contributor

barneygale commented May 19, 2024

I never said that. On macOS the symlink limit is set to 32. If I call realpath 10/10/10/10 I get an error, but your implementation won't raise an error as it only encounters 13 symlinks.

Ah I see, because each seen lookup can save multiple readlink()s if the symlink target itself contains symlinks?

@nineteendo
Copy link
Contributor Author

nineteendo commented May 19, 2024

Exactly, so we can't use caching in strict mode (with maxlinks).

@nineteendo nineteendo marked this pull request as draft May 19, 2024 18:02
@nineteendo nineteendo force-pushed the limit-posixpath.realpath-symlinks branch from a6f1869 to 4c60431 Compare June 5, 2024 18:14
@nineteendo
Copy link
Contributor Author

nineteendo commented Jun 5, 2024

Sorry for the force push, I accidentally had my linter on during the merge.

@nineteendo nineteendo marked this pull request as ready for review June 5, 2024 18:58
@nineteendo
Copy link
Contributor Author

@eryksun, how do we figure out the limit set by the OS? I don't like a hardcoded constant.

@nineteendo

This comment was marked as resolved.

@barneygale
Copy link
Contributor

Does this count as a security issue?

It might. See if you can find a simple-ish setup where os.path.realpath() takes a long time (~seconds) to return, whereas the OS errors out quickly?

@nineteendo

This comment was marked as resolved.

@eryksun
Copy link
Contributor

eryksun commented Jun 6, 2024

how do we figure out the limit set by the OS? I don't like a hardcoded constant.

On Linux, I think it's going to end up being a hard-coded constant equal to 40, which is what glibc returns from its internal __eloop_threshold() function.

#ifndef MIN_ELOOP_THRESHOLD
# define MIN_ELOOP_THRESHOLD    40
#endif

/* Return the maximum number of symlink traversals to permit
   before diagnosing ELOOP.  */
static inline unsigned int __attribute__ ((const))
__eloop_threshold (void)
{
#ifdef SYMLOOP_MAX
  const int symloop_max = SYMLOOP_MAX;
#else
  /* The function is marked 'const' even though we use memory and
     call a function, because sysconf is required to return the
     same value in every call and so it must always be safe to
     call __eloop_threshold exactly once and reuse the value.  */
  static long int sysconf_symloop_max;
  if (sysconf_symloop_max == 0)
    sysconf_symloop_max = __sysconf (_SC_SYMLOOP_MAX);
  const unsigned int symloop_max = (sysconf_symloop_max <= 0
                                    ? _POSIX_SYMLOOP_MAX
                                    : sysconf_symloop_max);
#endif

  return MAX (symloop_max, MIN_ELOOP_THRESHOLD);
}

For the glibc build on Linux, SYMLOOP_MAX isn't defined, and the configured value sysconf(_SC_SYMLOOP_MAX) is -1. So glibc sets symloop_max to the constant _POSIX_SYMLOOP_MAX (8), and __eloop_threshold() returns the constant MIN_ELOOP_THRESHOLD (40).

Note that Python hasn't updated its support for os.sysconf_names in a long while. It should be updated to define "SC_SYMLOOP_MAX" (the value is 173 on Linux), along with a few other new values specified for sysconf() in POSIX issues 6 and 7.

@nineteendo

This comment was marked as resolved.

@nineteendo
Copy link
Contributor Author

Can this be reviewed now?

@nineteendo
Copy link
Contributor Author

cc @zooba

@zooba
Copy link
Member

zooba commented Jun 21, 2024

PR seems reasonable to me, but I don't have any strong feelings. Can I leave it with you, @barneygale?

Not a security issue, by the way. If an attacker is creating a massive symlink chain remotely on your computer, Python being slow is the least of your worries.

@barneygale
Copy link
Contributor

Sorry it's taken me ages to look at this.

I suppose this will slow down detection of direct symlink loops, right? If you have link --> link and try to resolve link, the old algorithm would quickly spot the self-referential symlink, whereas the new one needs to do 40 iterations before it realises?

Could you remind me why this isn't used in non-strict mode too? (Sorry for forgetting).

@nineteendo
Copy link
Contributor Author

I suppose this will slow down detection of direct symlink loops, right?

Yes, 1.64x for relative symlinks and 5.50x for absolute symlinks. See benchmark at the top.

Could you remind me why this isn't used in non-strict mode too?

Because it's non-strict. :)

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't "Fix error message for :func:`os.path.realpath` on Unix." be a separate PR even if its a very small one? (and possibly without any NEWS entry?)

@nineteendo
Copy link
Contributor Author

Shouldn't "Fix error message for :func:os.path.realpath on Unix." be a separate PR even if its a very small one?

You could probably argue that loop is more precise than e.g. ./link. But if we're counting symlinks, we need to use the original path.

@@ -0,0 +1,2 @@
Fix error message for :func:`os.path.realpath` on Unix.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we remove this from the news entry?

@barneygale
Copy link
Contributor

I'm -1 on this change. It's extremely unusual to traverse ~40 symlinks in a path, and on the occasions it does occur, it's almost always due to a loop that would be quickly detected by the current algorithm.

If there's something I'm missing about usage patterns please let me know @nineteendo

@nineteendo
Copy link
Contributor Author

It's mostly to match the behaviour of realpath on POSIX. (and it's around 7% faster) But if that's not worth it, feel free to close this.

@nineteendo
Copy link
Contributor Author

@barneygale, I'm still waiting for you response.

@barneygale
Copy link
Contributor

I don't care too much for matching coreutils in this particular instance

@nineteendo nineteendo closed this Oct 8, 2024
@nineteendo nineteendo deleted the limit-posixpath.realpath-symlinks branch October 8, 2024 21:13
@nineteendo
Copy link
Contributor Author

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants