gh-118441: Limit posixpath.realpath(..., strict=True) symlinks #119172
@barneygale, I decided to exclude cached symlinks from the count as they don't matter. But
Hah, we seem to be working on almost the same thing :-) I've opened another PR (#119178) that adds internal support for limiting the number of symlink traversals, but doesn't (yet) enable or expose it in
I think they probably do matter, otherwise, when given a symlink that references a parent directory,
On my Linux system,
Note that it doesn't say anything about the number of unique symbolic links followed, and I doubt Linux has something resembling our
Fine by me.
Yeah, I'm very aware of that, that's the reason I created my bug report. But like I already outlined previously, we can't properly track the number of symlinks traversed. Take this symlink chain: I don't think there's much cost to looking up the already resolved symlinks compared to reading and resolving them.
Which tests hang from doing dictionary lookups? We have mostly the same cost from calling
We can also re-use
You're equating a "symlink traversal" with a
They hang from attempting to recursively walk an infinite-depth virtual filesystem, which is the denial-of-service issue mentioned in the Linux docs.
Which doesn't help you, because you need to guard against repeatedly traversing the same symlink.
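To make the "same symlink traversed repeatedly" case concrete, here is a minimal sketch. The helper name `resolve_through_self_link` and the link name `self` are made up for illustration; it assumes a POSIX system where unprivileged symlink creation works.

```python
import os
import tempfile


def resolve_through_self_link(depth):
    """Check that root/self/self/... resolves back to root.

    'self' is a symlink to '.', so a path can name it any number of
    times; every component is a fresh traversal of the same link.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.realpath(tmp)  # drop any symlinks in the temp path itself
        os.symlink(".", os.path.join(root, "self"))
        deep = os.path.join(root, *["self"] * depth)
        # Non-strict realpath() resolves this however deep the path is.
        return os.path.realpath(deep) == root
```

Only one unique symlink exists here, so a count of unique links never grows past one, while a count of traversals grows with `depth`.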
I never said that. On macOS the symlink limit is set to 32. If I call
Sadly, the implementation is going to get a lot uglier if we want to make it an implementation detail as we need to store how many symlinks would be traversed without caching. Unless you want to disable caching in strict mode? Please don't.
Ah I see, because each
Exactly, so we can't use caching in strict mode (with
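As a rough illustration of counting every traversal with no cache, a resolver could look like the sketch below. This is not the PR's implementation: `resolve_strict` and `max_links` are hypothetical names, it only handles absolute POSIX paths, and the default of 40 is just the value discussed in this thread.

```python
import errno
import os
import stat


def resolve_strict(path, max_links=40):
    """Resolve an absolute POSIX path, counting every symlink traversal."""
    # Stack of components still to process; pop() yields the next one.
    rest = [part for part in path.split("/") if part][::-1]
    resolved = "/"
    links = 0
    while rest:
        name = rest.pop()
        if name == ".":
            continue
        if name == "..":
            # 'resolved' is already symlink-free, so dirname() is safe.
            resolved = os.path.dirname(resolved)
            continue
        candidate = os.path.join(resolved, name)
        mode = os.lstat(candidate).st_mode  # strict: raises if missing
        if not stat.S_ISLNK(mode):
            resolved = candidate
            continue
        # No cache: a link seen before is counted again.
        links += 1
        if links > max_links:
            raise OSError(errno.ELOOP, os.strerror(errno.ELOOP), path)
        target = os.readlink(candidate)
        if target.startswith("/"):
            resolved = "/"
        # Push the target's components so they are processed next.
        rest.extend(part for part in reversed(target.split("/")) if part)
    return resolved
```

With a link `self -> .`, a path that names `self` 41 times would exceed a limit of 40 here even though only one unique symlink exists, which is the behaviour a cache of resolved links would hide.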
Force-pushed from a6f1869 to 4c60431.
Sorry for the force push, I accidentally had my linter on during the merge.
@eryksun, how do we figure out the limit set by the OS? I don't like a hardcoded constant.
It might. See if you can find a simple-ish setup where
On Linux, I think it's going to end up being a hard-coded constant equal to 40, which is what glibc returns from its internal

```c
#ifndef MIN_ELOOP_THRESHOLD
# define MIN_ELOOP_THRESHOLD    40
#endif

/* Return the maximum number of symlink traversals to permit
   before diagnosing ELOOP.  */
static inline unsigned int __attribute__ ((const))
__eloop_threshold (void)
{
#ifdef SYMLOOP_MAX
  const int symloop_max = SYMLOOP_MAX;
#else
  /* The function is marked 'const' even though we use memory and
     call a function, because sysconf is required to return the
     same value in every call and so it must always be safe to
     call __eloop_threshold exactly once and reuse the value.  */
  static long int sysconf_symloop_max;
  if (sysconf_symloop_max == 0)
    sysconf_symloop_max = __sysconf (_SC_SYMLOOP_MAX);
  const unsigned int symloop_max = (sysconf_symloop_max <= 0
                                    ? _POSIX_SYMLOOP_MAX
                                    : sysconf_symloop_max);
#endif

  return MAX (symloop_max, MIN_ELOOP_THRESHOLD);
}
```

For the glibc build on Linux,
Note that Python hasn't updated its support for
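A rough Python analogue of the glibc logic, as a sketch only: `eloop_threshold` is a hypothetical helper, not part of `posixpath`, and it assumes that `"SC_SYMLOOP_MAX"` may be missing from `os.sysconf_names`, in which case `os.sysconf()` raises `ValueError` and the POSIX minimum is used instead.

```python
import os

MIN_ELOOP_THRESHOLD = 40  # floor used by glibc, per the snippet above
POSIX_SYMLOOP_MAX = 8     # value of _POSIX_SYMLOOP_MAX in POSIX


def eloop_threshold():
    """Return a symlink traversal limit, mirroring glibc's fallback order."""
    try:
        value = os.sysconf("SC_SYMLOOP_MAX")
    except (AttributeError, ValueError, OSError):
        # AttributeError: no os.sysconf on this platform.
        # ValueError: the name is not known to this Python build.
        # OSError: the name is known but unsupported by the host.
        value = -1
    if value <= 0:
        value = POSIX_SYMLOOP_MAX
    return max(value, MIN_ELOOP_THRESHOLD)
```

Because of the `max()` with the 40 floor, this never returns less than 40, which matches the "hard-coded constant" outcome described for glibc on Linux.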
Can this be reviewed now?
…0-12-58.gh-issue-118441.QpzMKV.rst
cc @zooba
PR seems reasonable to me, but I don't have any strong feelings. Can I leave it with you, @barneygale? Not a security issue, by the way. If an attacker is creating a massive symlink chain remotely on your computer, Python being slow is the least of your worries.
Sorry it's taken me ages to look at this. I suppose this will slow down detection of direct symlink loops, right? If you have
Could you remind me why this isn't used in non-strict mode too? (Sorry for forgetting).
Yes, 1.64x for relative symlinks and 5.50x for absolute symlinks. See benchmark at the top.
Because it's non-strict. :)
Shouldn't "Fix error message for :func:`os.path.realpath` on Unix." be a separate PR, even if it's a very small one (and possibly without any NEWS entry)?
You could probably argue that
@@ -0,0 +1,2 @@
Fix error message for :func:`os.path.realpath` on Unix.
Do we remove this from the news entry?
I'm -1 on this change. It's extremely unusual to traverse ~40 symlinks in a path, and on the occasions it does occur, it's almost always due to a loop that would be quickly detected by the current algorithm. If there's something I'm missing about usage patterns please let me know @nineteendo
It's mostly to match the behaviour of realpath on POSIX (and it's around 7% faster). But if that's not worth it, feel free to close this.
@barneygale, I'm still waiting for your response.
I don't care too much for matching coreutils in this particular instance.
Thanks.
Benchmark
posixpath.realpath(..., strict=True) doesn't limit symlinks #118441