tonycoz (Contributor) commented Nov 26, 2024

Fix mishandling of re-use of TARG in trim.

Fixes #22784

Note that this code could simply have called sv_setpvn(), but trim() goes its own way in handling taint, inconsistently with the rest of perl as implemented by sv_setpvn(), and that hand-rolled code is where this bug comes from.

I did consider just replacing this code with sv_setpvn(), but I don't know if the difference from normal perl taint usage was intentional; I didn't see any mention of it in #19433, the PR that added trim().


  • This set of changes requires a perldelta entry, and it is included.

perldelta:

builtin::trim() didn't properly clear C<TARG>, which could result in out-of-date cached numeric versions of the value
being used on a second evaluation.  Properly clear any cached values. [GH #22784]
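To make the failure mode concrete, here is a minimal C sketch built on a hypothetical toy scalar (`toy_sv`, not perl's real `SV`) that caches its integer value behind an IOK-like flag. The buggy setter rewrites the string but leaves the cache marked valid, so a second numification returns the previous call's number; the sv_setpvn()-style setter invalidates the cache whenever the string changes. It illustrates the idea only and is not the code in builtin.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy scalar: a string plus a cached integer whose validity
   is tracked by a flag, loosely mimicking perl's POK/IOK flags. */
typedef struct {
    char buf[64];
    bool pok;   /* string value is valid */
    bool iok;   /* cached integer is valid */
    long iv;    /* cached integer value */
} toy_sv;

/* Numify: reuse the cached integer if flagged valid, else parse and cache. */
static long toy_numify(toy_sv *sv) {
    if (!sv->iok) {
        sv->iv = strtol(sv->buf, NULL, 10);
        sv->iok = true;
    }
    return sv->iv;
}

/* Buggy setter: replaces the string but leaves a stale cache marked valid. */
static void toy_set_string_buggy(toy_sv *sv, const char *s, size_t len) {
    assert(len < sizeof sv->buf);
    memcpy(sv->buf, s, len);
    sv->buf[len] = '\0';
    sv->pok = true;
}

/* sv_setpvn()-style setter: a new string value invalidates cached numbers. */
static void toy_set_string(toy_sv *sv, const char *s, size_t len) {
    toy_set_string_buggy(sv, s, len);
    sv->iok = false;
}
```

Setting "1" and then "2" with `toy_set_string_buggy()`, numifying after each, yields 1 both times, which mirrors the "numifies incorrectly in a loop" symptom; `toy_set_string()` yields 1 and then 2.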

richardleach (Contributor) commented Nov 26, 2024

Should builtin::trim() do some flavour of SV_CHECK_THINKFIRST(TARG)?

I'm wondering if the likes of this could happen:

  • Say you have something like: foreach my $i (@strs) { $i = trim $i }
  • Assume that in an iteration, the TARG's PV buffer finds itself in the sweet zone for COW, so sassign COWs TARG's buffer
  • On the next iteration, the shared buffer will get clobbered
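The scenario in these bullets can be sketched in plain C with a hypothetical reference-counted buffer (`cow_buf`); perl's real COW bookkeeping is different, so treat this as a model only. After the first iteration the earlier result shares TARG's buffer, and if the next call writes into TARG without first making its buffer private, the earlier result changes too. `cow_make_private()` plays roughly the role the thread proposes for SV_CHECK_THINKFIRST(TARG).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer shared by copy-on-write scalars. */
typedef struct {
    size_t refcnt;
    char   data[64];
} cow_buf;

typedef struct {
    cow_buf *buf;
} cow_sv;

static cow_sv cow_new(const char *s) {
    cow_buf *b = malloc(sizeof *b);
    assert(b != NULL && strlen(s) < sizeof b->data);
    b->refcnt = 1;
    strcpy(b->data, s);
    return (cow_sv){ b };
}

static void cow_release(cow_sv *sv) {
    if (sv->buf != NULL && --sv->buf->refcnt == 0)
        free(sv->buf);
    sv->buf = NULL;
}

/* COW-style assignment: share src's buffer instead of copying it. */
static void cow_assign(cow_sv *dst, const cow_sv *src) {
    cow_buf *b = src->buf;
    b->refcnt++;            /* take the new reference before dropping the old */
    cow_release(dst);
    dst->buf = b;
}

/* Rough stand-in for SV_CHECK_THINKFIRST(TARG): if the buffer is shared,
   give this scalar its own copy before anything writes to it. */
static void cow_make_private(cow_sv *sv) {
    if (sv->buf->refcnt > 1) {
        cow_buf *copy = malloc(sizeof *copy);
        assert(copy != NULL);
        memcpy(copy->data, sv->buf->data, sizeof copy->data);
        copy->refcnt = 1;
        sv->buf->refcnt--;
        sv->buf = copy;
    }
}

/* Store a trimmed span into targ's buffer, optionally un-sharing first. */
static void store_result(cow_sv *targ, const char *start, size_t len,
                         bool check_first) {
    assert(len < sizeof targ->buf->data);
    if (check_first)
        cow_make_private(targ);
    memmove(targ->buf->data, start, len);   /* start may alias this buffer */
    targ->buf->data[len] = '\0';
}
```

Without the check, the element that shared TARG's buffer ends up holding the next result; with `check_first` set, TARG gets a private copy and the earlier element keeps its string. memmove() is used because, in the unchecked case, the source span could sit inside the very buffer being written.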

tonycoz (Contributor Author) commented Nov 27, 2024

> Should builtin::trim() do some flavour of SV_CHECK_THINKFIRST(TARG)?

I think you're right, though I didn't come up with a test case offhand.

And of course sv_setpvn() would fix it.

richardleach (Contributor) commented

> Should builtin::trim() do some flavour of SV_CHECK_THINKFIRST(TARG)?

Hmm, I couldn't come up with a test case either. From a quick look at Perl_sv_setsv_flags, maybe it will always either swipe the buffer or copy it, never COW? Even if so, it may be worth guarding against that logic ever changing.

leonerd (Contributor) commented Nov 27, 2024

> I did consider just replacing this code with sv_setpvn(), but I don't know if the difference from normal perl taint usage was intentional; I didn't see any mention of it in #19433, the PR that added trim().

I don't believe there's any reason for trim() to taint differently, no.

builtin.c (outdated diff):

```c
dest = TARG;
SV_CHECK_THINKFIRST(TARG);
SvUPGRADE(TARG, SVt_PV);
SvGROW(TARG, len + 1);
```
Contributor

Is grow required here? Fairly sure that trim can't make an SV bigger, only smaller, so surely it must already have enough storage for the new size.

tonycoz (Contributor Author), Nov 27, 2024

TARG and source are not the same SV.

If this were a pp func and implemented TARGLEX, then we'd retain the branch I removed and do some special handling for that.

But it's XS, so TARG shouldn't be source (though source might be the TARG from another entersub calling trim, or from the same entersub in recursion; in either case it's not the TARG here).

I tried adding an assert(0) to the original "same SV" handling code and it never triggered.

Contributor

Oh right, of course: this isn't an in-place trim, it's a substr copy.
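Because trim() copies a substring from the source into a separate TARG, the destination can start out smaller than the result, which is why the SvGROW() stays. A plain-C sketch of that shape, using a hypothetical `toy_str` in place of TARG and recognising only ASCII whitespace (a simplification; builtin::trim also handles Unicode whitespace):

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string standing in for TARG. */
typedef struct {
    char  *pv;
    size_t cur;   /* bytes in use, excluding the trailing NUL */
    size_t len;   /* bytes allocated */
} toy_str;

/* Locate the trimmed span of src (ASCII whitespace only): returns a pointer
   to the first non-space byte and stores the span length in *lenp. */
static const char *trim_span(const char *src, size_t srclen, size_t *lenp) {
    const char *start = src;
    const char *end = src + srclen;
    while (start < end && isspace((unsigned char)*start))
        start++;
    while (end > start && isspace((unsigned char)end[-1]))
        end--;
    *lenp = (size_t)(end - start);
    return start;
}

/* Copy the trimmed span of src into dest, a separate buffer that may be
   smaller than the result, so grow it first (cf. SvGROW(TARG, len + 1)). */
static void trim_into(toy_str *dest, const char *src, size_t srclen) {
    size_t len;
    const char *start = trim_span(src, srclen, &len);
    if (dest->len < len + 1) {
        char *p = realloc(dest->pv, len + 1);
        assert(p != NULL);
        dest->pv = p;
        dest->len = len + 1;
    }
    memcpy(dest->pv, start, len);
    dest->pv[len] = '\0';
    dest->cur = len;
}
```

Starting from an empty `toy_str` (`{ NULL, 0, 0 }`), trimming `"   hello world  "` leaves `"hello world"` with `cur` of 11; the caller frees `pv` when done.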

builtin.c (outdated diff):

```c
SvPVX(dest)[len] = '\0';
SvPOK_on(dest);
SvCUR_set(dest, len);
Copy(start, SvPVX(TARG), len, U8);
```
Contributor

Is start always still valid after the THINKFIRST? Is it possible that it could copy a CoW string out to a new buffer?

Oh, but then I suppose start will still point at the original untouched buffer, so it's probably all fine.

Contributor Author

> Oh, but then I suppose start will still point at the original untouched buffer, so it's probably all fine.

That's what I expect.

Contributor

About 4539fb8

Not saying the fix in blead/this ticket is wrong, but there might be a better way to do it, or in general a better way for XS to "do it".

Note the old code removed in this commit used the VERY underused (core/cpan) SvOOK flag optimization. The new code does not have the sv_chop() optimization, which prevents double buffering and maybe a trip through realloc()/malloc().

The 5.25 newish sv_set_undef(), to get a leak-free SvOK_off(), followed by sv_chop() or lowering SvCUR() (which changes only one or two integers, versus a Move()/memcpy()), is the better implementation for whitespace-trimming a string. Perl has the API to do this kind of string manipulation efficiently; SvOOK needs to be used more often by the community.
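The SvOOK idea described here can be sketched with a hypothetical `ook_str` that records how many bytes were chopped off the front: leading whitespace is dropped by bumping an offset, as sv_chop() does, and trailing whitespace by lowering the length, as SvCUR_set() does, so no bytes are copied. This is a simplified model rather than perl's actual OOK layout, and the trick only applies when the trimmed result stays in the same string as the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string with a front offset, loosely imitating SvOOK: the
   logical string begins `offset` bytes into the allocation. */
typedef struct {
    char  *alloc;   /* start of the allocation, NUL-terminated */
    size_t offset;  /* bytes chopped from the front */
    size_t cur;     /* logical length */
} ook_str;

static char *ook_pv(const ook_str *s) { return s->alloc + s->offset; }

/* In-place trim: leading whitespace is dropped by bumping the offset (the
   sv_chop() idea), trailing whitespace by lowering cur (the SvCUR_set()
   idea). Only the new terminator byte is written; nothing is moved. */
static void ook_trim(ook_str *s) {
    assert(s->alloc != NULL);
    char *pv = ook_pv(s);
    size_t lead = 0;
    while (lead < s->cur && isspace((unsigned char)pv[lead]))
        lead++;
    s->offset += lead;
    s->cur -= lead;
    pv = ook_pv(s);
    while (s->cur > 0 && isspace((unsigned char)pv[s->cur - 1]))
        s->cur--;
    pv[s->cur] = '\0';
}
```

Trimming `"  padded  "` this way leaves the logical string pointing two bytes into the original array with a length of 6; the cost is two integer updates plus one terminator write, independent of the string's length.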

Contributor Author

> Note the old code removed in this commit used the VERY underused (core/cpan) SvOOK flag optimization. The new code does not have the sv_chop() optimization, which prevents double buffering and maybe a trip through realloc()/malloc().

The sv_chop() branch was removed because the input and output SVs are never the same SV.

> The 5.25 newish sv_set_undef() ...

I have a clang-tidy check for that (though I believe loadable clang-tidy checks don't work on Windows).

Contributor

Can you discreetly slip that magic potion into EU::ParseXS when nobody is looking??!?! 🥺

Contributor Author

I expect most of the API use comes from the typemap.

It can be run on the generated XS code, in pretty much the same way you use it on the perl sources, though it may point you at APIs that aren't covered by ppport.h and so won't work with older perls.

tonycoz (Contributor Author) commented Nov 27, 2024

I expect to squash this before merging.

leonerd (Contributor) left a comment

LGTM.

Fixes a whole lot of issues, and simplifies much of the logic.

refactor a lot of custom "set SV to a string" code away to
sv_setpvn(); this:

 - fixed the original problem reported for Perl#22784, where TARG wasn't
   being reset properly and contained a cached numeric version of the
   result from the previous call.

 - removed some never-executed code, since builtin::trim is only XS
   and is not an OP with the TARGLEX optimization

 - fixes a possible problem if the result of the first call to trim()
   is COWed.

This does slightly change the taint behaviour: rather than making TARG
tainted iff source is tainted, it switches to the behaviour of the rest
of perl, making TARG tainted if any tainted input has been seen in the
current expression.

See the PR Perl#22788 for some discussion on how we got here.

Fixes Perl#22784
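The taint change described in the commit message can be modelled with a hypothetical per-statement flag that is set whenever a tainted value is read, loosely in the spirit of perl's PL_tainted (the real mechanism is more involved). In this sketch, when a tainted value has been read earlier in the same statement, the old per-source rule leaves trim()'s result untainted, while the sv_setpvn()-style rule taints it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical value with a taint bit. */
typedef struct {
    const char *pv;
    bool tainted;
} tval;

/* Per-statement "tainted input seen" flag, loosely modelled on PL_tainted. */
static bool stmt_tainted = false;

static void begin_statement(void) { stmt_tainted = false; }

/* Reading a value records its taint for the rest of the statement. */
static const char *read_val(const tval *v) {
    assert(v != NULL);
    if (v->tainted)
        stmt_tainted = true;
    return v->pv;
}

/* Old trim() rule per the commit message: result tainted iff source is. */
static tval result_per_source(const tval *src, const char *trimmed) {
    return (tval){ trimmed, src->tainted };
}

/* sv_setpvn()-style rule: tainted if any tainted input was seen. */
static tval result_per_statement(const char *trimmed) {
    return (tval){ trimmed, stmt_tainted };
}
```

The per-statement rule is more conservative: it can taint a result whose own source was clean, which is the behaviour the rest of perl already has.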

jkeenan requested a review from Grinnz on December 2, 2024 00:57
tonycoz merged commit 4539fb8 into Perl:blead on Dec 2, 2024
Successfully merging this pull request may close these issues:

  • builtin::trim return value numifies incorrectly in a loop
4 participants