David Mitchell [Wed, 12 Nov 2025 17:02:16 +0000 (17:02 +0000)]
don't warn on 'undef $^W'
Ironically, this warns:
$ perl -e'use warnings; undef $^W'
Use of uninitialized value in undef operator at -e line 1.
$
The magic-setting code was treating the new value of $^W as an integer.
This commit makes it treat the value as a boolean.
I suppose this commit could in theory break code if that code is
doing something like:
$^W = "0 but true";
In the past that would have disabled warnings, but will now enable them.
But it seems unlikely that anyone would have written such code. The
variable is documented in perlvar as having a value which is interpreted
as a boolean.
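The difference between the two readings can be sketched in C. This is
only an illustration, not the code in mg.c: the helper names
perl_like_truth() and integer_view() are hypothetical, NULL stands in
for undef, and the truth rule shown is the one for plain string values
(false when undefined, empty, or exactly "0").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: truth of a string value under Perl's rules.
 * NULL stands in for undef here. */
static bool perl_like_truth(const char *s)
{
    if (s == NULL || s[0] == '\0')
        return false;               /* undef and "" are false */
    if (s[0] == '0' && s[1] == '\0')
        return false;               /* the string "0" is false */
    return true;                    /* every other string is true */
}

/* Hypothetical helper: the same value read as an integer.
 * strtol() stops at the first non-digit, so "0 but true" reads as 0. */
static long integer_view(const char *s)
{
    return s ? strtol(s, NULL, 10) : 0;
}

/* The value on which the two readings disagree in this sketch. */
void demo_zero_but_true(void)
{
    assert(integer_view("0 but true") == 0);   /* integer reading: off */
    assert(perl_like_truth("0 but true"));     /* boolean reading: on  */
    assert(!perl_like_truth(NULL));            /* undef: simply false  */
}
```

For values such as "0", "1" and "" the two helpers agree, which is why
the change is unlikely to be visible to ordinary code.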
Note that this commit stops a test in t/op/reset.t from expecting a
warning when resetting $^W. This test was added by issue GH #20763, and
AFAICT that ticket was concerned with 'reset $^W' not actually
resetting the variable; the test for the warning was purely a
side-effect of the fact that it happened to warn.
While the (typically compile-time) -Dx detailed optree dumping didn't.
Here are some 'before this commit' and 'after' examples of how some of
those ops were/are now displayed. Note the changes in the 'TARG = N'
lines for the following code:
David Mitchell [Wed, 12 Nov 2025 13:37:55 +0000 (13:37 +0000)]
dump.c: add S_get_cv_from_op() function
The existing S_get_sv_from_pad() function does two things: it finds the
CV associated with the specified op, and then extracts something from
that CV's pad.
This commit splits the 'find the associated CV' part of the code out
into a separate function, which will be used elsewhere shortly.
David Mitchell [Wed, 12 Nov 2025 13:11:40 +0000 (13:11 +0000)]
dump.c: add S_deb_padvar_cv() function
Make S_deb_padvar() a thin wrapper over a new S_deb_padvar_cv() function,
which does the same thing (display a pad variable name), but does it for
a specified CV rather than only for the currently executing CV. This
function will be used shortly.
Karl Williamson [Thu, 6 Nov 2025 02:39:55 +0000 (19:39 -0700)]
grok_bin_oct_hex: fix broken return flags
Apparently no one has tried to use these before. These flags are to
suppress the display of certain warnings, but to instead return that the
suppression happened in output flags. The output flags were not getting
set.
I'm not adding a separate test, because a future commit will cause this
feature to be used regularly.
Karl Williamson [Wed, 5 Nov 2025 13:34:19 +0000 (06:34 -0700)]
grok_bin_oct_hex: Shortcut leading zeros
I noticed that leading zeros are quite common for octal and hex
constants. This code is structured for speed, with a partially unrolled
loop structured so that it is impossible to overflow the unrolled part.
If we get to the end of the unrolled portion, and the accumulated value
is still zero, it's because there have been only leading zeroes so far,
and instead of dropping into the loop, we can re-enter the unrolled part
without having to consider the possibility of overflowing. This allows
the next chunk of digits to be processed without branching.
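The idea can be sketched roughly as follows. This is not the code in
grok_bin_oct_hex(): parse_hex(), hex_value() and the CHUNK size of 8 are
assumptions made for illustration, and a short fixed-count loop stands
in for the real unrolled code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CHUNK = 8 };   /* assumed: 8 hex digits always fit in 32 bits */

/* Returns 0..15 for a hex digit, -1 for anything else. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse hex digits in [s, e); sets *overflowed if they exceed 64 bits. */
static uint64_t parse_hex(const char *s, const char *e, bool *overflowed)
{
    uint64_t value = 0;
    int d;

    assert(s <= e);
    *overflowed = false;

    do {
        /* Fast part: starting from zero, CHUNK digits cannot overflow,
         * so no overflow check is needed here. */
        for (int n = 0; n < CHUNK; n++) {
            if (s >= e || (d = hex_value(*s)) < 0)
                return value;
            value = (value << 4) | (uint64_t)d;
            s++;
        }
        /* Still zero: only leading zeros so far, so re-enter the fast part. */
    } while (value == 0);

    /* Slow part: every further digit needs an overflow check. */
    while (s < e && (d = hex_value(*s)) >= 0) {
        if (value > (UINT64_MAX >> 4)) {
            *overflowed = true;
            break;
        }
        value = (value << 4) | (uint64_t)d;
        s++;
    }
    return value;
}
```

With this shape, an input consisting of twenty zeros followed by "ff"
never reaches the checked loop at all.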
David Mitchell [Tue, 11 Nov 2025 11:57:14 +0000 (11:57 +0000)]
Perl_doref(): eliminate duplicated code
This compile-time function propagates lvalue ref context down a chain of
ops. It does the same thing (setting OPf_MOD and OPpDEREF_XV flags) in
three places. Consolidate this code into a single place.
Should be no functional changes.
Technically the code is slightly different in that OP_[AH]ELEM now
checks for kids before following them, but since they always have kids,
this makes no difference (except being infinitesimally slower during
compilation).
David Mitchell [Tue, 11 Nov 2025 11:06:50 +0000 (11:06 +0000)]
propagate correct ref context to both ?: branches
GH #18669
In something like
@{ expr } = ...
the expression is expected to return an array ref. If the expression
is something like $h{foo}, then the helem op needs to know both that:
- it is in lvalue context, so should autovivify the foo element if not
present;
- it is in array ref context, so it should autovivify the value to an
empty array ref, rather than just to undef.
The function Perl_doref() is used to propagate this ref context at
compile time, e.g. by setting the OPf_MOD and OPpDEREF_AV flags on the
OP_HELEM op.
My commit v5.31.1-87-ge9b0092a10 made this function non-recursive
(so that deep expressions wouldn't SEGV during compilation), but
introduced a bug when the expression included the ternary condition
operator, '?:'.
In particular, since '?:' is the only OP where doref() needs to recurse
down *two* branches, I made the function just iterate down the tree, and
then have special handling for OP_COND_EXPR. Once it had finished
iterating down the tree, this handling worked back up the tree looking for
OP_COND_EXPR nodes and, for each one found, iterated back down the second
branch.
This had a fatal flaw: a 'type' variable indicated what context to
apply. For example in @{$h{expr}} = ..., type would start off as
OP_RV2AV, but as the tree was walked, would change to OP_HELEM and then
to OP_RV2HV. When walking back up the tree, this value wasn't being restored.
The specific bug in the ticket boiled down to something like
@{ $cond ? $h{p} : $h{q} } = ...;
where the correct OPpDEREF_AV flag was being set on the first helem op,
but an incorrect OPpDEREF_HV on the second.
Since I can't think of anything better, the fix in this commit restores
some limited recursion to doref(). Namely, for an OP_COND_EXPR op, it
now recurses down that op's first branch, then after it returns,
iterates as normal down the second branch.
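The shape of the fix can be sketched on a toy tree. This is not
Perl_doref(): the node struct, the K_* kinds and propagate_ctx() are
hypothetical, a conditional is modelled with just its two branches, and
"setting a flag" is reduced to recording the context type on the node.

```c
#include <assert.h>
#include <stddef.h>

enum kind { K_RV2AV, K_RV2HV, K_HELEM, K_COND, K_NONE };

struct node {
    enum kind    kind;
    struct node *first;    /* only child, or first branch of a K_COND */
    struct node *second;   /* second branch; used by K_COND only */
    enum kind    ctx;      /* ref context recorded on this node */
};

/* Iterate down the tree, recursing only at a conditional. */
static void propagate_ctx(struct node *o, enum kind type)
{
    while (o) {
        if (o->kind == K_COND) {
            /* The callee works on its own copy of 'type', so ours is
             * unchanged when it returns and applies to the second branch. */
            propagate_ctx(o->first, type);
            o = o->second;
            continue;
        }
        o->ctx = type;      /* record the context handed down from above */
        type = o->kind;     /* the child is dereferenced by this node */
        o = o->first;
    }
}

/* Models @{ $cond ? $h{p} : $h{q} } with array-ref context on top. */
void demo_cond_branches(void)
{
    struct node hv_p = { K_RV2HV, NULL,  NULL,  K_NONE };
    struct node hv_q = { K_RV2HV, NULL,  NULL,  K_NONE };
    struct node el_p = { K_HELEM, &hv_p, NULL,  K_NONE };
    struct node el_q = { K_HELEM, &hv_q, NULL,  K_NONE };
    struct node cond = { K_COND,  &el_p, &el_q, K_NONE };

    propagate_ctx(&cond, K_RV2AV);
    assert(el_p.ctx == K_RV2AV);   /* first branch: array-ref context */
    assert(el_q.ctx == K_RV2AV);   /* second branch: the same context */
}
```

Recursion depth in this sketch grows only with the nesting of
conditionals in first branches, not with the overall depth of the
expression.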
Karl Williamson [Tue, 4 Nov 2025 17:21:33 +0000 (10:21 -0700)]
grok_bin_oct_hex: Use upper bound, not length remaining
Creating an upper limit to parse allows us to write while (s < e) for
example, and that limit is constant, requiring fewer operations than the
other way, where the remaining length keeps getting changed.
It also allows this commit to move an 's++' a couple of lines to get rid
of comparing against the number '8' which could get out of sync.
Perl's -Dpv switch produces debugging output that also displays the top
few items on the parse stack. The token names are truncated for
compactness' sake. This currently leads to a display where it's mostly
just the token name's prefix that is displayed, e.g.
Avoid redefining SvREADONLY_on in gv.c as that causes confusion
When reading the previous code, it was confusing to see calls to
`SvREADONLY_on`, as without being aware the macro was redefined just
above this function, the reader is unlikely to be aware that here it
doesn't set the SVf_PROTECT flag.
We should instead define a custom-purpose macro here and use that, to
make its operation much clearer.
Samuel Young [Tue, 4 Nov 2025 15:38:15 +0000 (09:38 -0600)]
do not use fresh_perl for #18669 TODO tests
This commit updates the #18669 TODO tests to use evals instead of
fresh_perl() calls, as fresh_perl() is not actually needed because
the behavior being tested does not cause the perl interpreter to
crash.
Karl Williamson [Sun, 2 Nov 2025 18:12:04 +0000 (11:12 -0700)]
win32/vmem.h: White-space only
Make this macro more legible, while silencing some compiler warnings
about the if statements looking like they are intended to apply to more
than they actually do.
Karl Williamson [Tue, 3 Jun 2025 11:18:07 +0000 (05:18 -0600)]
run/locale.t: Skip setting to illegal locale on z/OS
This test gets variously an illegal instruction or segfault from within
setlocale(). The default shell and bash on this system forbid the
changing of LC_ALL to an illegal value, so the setlocale(3) command is
insulated from getting this kind of input in actual operation.
This test was written in such a way as to get around such shell
restrictions, and it turns out that the z/OS setlocale can't cope. I
haven't written a ticket to IBM because this really can't happen without
a sneaky test that involves perl or something else very unlikely to
happen in real life.
An IBM person told me that there is no good single-stepping debugger
available, so I tried narrowing the cause down by adding debug statements.
I had to flush the buffer each time, and the results were more like
going down a rabbit hole.
It turns out that when a test on the box I tested this with crashes, a
human readable dump is output. It showed that the failure is within
an internal function 'locale_init()' called from setlocale(3). That is
a pretty strong indication that setlocale(3) isn't validating its input,
and it isn't perl's fault.
This is a typical dump of the problem. (Note, the perl is compiled
without threads.) S_posix_setlocale_with_complications() is the lowest
perl function called.
CEE3DMP V3 R1.0: Condition processing resulted in the unhandled condition. Mon Jun 2 19:26:40 2025 Page: 1
ASID: 004B PID: 67175275 Parent PID: 33619980 User name: DEVUSER
Harald Jörg [Sun, 2 Nov 2025 17:32:27 +0000 (18:32 +0100)]
[PATCH] Docu suggestion for named parameters in signatures
Add a paragraph that callers need to specify all positional parameters
before named ones, even if the positional parameters have defaults.
Delete a sentence stating that there is no way for the caller to provide
a value for a named parameter after an optional positional parameter.
There _is_ a way, you just need to provide values for positional
parameters.
Karl Williamson [Thu, 23 Oct 2025 13:41:33 +0000 (07:41 -0600)]
Remove some special EBCDIC code
The 'variant_byte_number' function was written to find the byte number
in a word of the first byte whose meaning varies depending on if the
string it is part of is encoded in UTF-8 or not. On ASCII machines,
that is simply when the upper bit is set. On EBCDIC machines, there is
no similar pattern, so this function hasn't been compiled on those.
A long time ago, I realized that this function could also handle binary
data: coerce that data so that each byte has that bit set or not,
depending on the pattern being looked for, and then call the function.
But I actually hadn't realized until now that it was binary data not
tied to a character set that was being worked on. This commit rectifies
that. A new alias is added for that function that emphasizes that it
works on binary data, the function is now compiled for EBCDIC, and the
EBCDIC-only code that avoided using it is now removed.
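Both uses can be sketched in portable C. This is an illustration under
assumptions, not the core's implementation: the function names are
hypothetical, the word is fixed at 64 bits, and once a word is known to
contain a hit the exact byte is found with a plain scan rather than
with bit tricks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HIGH_BITS    UINT64_C(0x8080808080808080)
#define LOW_7BITS    UINT64_C(0x7F7F7F7F7F7F7F7F)
#define ONE_PER_BYTE UINT64_C(0x0101010101010101)

/* Index of the first of the 8 bytes at p whose upper bit is set,
 * or 8 if none has it set. */
static size_t first_high_bit_byte(const unsigned char *p)
{
    uint64_t w;

    assert(p != NULL);
    memcpy(&w, p, sizeof w);          /* load one word, any alignment */
    if ((w & HIGH_BITS) == 0)
        return 8;                     /* one test rules out all 8 bytes */
    for (size_t i = 0; i < 8; i++)    /* plain scan keeps this endian-neutral */
        if (p[i] & 0x80)
            return i;
    return 8;                         /* not reached */
}

/* Index of the first of the 8 bytes at p that differs from 'wanted',
 * or 8 if all match.  The data is coerced so that "differs" becomes
 * "upper bit set", then handed to the function above. */
static size_t first_byte_not_equal(const unsigned char *p, unsigned char wanted)
{
    unsigned char coerced[8];
    uint64_t w;

    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    w ^= ONE_PER_BYTE * wanted;       /* matching bytes become 0x00 */
    /* Per byte: adding 0x7F to the low 7 bits carries into the upper bit
     * exactly when those bits are nonzero; no carry crosses a byte. */
    w = (((w & LOW_7BITS) + LOW_7BITS) | w) & HIGH_BITS;
    memcpy(coerced, &w, sizeof w);    /* differing bytes are now 0x80 */
    return first_high_bit_byte(coerced);
}
```

The second function never looks at what the bytes mean as characters,
which is the sense in which the operation is independent of the
character set.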
There are several places in the perl core that, for performance, use
word-at-a-time operations on byte data when the data to be processed is
long enough to overcome the extra setup overhead required.
The code that does this is not immediately obvious, and is currently
repeated at each such place.
This commit creates two macros that encapsulate this logic, making each
place that uses them easier to read.
One macro is for data that isn't dependent on the character set. The
other is for character data. EBCDIC data is not suitable for per-word
operation, so this macro always returns false on an EBCDIC platform.
This allows for the removal of some EBCDIC #ifdefs in our code base.
Karl Williamson [Tue, 21 Oct 2025 13:16:59 +0000 (07:16 -0600)]
Create BYTES_REMAINING_IN_WORD()
This macro encapsulates the task of finding how far until the next word
boundary the passed-in address is.
There are several places that could use this, but instead of converting
those places to use it directly, the next commit will create macros that
depend on this one, and those places will be converted to use those new
macros instead.
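As a rough sketch of the computation such a macro performs (not the
actual definition): bytes_to_word_boundary() and WORD_BYTES below are
hypothetical names, the word size is assumed to be a power of two, an
already aligned address is assumed to yield 0, and the low bits of a
pointer converted to uintptr_t are assumed to reflect its alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BYTES sizeof(uintptr_t)   /* assumed word size, a power of 2 */

/* Number of bytes from p up to the next word boundary; 0 if p is
 * already on one. */
static size_t bytes_to_word_boundary(const void *p)
{
    uintptr_t addr = (uintptr_t)p;

    /* addr & (WORD_BYTES - 1) is the offset into the current word. */
    return (WORD_BYTES - (addr & (WORD_BYTES - 1))) & (WORD_BYTES - 1);
}

void demo_word_boundary(void)
{
    uintptr_t words[2];    /* word-aligned storage; only its address is used */
    const unsigned char *base = (const unsigned char *)words;

    assert(bytes_to_word_boundary(base) == 0);
    assert(bytes_to_word_boundary(base + 1) == WORD_BYTES - 1);
    assert(bytes_to_word_boundary(base + WORD_BYTES) == 0);
}
```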
Karl Williamson [Fri, 24 Oct 2025 10:59:05 +0000 (04:59 -0600)]
mg.c: Add asserts
These two switch() statements handle magic names. We now have a quick
way to determine if the first character of a name is magic. Assert that
the cases of the switch match. This will tell us if something gets
out-of-sync.
Karl Williamson [Thu, 23 Oct 2025 22:05:06 +0000 (16:05 -0600)]
gv_magicalize: Refactor
This refactors to eliminate redundant code. Some things are magical
only if we are using the main stash; others in any stash; one only in
PL_debstash.
Previously, the switches were structured thusly:
1) if we aren't using the main stash, handle things not requiring the
main stash
2) if we are using the main stash, handle all len > 1 things that can
be in the main stash. This duplicates much of item 1)
3) if we are using the main stash, handle all len == 1 things that can
be in the main stash. This duplicates some of item 1)
The new structure is
    if (len > 1) {
        1) handle len > 1 things not requiring main stash, regardless of
           the stash we are in
        2) handle len > 1 things requiring main stash
    } else {
        3) handle len == 1 things, regardless of the stash we are in.
    }
This removes the duplicated code.
The cases for 'a' and 'b' are special. When 'a' stands for "args" it is
len > 1 and that is handled in 1).
But 'a' can also mean a single character, as 'b' always does. These
cases are handled in 3). These are the only two len == 1 characters
that don't have to be in the main package, so there is an extra
conditional clause to allow that.
Karl Williamson [Thu, 23 Oct 2025 21:43:18 +0000 (15:43 -0600)]
gv_magicalize: Quickly rule non-magical input out
This uses the data structure introduced in the previous commit to
quickly test the input first character. If it isn't a potential magical
one, it could apply to CORE, so move the block that checks for that to
here, eliminating a conditional. In either case, no need to look further.
Karl Williamson [Thu, 23 Oct 2025 17:49:18 +0000 (11:49 -0600)]
Add "magical" chars to l1_char_class_tab.h
Some characters have special meaning to gv_magicalize(). This commit
marks those in PL_charclass. This allows the next commit to rule them
out more quickly than is currently possible during processing.
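A table-driven first-character test of this kind might look like the
sketch below. Everything in it is an assumption made for illustration:
the table char_class[], the flag CC_MAGIC_START and the function names
are hypothetical, and only three example characters are marked (the
first characters of $!, $0 and $&), not the real set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CC_MAGIC_START 0x01u           /* hypothetical flag bit */

static unsigned char char_class[256];  /* one entry per byte value */

/* Mark an illustrative subset of characters; not the real list. */
static void init_char_class(void)
{
    static const unsigned char examples[] = { '!', '0', '&' };

    for (size_t i = 0; i < sizeof examples; i++)
        char_class[examples[i]] |= CC_MAGIC_START;
}

/* One table lookup decides whether a name needs any further checks. */
static bool may_be_magical(const char *name, size_t len)
{
    if (len == 0)
        return false;                  /* empty name: nothing to look up */
    assert(name != NULL);
    return (char_class[(unsigned char)name[0]] & CC_MAGIC_START) != 0;
}
```

A name that fails this test can be dismissed without entering any of
the switch statements.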
If you trace the execution of what happens to 0 length input, it relies
on a NUL terminator in the string, and does nothing. Simply return
false immediately instead.
Karl Williamson [Sat, 25 Oct 2025 23:33:05 +0000 (17:33 -0600)]
regcomp.c: Need to account for UTF group name
I found this by reading the code. Prior to this commit, the parse
pointer was advanced by one byte; it should be advanced by one
character. As long as the character was ASCII, things worked.
I looked through the regcomp.c source for other mis-use of the macro
changed by this commit; none were obvious.
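The byte-versus-character distinction can be sketched as follows. This
is not the regcomp.c macro: utf8_char_len() and advance_one_char() are
hypothetical helpers, and they handle only standard UTF-8 sequences of
up to four bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 character that starts with 'lead'. */
static size_t utf8_char_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;   /* ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;                              /* malformed: step one byte */
}

/* Advance s past one whole character, never going beyond e. */
static const char *advance_one_char(const char *s, const char *e)
{
    size_t n;

    assert(s != NULL && s < e);
    n = utf8_char_len((unsigned char)*s);
    return (size_t)(e - s) < n ? e : s + n;
}
```

For an ASCII character both "advance one byte" and "advance one
character" give the same pointer, which is why the old behaviour went
unnoticed.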
Karl Williamson [Sat, 25 Oct 2025 22:49:13 +0000 (16:49 -0600)]
reg_mesg.t: Only one error per test
This just fills out a couple of tests so that they don't prematurely
end. That makes it clear that the error that does get shown isn't also
due to other mistakes in the test.
Karl Williamson [Sun, 19 Oct 2025 22:21:00 +0000 (16:21 -0600)]
S_scan_ident: Add a check-only option
There is a bug here in which this function is called from S_intuit_more
just to see if there is an identifier in the string it is looking at.
But that call can have "subtle implications on parsing" (according to
the long-standing comments in it). We need a way to call scan_ident
without side-effects.
This commit adds that capability. The next will use it.