Conversation

arc
Contributor

@arc arc commented Oct 30, 2019

This branch continues the work merged in 5015bd0, factoring code out of Perl_yylex() and its callees, in the hope of making the lexer easier to understand locally.

After these changes, the largest remaining piece of Perl_yylex() is just over 900 lines (down from originally >4100), and consists of a single switch statement, all of whose case groups are independent.

This branch also contains a note in perldelta that this major refactoring has taken place.

@arc arc requested review from iabyn and ilmari October 30, 2019 17:13
@jkeenan
Contributor

jkeenan commented Oct 30, 2019

@arc, this would be a good branch for smoke-testing on non-Linux systems. But without smoke-me/ at the start of the branch name, that won't happen automatically. (Though I have manually started a smoke test run on FreeBSD.) Would you like to rename it?

Thank you very much.
Jim Keenan

@tonycoz
Contributor

tonycoz commented Oct 30, 2019

Does this fix the recursion bug introduced in the last toke update? See #17220

@tonycoz
Contributor

tonycoz commented Oct 30, 2019

I couldn't make this crash with the #17220 code, but assuming the recursion will be converted to iteration seems dangerous to me.

Reviewing the code, it still looks like it's recursing.

@arc arc force-pushed the arc/smaller-toke-bis branch 3 times, most recently from 8214886 to e51b401 November 1, 2019 15:49
@arc
Contributor Author

arc commented Nov 1, 2019

@tonycoz the new version of this removes almost all the recursion, and I don't think it's possible to trigger unbounded recursion any longer (as only yyl_fake_eof() contains a recursive call, and I can't see how it could be called repeatedly within a single file). I believe this should fix #17220.

@arc
Contributor Author

arc commented Nov 1, 2019

@jkeenan I'm not aware of a way to rename a branch without breaking any extant pull request pointing to it, so I've pushed an additional copy of this branch to smoke-me/arc/smaller-toke-bis.

@jkeenan
Contributor

jkeenan commented Nov 1, 2019 via email

@xsawyerx
Member

xsawyerx commented Nov 1, 2019

@arc, this is really impressive. Nice work.

@Leont
Contributor

Leont commented Nov 1, 2019

> After these changes, the largest remaining piece of Perl_yylex() is just over 900 lines (down from originally >4100), and consists of a single switch statement, all of whose case groups are independent.

This sounds excellent :-)

@tonycoz
Contributor

tonycoz commented Nov 3, 2019

> @tonycoz the new version of this removes almost all the recursion, and I don't think it's possible to trigger unbounded recursion any longer (as only yyl_fake_eof() contains a recursive call, and I can't see how it could be called repeatedly within a single file). I believe this should fix #17220.

It's still recursing deeply:

#0  yyl_try (s=0x555555de85ab ' ' <repeats 200 times>..., len=0) at toke.c:8535
#1  0x00005555556371df in yyl_try (
    s=0x555555de85ab ' ' <repeats 200 times>..., len=0) at toke.c:8657
#2  0x00005555556371df in yyl_try (
    s=0x555555de85aa ' ' <repeats 200 times>..., len=0) at toke.c:8657
#3  0x00005555556371df in yyl_try (
    s=0x555555de85a9 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#4  0x00005555556371df in yyl_try (
    s=0x555555de85a8 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#5  0x00005555556371df in yyl_try (
    s=0x555555de85a7 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#6  0x00005555556371df in yyl_try (
    s=0x555555de85a6 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#7  0x00005555556371df in yyl_try (
    s=0x555555de85a5 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#8  0x00005555556371df in yyl_try (
    s=0x555555de85a4 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#9  0x00005555556371df in yyl_try (
    s=0x555555de85a3 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#10 0x00005555556371df in yyl_try (
    s=0x555555de85a2 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#11 0x00005555556371df in yyl_try (
    s=0x555555de85a1 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#12 0x00005555556371df in yyl_try (
    s=0x555555de85a0 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#13 0x00005555556371df in yyl_try (
    s=0x555555de859f ' ' <repeats 200 times>..., len=0) at toke.c:8657
...
#74 0x00005555556371df in yyl_try (
    s=0x555555de8562 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#75 0x00005555556371df in yyl_try (
    s=0x555555de8561 ' ' <repeats 200 times>..., len=0) at toke.c:8657
#76 0x000055555562be3a in yyl_fake_eof (fake_eof=0, bof=true, 
    s=0x555555de8560 ' ' <repeats 200 times>..., len=0) at toke.c:7104
#77 0x00005555556371b9 in yyl_try (s=0x555555de6343 "\367\377\177", len=0)
    at toke.c:8647
#78 0x000055555562a586 in yyl_eol (s=0x555555de6342 "\005\367\377\177", len=0)
    at toke.c:6839
#79 0x00005555556371fd in yyl_try (s=0x555555de6341 "\273\005\367\377\177", 
    len=0) at toke.c:8661
#80 0x0000555555639c9c in Perl_yylex () at toke.c:9295
#81 0x000055555564f874 in Perl_yyparse (gramtype=258) at perly.c:340
#82 0x00005555555d0ee5 in S_parse_body (env=0x0, 
    xsinit=0x5555559117fc <xs_init>) at perl.c:2527
#83 0x00005555555cf84f in perl_parse (my_perl=0x555555dbc010, 
    xsinit=0x5555559117fc <xs_init>, argc=2, argv=0x7fffffffe858, env=0x0)
    at perl.c:1818
#84 0x000055555591173f in main (argc=2, argv=0x7fffffffe858, 
    env=0x7fffffffe870) at miniperlmain.c:132

That's with "1\n" followed by many spaces followed by "\n" (and I didn't try to see how deep it went with my input).
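
For anyone wanting to regenerate an input of the shape described above, here is a minimal sketch in C. The helper name `write_repro` and its parameters are hypothetical; the thread does not say how the original test file was produced, only what it contained ("1\n", many spaces, "\n").

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of perl): writes "1\n", then n_spaces
 * space characters, then "\n" to the given stream.
 * Returns 0 on success, -1 on a write error. */
static int write_repro(FILE *out, size_t n_spaces)
{
    assert(out != NULL);

    if (fputs("1\n", out) == EOF)
        return -1;

    for (size_t i = 0; i < n_spaces; i++) {
        if (fputc(' ', out) == EOF)
            return -1;
    }

    if (fputc('\n', out) == EOF)
        return -1;

    return 0;
}
```

Writing the result to a file and passing that file to the freshly built perl would presumably match the `argc=2` seen in the backtrace, though that invocation is an assumption.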

@arc
Contributor Author

arc commented Nov 3, 2019

Hi @tonycoz. Are you sure you're looking at the latest version (commit e51b401)? The symbol yyl_eol doesn't appear in my latest version, but it's at frame 78 in the backtrace you've posted. Also, I don't see a crash on the code you mention, even with 1e7 spaces between the newlines.

@tonycoz
Contributor

tonycoz commented Nov 4, 2019

Sorry, I thought I had the updated PR checked out. I can't reproduce the recursion any more. Sorry about the noise.

@xsawyerx
Member

xsawyerx commented Nov 4, 2019

Are we waiting for the last reviewer (@iabyn) before merging? I think we're good to go, but @arc, it's your call here.

arc added 12 commits November 4, 2019 10:32
This removes a goto label.
With the removal of another goto label!
This permits some additional pleasing simplifications.
I introduced these parameters as part of mechanically refactoring goto-heavy
logic into subroutines. However, they aren't actually needed through most of
the code. Even in the recursive case (in which yyl_try() or one of its
callees will call itself), we can reset the variables to zero.
This makes calls to it much easier to understand.
arc added 5 commits November 4, 2019 10:32
I thought I was going to end up using this for more stuff, but I've
found better approaches.

This commit also removes two more goto targets.
With this commit, yyl_try() has few enough arguments that the RETRY()
macro no longer serves any useful purpose; delete it too.
There's exactly one place where we need to consult it (and that only for
producing good error messages in a specific group of term-after-term
situations).

The reason for passing it around was so that it could be reset to false
early on in the process of lexing a token, while then allowing the three
separate cases that might need to set it true to do so independently.

Instead, centralise the logic of determining when it needs to be true.
@arc
Contributor Author

arc commented Nov 4, 2019

I tagged Dave just because he's done some lexer work in the past, and expressed some interest at P5H. But I think this has had enough positive comments now, so I'm going to rebase and merge.

arc added 3 commits November 4, 2019 10:35
The downside of writing these calls recursively is that not all compilers
will compile the tail-position calls as jumps. That was especially true in
earlier versions of this refactoring process (where yyl_try() took a large
number of arguments), but in general it is not something we can rely on,
especially in the presence of `-O0` or similar compiler options.
This can lead to call-stack overflow in some circumstances.

Most recursive calls to yyl_try() occur within yyl_try() itself, so we can
easily replace them with an explicit `goto` (which is what most compilers
would use for the recursive calls anyway, now that yyl_try() takes ≤3
parameters).
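
As a rough illustration of that replacement (a toy sketch, not the actual toke.c code), the two functions below skip leading spaces. The names `next_token_recursive` and `next_token_goto` are hypothetical. The first re-enters itself once per space, which mirrors the one-frame-per-character pattern in the backtrace earlier in this thread whenever the compiler does not turn the tail call into a jump. The second makes the jump explicit.

```c
#include <assert.h>
#include <stddef.h>

/* Recursive form: one call per skipped space. Whether the tail call
 * becomes a jump depends on the compiler and optimisation level. */
static const char *next_token_recursive(const char *s)
{
    assert(s != NULL);
    if (*s == ' ')
        return next_token_recursive(s + 1);
    return s;
}

/* Iterative form: the same logic with the tail call written as an
 * explicit goto, so stack depth stays constant for any input length. */
static const char *next_token_goto(const char *s)
{
    assert(s != NULL);
  retry:
    if (*s == ' ') {
        s++;
        goto retry;
    }
    return s;
}
```

Both return a pointer to the first non-space character; only the second is safe on arbitrarily long runs of spaces regardless of compiler flags.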

There are only two other recursive-call cases. One is yyl_fake_eof(), which
as far as I can tell is never called repeatedly within a single file; this
seems safe.

The other is yyl_eol(). It has exactly two distinct return paths, so this
commit moves the retry logic into its yyl_try() caller.
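
A toy sketch of that second pattern, under stated assumptions: `handle_eol`, `lex`, and the `eol_result` enum are hypothetical names, and the real yyl_eol() does far more. The point is only that a callee with two outcomes can report which one occurred, leaving the retry jump to its caller instead of calling back into it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status: either lexing is finished, or the caller
 * should go round again from the new position. */
enum eol_result { EOL_DONE, EOL_RETRY };

/* Consumes one newline and counts it. Never calls lex() itself. */
static enum eol_result handle_eol(const char **sp, int *lines)
{
    assert(sp != NULL && lines != NULL);
    (*sp)++;
    (*lines)++;
    return **sp == '\0' ? EOL_DONE : EOL_RETRY;
}

/* Returns the first non-blank character, or 0 at end of input. */
static int lex(const char *s, int *lines)
{
  retry:
    switch (*s) {
    case ' ':
        s++;
        goto retry;
    case '\n':
        /* The retry decision lives here, in the caller. */
        if (handle_eol(&s, lines) == EOL_RETRY)
            goto retry;
        return 0;
    default:
        return *s;
    }
}
```

With this shape the stack depth no longer depends on how many newlines or spaces the input contains.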

With this change, we no longer seem to trigger call-stack overflow.

Closes #17220
@arc arc force-pushed the arc/smaller-toke-bis branch from e51b401 to 18828ce November 4, 2019 10:59
@arc arc merged commit 18828ce into blead Nov 4, 2019
@arc arc deleted the arc/smaller-toke-bis branch November 4, 2019 11:05

6 participants