[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up #152372
base: gh/swolchok/758/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152372
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Cancelled Jobs, 1 Unrelated Failure as of commit be38a15 with merge base e7c19f4.
CANCELLED JOBS - The following jobs were cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ed it up (#151850)" (#152250)""

Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.

ghstack-source-id: 4dd3e47
Pull-Request-resolved: #152372
static_assert(stringToUint64("and") == 0x616e640000000000);
static_assert(stringToUint64("Ellipsis") == 0x456c6c6970736973);
The original PR had these constants wrong (I forgot the trailing zeros on the first one, and somehow had 8083 instead of 7073 in the second). Since no big-endian builds run by default, we didn't notice until land.
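The thread doesn't show the helper itself, only its asserted values. A minimal sketch of what a `stringToUint64` consistent with those values might look like (packing up to 8 ASCII characters into a `uint64_t`, first character in the most significant byte, zero-padded on the right — the packing order is an assumption inferred from the constants above):

```cpp
#include <cstdint>

// Sketch, not the actual PyTorch implementation: pack up to 8 chars of a
// string literal into a uint64_t, first char in the most significant byte.
constexpr uint64_t stringToUint64(const char* s) {
  uint64_t result = 0;
  for (int i = 0; i < 8 && s[i] != '\0'; ++i) {
    result |= static_cast<uint64_t>(static_cast<unsigned char>(s[i]))
              << (8 * (7 - i));
  }
  return result;
}

// "and" = 0x61 0x6e 0x64, then five zero bytes -- hence the trailing zeros
// that were missing in the original PR's constant.
static_assert(stringToUint64("and") == 0x616e640000000000);
// "Ellipsis" fills all 8 bytes; 'p'=0x70 and 's'=0x73 give ...7073...,
// not ...8083... as the original constant had.
static_assert(stringToUint64("Ellipsis") == 0x456c6c6970736973);
```

Since the values are computed at compile time, a wrong constant fails the build on any platform that instantiates the `static_assert`, which is why the mistake only surfaced on the s390x (big-endian) build.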
(this time we have ciflow/s390 to make sure we notice if the failure continues)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
LGTM, though let me edit the title (3x revert/reland are hard to read)
@pytorchbot merge -f "Builds are green now"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true" -c nosignal
Looks like another case where TD skipped the test during testing (adding …
@pytorchbot successfully started a revert job. Check the current status here.
…up (#152372)"

This reverts commit 7ce6f63.

Reverted #152372 on behalf of https://github.com/malfet due to: Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true ([comment](#152372 (comment)))
@swolchok your PR has been successfully reverted.
Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.

Pull Request resolved: #152372
Approved by: https://github.com/wdvr, https://github.com/malfet
Stack traces in the distributed failure don't appear to have symbols (e.g., https://github.com/pytorch/pytorch/actions/runs/14720233524/job/41313777004#step:22:2283). However, this PR is clearly associated with distributed failures both times it has landed. The best course of action is probably to make sure the new path has thorough tests, run those tests under ASAN/UBSAN, and see what comes up.
Ran the existing tests under sanitizers; no failures. Absent any information about how to debug distributed tests, I might have to just give up on this PR. If we do dust it off, I noticed the …
@swolchok There is an easier way to detect lexer bugs: use libFuzzer to generate random code strings and pass them to the lexer. Normally it takes one or two minutes to detect typical memory bugs. There are many online examples you can reference. Even better, integrate fuzzing tests into our current C++ tests. The fact that the current C++ tests pass doesn't mean that test/branch coverage is near 100%.
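For readers unfamiliar with the suggestion, a libFuzzer harness is just an `LLVMFuzzerTestOneInput` function; libFuzzer supplies `main()` and calls it repeatedly with mutated inputs while ASan/UBSan watch for memory errors. The sketch below uses a trivial stand-in tokenizer so it is self-contained — a real harness would construct `torch::jit::Lexer` over the input instead (the build flags shown are the standard libFuzzer ones, not commands from this thread):

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Toy stand-in for the lexer under test (NOT torch::jit::Lexer): walks the
// input byte by byte and rejects non-printable, non-whitespace bytes.
static void tokenizeAll(const std::string& src) {
  for (unsigned char c : src) {
    if (!std::isspace(c) && !std::isalnum(c) && !std::ispunct(c)) {
      throw std::runtime_error("unexpected byte");
    }
  }
}

// libFuzzer entry point. Build with something like:
//   clang++ -g -fsanitize=address,undefined,fuzzer harness.cpp
// Lexer errors on malformed input are expected and swallowed; what the
// fuzzer is hunting for are crashes and sanitizer reports.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string source(reinterpret_cast<const char*>(data), size);
  try {
    tokenizeAll(source);
  } catch (const std::exception&) {
    // Rejecting bad input is correct behavior, not a bug.
  }
  return 0;  // libFuzzer ignores the return value; 0 by convention
}
```

Swapping `tokenizeAll` for a loop that constructs the real lexer and pulls tokens until EOF is the only change needed to fuzz the actual code path this PR touches.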
Looks like the tests were never run: #152440 |
I did run them locally via buck, but yeah they don't seem to have run in CI. |
Spent more time than I would've liked trying to get this to work (I had a lot of trouble using brew-installed LLVM on Mac to build PyTorch, which turned out to be necessary to use libFuzzer), and I wasn't able to build all of PyTorch with -fsanitize=address,undefined,fuzzer-no-link (though I built the fuzz binary with -fsanitize=address,undefined,fuzzer). It survived running for several minutes, even after I gave it a token dictionary.
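For context on the token dictionary mentioned above: libFuzzer accepts a plain-text dictionary via `-dict=FILE`, where each line is `name="value"`, and uses the entries as mutation building blocks so the fuzzer reaches keyword/operator code paths faster. A hypothetical dictionary for TorchScript-style tokens might look like:

```
# lexer.dict -- hypothetical example, passed as: ./fuzz_lexer -dict=lexer.dict
kw_def="def"
kw_return="return"
kw_and="and"
kw_ellipsis="..."
op_arrow="->"
punct_colon=":"
```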
@swolchok May I try it on a Linux box? I plan to fuzz based on this git commit head. Hope to gain some experience and evaluate the difficulty of integrating libfuzzer into torch. Any findings will be reported in this thread. |
Stack from ghstack (oldest at bottom):
Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel