[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up #152372

Open
wants to merge 1 commit into
base: gh/swolchok/758/base
from

Conversation

swolchok
Contributor

@swolchok swolchok commented Apr 28, 2025

Stack from ghstack (oldest at bottom):

Reapplying with a fix for the linux-manylinux-2_28-py3-cpu-s390x / build
failure
(https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs).
The fix is just to update a pair of static_assert constants I got wrong.

cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel

[ghstack-poisoned]

pytorch-bot bot commented Apr 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152372

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Cancelled Jobs, 1 Unrelated Failure

As of commit be38a15 with merge base e7c19f4:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot (bot) added the ci-no-td (Do not run TD on this PR) and release notes: jit (release notes category) labels Apr 28, 2025
swolchok added a commit that referenced this pull request Apr 28, 2025
…ed it up (#151850)" (#152250)""

Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build
failure
(https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs),
which is to just update a pair of static_assert constants I got wrong.


ghstack-source-id: 4dd3e47
Pull-Request-resolved: #152372
@facebook-github-bot added the oncall: jit (Add this issue/PR to JIT oncall triage queue) label Apr 28, 2025
@swolchok added the ciflow/s390 (s390x-related CI jobs) label Apr 28, 2025
Comment on lines +556 to +557
static_assert(stringToUint64("and") == 0x616e640000000000);
static_assert(stringToUint64("Ellipsis") == 0x456c6c6970736973);
Contributor Author

The original PR had these constants wrong: I forgot the trailing zeros on the first one, and somehow had 8083 instead of 7073 in the second. Since no big-endian builds run by default, we didn't notice until landing.
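
For anyone checking these constants by hand, here is a minimal sketch of the byte layout the two asserts imply: the first character sits in the most significant byte, and any unused low bytes are zero. `packFirstByteHigh` is a hypothetical helper written with shifts so that it yields the same value on any host. It is not the PR's `stringToUint64`, whose implementation is not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (an assumption, not the PR's stringToUint64):
// packs up to 8 bytes of `s` into a uint64_t with the first character
// in the most significant byte. Missing bytes are filled with zero,
// which is where the trailing zeros in the "and" constant come from.
constexpr std::uint64_t packFirstByteHigh(std::string_view s) {
  std::uint64_t result = 0;
  for (std::size_t i = 0; i < 8; ++i) {
    const std::uint64_t byte =
        i < s.size() ? static_cast<unsigned char>(s[i]) : 0;
    result = (result << 8) | byte;
  }
  return result;
}

// 'a' = 0x61, 'n' = 0x6e, 'd' = 0x64, followed by five zero bytes.
static_assert(packFirstByteHigh("and") == 0x616e640000000000ULL);
// 'p' = 0x70 and 's' = 0x73 give the 7073 in the middle of the constant.
static_assert(packFirstByteHigh("Ellipsis") == 0x456c6c6970736973ULL);
```

Because the checks are `static_assert`s, a wrong constant fails at compile time, but only on builds that actually compile that branch.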

Contributor Author

(this time we have ciflow/s390 to make sure we notice if the failure continues)

@swolchok
Contributor Author

@pytorchbot merge

@pytorch-bot (bot) added the ciflow/trunk (Trigger trunk jobs on your pull request) label Apr 28, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Contributor

@malfet malfet left a comment

LGTM, though let me edit the title (3x revert/reland are hard to read)

@malfet changed the title from Revert "Revert "Reapply "Rewrite the guts of torch::jit::Lexer to speed it up (#151850)" (#152250)"" to [Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up Apr 28, 2025
@malfet
Contributor

malfet commented Apr 28, 2025

@pytorchbot merge -f "Builds are green now"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job waited more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@malfet
Contributor

malfet commented Apr 29, 2025

@pytorchbot revert -m "Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true" -c nosignal

@malfet
Contributor

malfet commented Apr 29, 2025

Looks like another case where TD skipped the relevant test in CI (adding ci-no-td should help)

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@swolchok your PR has been successfully reverted.

pytorchmergebot pushed a commit that referenced this pull request Apr 29, 2025
)

Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build
failure
(https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs),
which is to just update a pair of static_assert constants I got wrong.

Pull Request resolved: #152372
Approved by: https://github.com/wdvr, https://github.com/malfet
@swolchok
Contributor Author

> @pytorchbot revert -m "Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true" -c nosignal

Stack traces in the distributed failure don't appear to have symbols (e.g., https://github.com/pytorch/pytorch/actions/runs/14720233524/job/41313777004#step:22:2283). However, this PR is clearly associated with distributed failures both times it has landed. The best course of action is probably to make sure the new path has thorough tests, run those tests under ASAN/UBSAN, and see what comes up.

@swolchok
Contributor Author

Ran the existing tests under sanitizers: no failures. Absent any information about how to debug distributed tests, I might have to just give up on this PR.

If we do dust it off, I noticed that the pos.has_next() check in the DFA is unnecessary, because that function already exits early if !pos.has_next().

@cyyever
Collaborator

cyyever commented Apr 30, 2025

@swolchok There is an easier way to detect lexer bugs. Can you use libfuzzer to generate random code strings and pass them to the lexer? It normally takes one or two minutes to detect typical memory bugs. There are many online examples you can reference. Even better, integrate fuzzing tests into our current C++ tests.

The fact that the current C++ tests pass doesn't mean that the test/branch coverage is near 100%...
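
A libFuzzer harness along the lines suggested here could look like the sketch below. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point. `toyTokenize` is a hypothetical stand-in so that the example compiles on its own; a real harness would hand the bytes to torch::jit::Lexer instead, and how to construct that lexer is not covered in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for the lexer under test (an assumption for
// illustration only). Splits input into identifier/number runs and
// single-character punctuation tokens, skipping whitespace.
std::vector<std::string> toyTokenize(std::string_view src) {
  std::vector<std::string> tokens;
  std::size_t i = 0;
  while (i < src.size()) {
    const unsigned char c = static_cast<unsigned char>(src[i]);
    if (std::isspace(c)) {
      ++i;
      continue;
    }
    const std::size_t start = i;
    if (std::isalnum(c) || c == '_') {
      // Consume a run of identifier characters.
      while (i < src.size() &&
             (std::isalnum(static_cast<unsigned char>(src[i])) ||
              src[i] == '_')) {
        ++i;
      }
    } else {
      ++i;  // Any other byte becomes a one-character token.
    }
    tokens.emplace_back(src.substr(start, i - start));
  }
  return tokens;
}

// libFuzzer calls this repeatedly with mutated inputs. Sanitizers
// report memory errors or undefined behavior hit along the way.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
  const std::string_view src(reinterpret_cast<const char*>(data), size);
  toyTokenize(src);
  return 0;  // Non-crashing inputs return 0.
}
```

Building with something like `clang++ -g -fsanitize=address,undefined,fuzzer harness.cpp` (the file name is illustrative) links libFuzzer's driver, which supplies `main`.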

@Skylion007
Collaborator

Looks like the tests were never run: #152440

@swolchok
Contributor Author

> never run

I did run them locally via buck, but yeah they don't seem to have run in CI.

@swolchok
Contributor Author

> libfuzzer to generate a random code string and pass it to lexer. Normally it takes one or two minutes to detect typical memory bugs

I spent more time than I would've liked trying to get this to work. I had a lot of trouble using brew-installed LLVM on Mac to build PyTorch, which turned out to be necessary to use libfuzzer, and I wasn't able to build all of PyTorch with -fsanitize=address,undefined,fuzzer-no-link (though I did build the fuzz binary with -fsanitize=address,undefined,fuzzer). The lexer survived several minutes of fuzzing, even after I gave the fuzzer a token dictionary.

@cyyever
Collaborator

cyyever commented May 1, 2025

@swolchok May I try it on a Linux box? I plan to fuzz starting from this PR's head commit. I hope to gain some experience and evaluate the difficulty of integrating libfuzzer into torch. Any findings will be reported in this thread.

Labels
ci-no-td - Do not run TD on this PR
ciflow/s390 - s390x-related CI jobs
ciflow/trunk - Trigger trunk jobs on your pull request
Merged
oncall: jit - Add this issue/PR to JIT oncall triage queue
release notes: jit - release notes category
Reverted

7 participants