[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up #152372
base: gh/swolchok/758/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152372
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 Cancelled Jobs, 1 Unrelated Failure as of commit be38a15 with merge base e7c19f4.
CANCELLED JOBS - The following jobs were cancelled. Please retry:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…ed it up (#151850)" (#152250)""

Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.

ghstack-source-id: 4dd3e47
Pull-Request-resolved: #152372
static_assert(stringToUint64("and") == 0x616e640000000000);
static_assert(stringToUint64("Ellipsis") == 0x456c6c6970736973);
The original PR had these constants wrong (I forgot the trailing zeros on the first one, and somehow had 8083 instead of 7073 in the second). Since no big-endian builds run by default, we didn't notice until land.
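The thread doesn't show the helper itself, only its asserted values. A minimal sketch of what a `stringToUint64` consistent with those values might look like (packing up to 8 ASCII characters into a `uint64_t`, first character in the most significant byte, zero-padded on the right — the packing order is an assumption inferred from the constants above):

```cpp
#include <cstdint>

// Sketch, not the actual PyTorch implementation: pack up to 8 chars of a
// string literal into a uint64_t, first char in the most significant byte.
constexpr uint64_t stringToUint64(const char* s) {
  uint64_t result = 0;
  for (int i = 0; i < 8 && s[i] != '\0'; ++i) {
    result |= static_cast<uint64_t>(static_cast<unsigned char>(s[i]))
              << (8 * (7 - i));
  }
  return result;
}

// "and" = 0x61 0x6e 0x64, then five zero bytes -- hence the trailing zeros
// that were missing in the original PR's constant.
static_assert(stringToUint64("and") == 0x616e640000000000);
// "Ellipsis" fills all 8 bytes; 'p'=0x70 and 's'=0x73 give ...7073...,
// not ...8083... as the original constant had.
static_assert(stringToUint64("Ellipsis") == 0x456c6c6970736973);
```

Since the values are computed at compile time, a wrong constant fails the build on any platform that instantiates the `static_assert`, which is why the mistake only surfaced on the s390x (big-endian) build.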
(this time we have ciflow/s390 to make sure we notice if the failure continues)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
LGTM, though let me edit the title (3x revert/reland are hard to read)
@pytorchbot merge -f "Builds are green now"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true" -c nosignal
Looks like another case where TD skipped the test during testing (adding …
@pytorchbot successfully started a revert job. Check the current status here.
…up (#152372)"

This reverts commit 7ce6f63.

Reverted #152372 on behalf of https://github.com/malfet due to: Looks like it broke distributed this time around, see https://hud.pytorch.org/hud/pytorch/pytorch/f05d3e5019a8d0ee460bb96d490764977cc2d011/1?per_page=50&name_filter=distributed&mergeEphemeralLF=true ([comment](#152372 (comment)))
@swolchok your PR has been successfully reverted.
Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.

Pull Request resolved: #152372
Approved by: https://github.com/wdvr, https://github.com/malfet
Stack traces in the distributed failure don't appear to have symbols (e.g., https://github.com/pytorch/pytorch/actions/runs/14720233524/job/41313777004#step:22:2283). However, this PR is clearly associated with distributed failures both times it has landed. The best course of action is probably to make sure the new path has thorough tests, run those tests under ASAN/UBSAN, and see what comes up.
Ran the existing tests under sanitizers; no failures. Absent any information about how to debug distributed tests, I might have to just give up on this PR. If we do dust it off, I noticed the …
@swolchok There is an easier way to detect lexer bugs: use libFuzzer to generate random code strings and pass them to the lexer. Normally it takes one or two minutes to detect typical memory bugs. There are many online examples you can reference. Even better, integrate fuzzing tests into our current C++ tests. The fact that the current C++ tests pass doesn't mean that test/branch coverage is near 100%.
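For readers unfamiliar with the suggestion, a libFuzzer harness is just an `LLVMFuzzerTestOneInput` function; libFuzzer supplies `main()` and calls it repeatedly with mutated inputs while ASan/UBSan watch for memory errors. The sketch below uses a trivial stand-in tokenizer so it is self-contained — a real harness would construct `torch::jit::Lexer` over the input instead (the build flags shown are the standard libFuzzer ones, not commands from this thread):

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Toy stand-in for the lexer under test (NOT torch::jit::Lexer): walks the
// input byte by byte and rejects non-printable, non-whitespace bytes.
static void tokenizeAll(const std::string& src) {
  for (unsigned char c : src) {
    if (!std::isspace(c) && !std::isalnum(c) && !std::ispunct(c)) {
      throw std::runtime_error("unexpected byte");
    }
  }
}

// libFuzzer entry point. Build with something like:
//   clang++ -g -fsanitize=address,undefined,fuzzer harness.cpp
// Lexer errors on malformed input are expected and swallowed; what the
// fuzzer is hunting for are crashes and sanitizer reports.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string source(reinterpret_cast<const char*>(data), size);
  try {
    tokenizeAll(source);
  } catch (const std::exception&) {
    // Rejecting bad input is correct behavior, not a bug.
  }
  return 0;  // libFuzzer ignores the return value; 0 by convention
}
```

Swapping `tokenizeAll` for a loop that constructs the real lexer and pulls tokens until EOF is the only change needed to fuzz the actual code path this PR touches.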
Looks like the tests were never run: #152440 |
I did run them locally via buck, but yeah they don't seem to have run in CI. |
Spent more time than I would've liked trying to get this to work (I had a lot of trouble using brew-installed LLVM on Mac to build PyTorch, which turned out to be necessary to use libFuzzer), and I wasn't able to build all of PyTorch with -fsanitize=address,undefined,fuzzer-no-link (though I built the fuzz binary with -fsanitize=address,undefined,fuzzer). It survived running for several minutes, even after I gave it a token dictionary.
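For context on the token dictionary mentioned above: libFuzzer accepts a plain-text dictionary via `-dict=FILE`, where each line is `name="value"`, and uses the entries as mutation building blocks so the fuzzer reaches keyword/operator code paths faster. A hypothetical dictionary for TorchScript-style tokens might look like:

```
# lexer.dict -- hypothetical example, passed as: ./fuzz_lexer -dict=lexer.dict
kw_def="def"
kw_return="return"
kw_and="and"
kw_ellipsis="..."
op_arrow="->"
punct_colon=":"
```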
@swolchok May I try it on a Linux box? I plan to fuzz based on this git commit head. Hope to gain some experience and evaluate the difficulty of integrating libfuzzer into torch. Any findings will be reported in this thread. |
Stack from ghstack (oldest at bottom):
Reapplying with fix for linux-manylinux-2_28-py3-cpu-s390x / build failure (https://github.com/pytorch/pytorch/actions/runs/14716285820/job/41300304223#logs), which is to just update a pair of static_assert constants I got wrong.
cc @EikanWang @jgong5 @wenzhe-nrv @sanchitintel