Add detailed triton kernel logging to tlparse #152197
base: gh/jamesjwu/140/base
Conversation
🔗 Helpful links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/152197.
Note: links to docs will display an error until the docs builds have been completed.
❌ 2 new failures as of commit 795a706 with merge base 6efc572. The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
autotune_cache_info["num_configs"] = len(configs)
if inductor_meta.get("coordinate_descent_tuning"):
    autotune_cache_info["coordesc_tuning"] = True
if len(configs) == 1:
basic question: given that we are logging the results of autotuning, what does it actually mean for there to be more than one config here? (shouldn't autotuning always end in a single config we can log?)
We're logging the compile-time "results": all the candidate configs we will need to autotune over when the function is actually called. We haven't run autotuning yet at this point, so there can be more than one config.
Autotuning runs later, after dynamo returns, in CachingAutotuner.benchmark_all_configs. There, it should be possible to log just the best config.
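As a rough illustration of that two-phase flow, here is a minimal sketch: at compile time only the set of candidate configs is known (so logging more than one is normal), and a single best config exists only after benchmarking at first call. All names here are illustrative stand-ins, not the real torch._inductor API.

```python
import time

class ToyAutotuner:
    """Toy analogue of the flow described above: candidate configs are
    known at compile time; the best config only after benchmarking."""

    def __init__(self, kernel_name, configs):
        self.kernel_name = kernel_name
        self.configs = configs       # all candidates, known at compile time
        self.best_config = None      # known only after benchmarking

    def compile_time_info(self):
        # Analogue of autotune_cache_info: we can only log the number of
        # candidates here, not a winner.
        return {"kernel": self.kernel_name, "num_configs": len(self.configs)}

    def benchmark_all_configs(self, run):
        # Analogue of CachingAutotuner.benchmark_all_configs: time each
        # candidate once and keep the fastest.
        timings = {}
        for cfg in self.configs:
            start = time.perf_counter()
            run(cfg)
            timings[cfg] = time.perf_counter() - start
        self.best_config = min(timings, key=timings.get)
        return self.best_config
```

With a tuner like this, a compile-time log entry would report `num_configs > 1`, while a post-benchmark log entry could report just `best_config`.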
@pytorchbot merge -i (Initiating merge automatically since Phabricator Diff has merged, merging with -i because oss signals were bypassed internally)
Merge started. Your change will be merged while ignoring the following 0 checks. Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f -c "Buggy lints not running, checked that lints passed locally"
❌ 🤖 pytorchbot command failed:
@pytorchbot merge -f "Buggy lints not running, checked that lints passed locally"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchmergebot revert -m 'failing python test/dynamo/test_structured_trace.py StructuredTraceTest.test_cudagraphs on trunk' -c nosignal
Failing test: dynamo/test_structured_trace.py::StructuredTraceTest::test_cudagraphs (GH job link, HUD commit link). cc @jamesjwu
@pytorchbot successfully started a revert job. Check the current status here. |
@jamesjwu your PR has been successfully reverted. |
This reverts commit 8303860. Reverted #152197 on behalf of https://github.com/wdvr due to failing python test/dynamo/test_structured_trace.py StructuredTraceTest.test_cudagraphs on trunk (comment).
Stack from ghstack (oldest at bottom):
This PR adds detailed logging to tlparse for each triton kernel we compile, including its autotune result. We collect these results in a global variable that is cleared after each triton kernel compile.
We can't keep these objects around after compile time, so unfortunately we can't record the autotune cache save or coordinate descent tuning, but we can log at least:
Example triton kernel info: https://gist.github.com/jamesjwu/493bdd0f36b0b7e3ca327f87bd6c2c75
See internal diff for an example log for internal model.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov
Differential Revision: D73674443