
Add detailed triton kernel logging to tlparse #152197


Open
wants to merge 6 commits into base: gh/jamesjwu/140/base

Conversation

Contributor

@jamesjwu commented Apr 25, 2025

Stack from ghstack (oldest at bottom):

This PR adds detailed logging for each Triton kernel we compile, along with its autotune result, to tlparse. We accumulate these results in a global variable that we clear after each Triton kernel compile.

Because we can't keep these objects around after compile time, we unfortunately can't record the autotune cache save or coordinate descent tuning, but we can at least log:

  • The duration of compilation
  • Whether or not the autotune cache hit
  • The best autotuning config, if there is only one

Example triton kernel info: https://gist.github.com/jamesjwu/493bdd0f36b0b7e3ca327f87bd6c2c75

See the internal diff for an example log from an internal model.
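
For reference, the collection pattern is roughly the following. This is a hedged sketch based only on the description above, not the PR's actual implementation: the autotune_cache_info field names mirror the diff, while record_triton_kernel_compile, flush_triton_kernel_info, and the emit hook are hypothetical names used for illustration.

import time

# Global scratch dict for the kernel currently being compiled; cleared after
# each Triton kernel compile, matching the "global variable" described above.
_autotune_cache_info = {}

def record_triton_kernel_compile(kernel_name, configs, cache_hit, compile_fn):
    """Compile one Triton kernel and stash its per-kernel info in the global dict."""
    start = time.time()
    result = compile_fn()  # the actual Triton compilation happens here
    _autotune_cache_info["kernel_name"] = kernel_name
    _autotune_cache_info["compile_duration_s"] = time.time() - start
    _autotune_cache_info["autotune_cache_hit"] = cache_hit
    _autotune_cache_info["num_configs"] = len(configs)
    if len(configs) == 1:
        # A single candidate config is trivially the best one, so it can be
        # recorded without having run autotuning.
        _autotune_cache_info["best_config"] = str(configs[0])
    return result

def flush_triton_kernel_info(emit):
    """Emit the collected info (e.g. as a tlparse artifact) and reset the global."""
    emit(dict(_autotune_cache_info))
    _autotune_cache_info.clear()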

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Differential Revision: D73674443

[ghstack-poisoned]

pytorch-bot bot commented Apr 25, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152197

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 795a706 with merge base 6efc572:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request Apr 25, 2025
ghstack-source-id: 15ca7c4
Pull Request resolved: #152197
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Apr 25, 2025
ghstack-source-id: 686bf1b
Pull Request resolved: #152197
@jamesjwu
Contributor Author

@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@pytorch-bot bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label Apr 25, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Apr 25, 2025
ghstack-source-id: a0cd75c
Pull Request resolved: #152197
@jamesjwu
Contributor Author

@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 0d29593
Pull Request resolved: #152197
@jamesjwu changed the title from "Detailed triton kernel logging" to "Add detailed triton kernel logging to tlparse" Apr 28, 2025
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: 10b4037
Pull Request resolved: #152197
@jamesjwu marked this pull request as ready for review April 28, 2025 15:38
@jamesjwu requested a review from bdhirsh as a code owner April 28, 2025 15:38
[ghstack-poisoned]
jamesjwu added a commit that referenced this pull request Apr 28, 2025
ghstack-source-id: c7fc4c0
Pull Request resolved: #152197
@jamesjwu
Contributor Author

@jamesjwu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

autotune_cache_info["num_configs"] = len(configs)
if inductor_meta.get("coordinate_descent_tuning"):
    autotune_cache_info["coordesc_tuning"] = True
if len(configs) == 1:
Contributor

basic question: given that we are logging the results of autotuning, what does it actually mean for there to be more than one config here? (shouldn't autotuning always end in a single config we can log?)

Contributor Author

@jamesjwu commented Apr 29, 2025


We're logging the compile-time "results" in the sense that we're logging all the candidate configs we will need to autotune over when the kernel is actually called. Autotuning hasn't run yet at this point, so there can be more than one config.

We run autotuning later, after dynamo returns, in CachingAutotuner.benchmark_all_configs. There, it should be possible to log just the best config.
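
For illustration, that later step conceptually amounts to the sketch below (not PyTorch's actual code; pick_best_config and benchmark_one are hypothetical names standing in for what CachingAutotuner.benchmark_all_configs does): time every candidate config and keep the fastest, which is the single best config that could then be logged.

def pick_best_config(configs, benchmark_one):
    # benchmark_one(cfg) -> measured runtime (e.g. milliseconds) of the kernel
    # launched with candidate config `cfg`.
    timings = [(cfg, benchmark_one(cfg)) for cfg in configs]
    # The fastest candidate is the single "best config" we could log afterwards.
    return min(timings, key=lambda pair: pair[1])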

@facebook-github-bot
Contributor

@pytorchbot merge -i

(Initiating merge automatically since the Phabricator diff has merged; merging with -i because OSS signals were bypassed internally)

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 0 checks:

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@jamesjwu
Contributor Author

@pytorchbot merge -f -c "Buggy lints not running, checked that lints passed locally"


pytorch-bot bot commented Apr 29, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot merge: error: argument -f/--force: expected one argument

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Try @pytorchbot --help for more info.

@jamesjwu
Contributor Author

@pytorchbot merge -f "Buggy lints not running, checked that lints passed locally"

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@wdvr
Contributor

wdvr commented Apr 29, 2025

@pytorchmergebot revert -m 'failing python test/dynamo/test_structured_trace.py StructuredTraceTest.test_cudagraphs on trunk' -c nosignal

dynamo/test_structured_trace.py::StructuredTraceTest::test_cudagraphs (GH job link, HUD commit link)

cc @jamesjwu

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Collaborator

@jamesjwu your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Apr 29, 2025
This reverts commit 8303860.

Reverted #152197 on behalf of https://github.com/wdvr due to failing python test/dynamo/test_structured_trace.py StructuredTraceTest.test_cudagraphs on trunk ([comment](#152197 (comment)))
@pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels Apr 29, 2025