[Inductor] [bc-breaking] Node Level provenance tracking#144277
[Inductor] [bc-breaking] Node Level provenance tracking#144277yushangdi wants to merge 1 commit into
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144277
Note: Links to docs will display an error until the docs builds have been completed. ❗ 1 Active SEVsThere are 1 currently active SEVs. If your PR is affected, please view them below: ✅ You can merge normally! (1 Unrelated Failure)As of commit f40b2d3 with merge base a5164a2 ( UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
This pull request was exported from Phabricator. Differential Revision: D65006709 |
Summary: - use GraphTransformObserver + replace_node hooks to track node sources when they are replaced - add pre_grad_graph tracking to tlparse - add the node provenance information to tlparse artifact. This is for the frontend to create a mapping between pre_grad and post_grad graph. See an example frontend (this is just a prototype) here: https://drive.google.com/file/d/1cMHH_0y4FJUSS9tATwGQvA72O0Lth8eh/view?usp=sharing - change "action" of NodeSource from a single action to a list of actions. - Guard provenance tracking behind `_inductor.config.trace.enabled` flag to avoid slowing down compilation https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?tab=t.0 The front-end code that takes in the tlparse result is in https://github.com/yushangdi/compiler_explorer. ghstack-source-id: 260390519 Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r node_source buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:auto_functionalize buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing ``` Front-end example screenshots on a real model, 93% coverage rate between pre_grad_graph and post_grad_graph {F1973584210}{F1973584209} ``` buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark MODEL_ENTITY_ID=644688112 SNAPSHOT_ID=32 MODULE=merge TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=7 TORCH_LOGS="+inductor,+schedule,output_code,graph_code" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/ec86b05dd59e84db/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --local-model /home/bahuang/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}" ``` Differential Revision: D65006709
83409b4 to
ef7925a
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65006709 |
Summary: - use GraphTransformObserver + replace_node hooks to track node sources when they are replaced - add pre_grad_graph tracking to tlparse - add the node provenance information to tlparse artifact. This is for the frontend to create a mapping between pre_grad and post_grad graph. See an example frontend (this is just a prototype) here: https://drive.google.com/file/d/1cMHH_0y4FJUSS9tATwGQvA72O0Lth8eh/view?usp=sharing - change "action" of NodeSource from a single action to a list of actions. - Guard provenance tracking behind `_inductor.config.trace.enabled` flag to avoid slowing down compilation https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?tab=t.0 The front-end code that takes in the tlparse result is in https://github.com/yushangdi/compiler_explorer. ghstack-source-id: 260390519 Test Plan: ``` buck2 run mode/dev-nosan fbcode//caffe2/test:fx -- -r test_graph_transform_observer buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r node_source buck run mode/dev-nosan fbcode//caffe2/test:fx -- -r graph_provenance buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:auto_functionalize buck2 run mode/dev-nosan fbcode//caffe2/test/inductor:provenance_tracing python benchmarks/basic_modules_benchmarks.py ``` Front-end example screenshots on a real model, 93% coverage rate between pre_grad_graph and post_grad_graph {F1973584210}{F1973584209} ``` buck2 build --show-output mode/opt -c=python.package_style=inplace -c fbcode.enable_gpu_sections=true -c fbcode.platform=platform010 -c fbcode.split-dwarf=true -c fbcode.nvcc_arch=a100,h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark MODEL_ENTITY_ID=644688112 SNAPSHOT_ID=32 MODULE=merge TORCH_COMPILE_DEBUG=1 CUDA_VISIBLE_DEVICES=7 TORCH_LOGS="+inductor,+schedule,output_code,graph_code" TORCHINDUCTOR_MAX_AUTOTUNE=1 TORCHINDUCTOR_UNIQUE_KERNEL_NAMES=1 ../buck-out/v2/gen/fbcode/ec86b05dd59e84db/caffe2/torch/fb/model_transform/experimental/benchmark/__mts_gpu_benchmark__/mts_gpu_benchmark.par --local-model /home/bahuang/models/${MODEL_ENTITY_ID}/${SNAPSHOT_ID}/gpu_lowering/input.predictor.disagg.gpu.merge --lower-backend AOT_INDUCTOR_EP --gpu-trace --aot-inductor-config="{'max_autotune': True}" ``` Differential Revision: D65006709
ef7925a to
f40b2d3
Compare
|
This pull request was exported from Phabricator. Differential Revision: D65006709 |
desertfire
left a comment
There was a problem hiding this comment.
Please update the title (remove 142739).
| print_output=False, include_stride=True, include_device=True | ||
| ), | ||
| ) | ||
| if config.trace.enabled: |
There was a problem hiding this comment.
There was a discussion on deprecating TORCH_COMPILE_DEBUG. Ok for now, but we may want to switch to TORCH_LOGS in future.
|
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged) |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Summary:
use GraphTransformObserver + replace_node hooks to track node sources when they are replaced
add pre_grad_graph tracking to tlparse
add the node provenance information to post_grad_graph tlparse. This is for the frontend to create a mapping between pre_grad and post_grad graph. See an example frontend (this is just a prototype) here: https://drive.google.com/file/d/1cMHH_0y4FJUSS9tATwGQvA72O0Lth8eh/view?usp=sharing
change "action" of NodeSource from a single action to a list of actions.
It's BC-Breaking because we removed
GraphTransformObserver's class methodson_node_eraseandon_node_erase.https://docs.google.com/document/d/1dGh9myqNhywmbfP0Quzx_f04bghDFlj8cawj8MopiO8/edit?tab=t.0
The front-end code that takes in the tlparse result is in https://github.com/yushangdi/compiler_explorer.
ghstack-source-id: 260390519
Test Plan:
Front-end example screenshots on a real model, 93% coverage rate between pre_grad_graph and post_grad_graph
{F1973584210}{F1973584209}
Differential Revision: D65006709
cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov