Fix CUDA Graph Capture Error in Tensor Argument Extraction #197

FindHao · 2025-11-11T00:40:11Z

Problem

When tracing kernel launches during CUDA graph capture (used by triton.testing.do_bench_cudagraph), TritonParse's tensor argument extraction caused crashes with:

torch.AcceleratorError: CUDA error: operation not permitted when stream is capturing

The error occurred because str(tensor_value) in extract_arg_info() invoked torch.Tensor.__repr__(), which reads the tensor's data in order to format it. CUDA graph capture prohibits the operations this requires, so the failed call invalidated the entire capture.

Root Cause

In add_launch_metadata(), the code unconditionally called extract_arg_info() to collect detailed tensor metadata. During CUDA graph capture:

1. str(arg_value) triggers torch.Tensor.__repr__()
2. __repr__() calls torch.masked_select() and torch.isfinite() to format tensor data
3. These operations are prohibited during CUDA graph capture
4. CUDA raises cudaErrorStreamCaptureUnsupported, invalidating the entire graph
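
The chain above hinges on the difference between formatting a tensor's contents and reading its host-side metadata. The following is a minimal sketch of that distinction, not code from this PR: `FakeCapturedTensor` and `describe_without_data` are hypothetical names, and the stand-in class only mimics a tensor whose `__repr__` fails during capture so that the example runs without a GPU.

```python
class FakeCapturedTensor:
    """Hypothetical stand-in for a tensor on a capturing stream."""

    shape = (4, 8)
    dtype = "float16"
    device = "cuda:0"

    def __repr__(self):
        # Mimics the failure: formatting the contents is not allowed during capture.
        raise RuntimeError("operation not permitted when stream is capturing")


def describe_without_data(value):
    """Summarize a tensor-like object from metadata only, never its contents."""
    return {
        "shape": tuple(value.shape),
        "dtype": str(value.dtype),
        "device": str(value.device),
    }


tensor = FakeCapturedTensor()
summary = describe_without_data(tensor)  # never calls str(tensor) or repr(tensor)
```

Calling `str(tensor)` here would raise, while `describe_without_data(tensor)` succeeds because it only touches attributes. This PR does not take this route; it skips extraction entirely while a capture is in progress.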

Solution

Detect and skip argument extraction during CUDA graph capture:

1. In add_launch_metadata(), check torch.cuda.is_current_stream_capturing() before calling extract_arg_info()
2. If capturing, return minimal metadata with a note that extraction was skipped
3. Remove the redundant try-except blocks in extract_arg_info(), since it is no longer called during capture

Key changes:

def add_launch_metadata(grid, metadata, arg_dict, inductor_args=None):
    # Check if we're in CUDA graph capture mode
    is_capturing = False
    if TORCH_INSTALLED:
        try:
            is_capturing = torch.cuda.is_current_stream_capturing()
        except (AttributeError, RuntimeError):
            pass  # Handle API unavailability in older PyTorch versions
    
    if is_capturing:
        # Return minimal metadata without argument extraction
        return {
            "launch_metadata_tritonparse": (
                grid,
                metadata._asdict(),
                {"_note": "argument extraction skipped during CUDA graph capture"},
                {},
            )
        }
    
    # Normal path: extract detailed argument information
    extracted_args = extract_arg_info(arg_dict)
    ...
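
Downstream code that reads these records may want to tell a skipped extraction apart from a normal one. The sketch below shows one way to do that by looking for the `_note` key. The helper `extraction_was_skipped` is hypothetical, the sample grid, metadata, and argument values are made up for illustration, and it assumes the normal path returns the same four-element tuple as the capture path (the excerpt above elides that part).

```python
NOTE_KEY = "_note"


def extraction_was_skipped(launch_metadata):
    """Return True if the record carries the 'skipped during capture' note."""
    _grid, _meta, extracted_args, _extra = launch_metadata["launch_metadata_tritonparse"]
    return NOTE_KEY in extracted_args


# Record shaped like the capture-path return value shown above (sample values).
skipped_record = {
    "launch_metadata_tritonparse": (
        (1, 1, 1),
        {"num_warps": 4},
        {"_note": "argument extraction skipped during CUDA graph capture"},
        {},
    )
}

# Assumed shape of a normal-path record; the argument entry is illustrative only.
normal_record = {
    "launch_metadata_tritonparse": (
        (1, 1, 1),
        {"num_warps": 4},
        {"x_ptr": {"type": "tensor"}},
        {},
    )
}
```

With these records, `extraction_was_skipped(skipped_record)` is true and `extraction_was_skipped(normal_record)` is false.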

Impact

Fixes: Kernel benchmarking with triton.testing.do_bench_cudagraph no longer crashes
Preserves: Full argument extraction for normal (non-capture) kernel launches
Safe: Gracefully handles older PyTorch versions that lack is_current_stream_capturing()
Clean: Removes redundant defensive code since capture detection happens at the entry point

Testing

Verified with Triton's Hopper GEMM benchmark (hopper-gemm-ws_test.py), which uses do_bench_cudagraph extensively. The benchmark previously failed with CUDA graph capture errors and now runs successfully.
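
The benchmark above needs a GPU, but the branch itself can also be exercised on CPU. The sketch below is one possible approach, not the test used in this PR: `build_launch_metadata` is a simplified, hypothetical stand-in for the guarded branch in add_launch_metadata(), with the capture check and the extractor passed in as callables so that both paths can be driven without CUDA. The `Metadata` namedtuple and the shape of the normal-path return value are assumptions.

```python
from collections import namedtuple

Metadata = namedtuple("Metadata", ["num_warps"])  # hypothetical stand-in


def build_launch_metadata(grid, metadata, arg_dict, is_capturing, extract):
    """Simplified stand-in for the guarded branch in add_launch_metadata()."""
    if is_capturing():
        extracted = {"_note": "argument extraction skipped during CUDA graph capture"}
    else:
        extracted = extract(arg_dict)
    return {"launch_metadata_tritonparse": (grid, metadata._asdict(), extracted, {})}


def exploding_extract(arg_dict):
    # Plays the role of extract_arg_info() touching tensor data during capture.
    raise RuntimeError("operation not permitted when stream is capturing")


meta = Metadata(num_warps=4)

# Capture path: the extractor must not run, so the exploding stub is safe to pass.
captured = build_launch_metadata((1,), meta, {"x": 1}, lambda: True, exploding_extract)

# Normal path: the extractor runs and its result is stored.
normal = build_launch_metadata((1,), meta, {"x": 1}, lambda: False, lambda d: dict(d))
```

If the guard were removed, the capture-path call would raise from `exploding_extract`, which is the regression such a test would catch.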

Related Issues

Fixes: meta-pytorch/tritonbench#632

During CUDA graph capture, accessing tensor data (via str()) triggers cudaErrorStreamCaptureUnsupported, invalidating the entire capture.

Changes:
- Check torch.cuda.is_current_stream_capturing() in add_launch_metadata()
- Skip argument extraction during capture and return minimal metadata
- Remove redundant try-except blocks in extract_arg_info()

Fixes: meta-pytorch/tritonbench#632

meta-codesync · 2025-11-11T01:01:39Z

@FindHao has imported this pull request. If you are a Meta employee, you can view this in D86722827.

meta-codesync · 2025-11-12T17:50:35Z

@FindHao merged this pull request in fb7197b.

meta-cla bot added the CLA Signed label Nov 11, 2025

xuzhao9 approved these changes Nov 12, 2025

meta-codesync bot closed this in fb7197b Nov 12, 2025

facebook-github-bot added the Merged label Nov 12, 2025
