Bug Description
When using sageattn3_blackwell mode on an RTX 5060 Ti (Blackwell architecture), inference fails with "CUDA error: misaligned address".
Environment
- OS: Windows 11
- GPU: NVIDIA RTX 5060 Ti 16GB (Blackwell architecture, sm_120)
- Python: 3.14.0
- PyTorch: 2.12.0.dev20260318+cu130
- CUDA: 13.0
- SageAttention3: Latest code from GitHub (fresh clone and compile)
Steps to Reproduce
- Clone SageAttention3 repository from GitHub
- Compile with Visual Studio 2022 19.44 (added the -allow-unsupported-compiler flag to nvcc_flags)
- Install with pip install -e .
- Use sageattn3_blackwell mode in ComfyUI with the Patch Sage Attention node
- Run inference
Expected Behavior
Attention computation should work without CUDA errors.
Actual Behavior
RuntimeError: CUDA error: misaligned address
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
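Because the error may be reported asynchronously, the stack trace may not point at the kernel that actually faulted. A minimal sketch of forcing synchronous kernel launches so the trace is more reliable: CUDA_LAUNCH_BLOCKING is a standard CUDA environment variable, while the helper name below is hypothetical and the variable has to be set before the process initialises CUDA (for ComfyUI that likely means setting it before launch rather than inside a node).

```python
import os


def enable_synchronous_cuda_errors() -> None:
    """Ask the CUDA runtime to run kernel launches synchronously.

    Hypothetical helper for illustration. The variable is only read when
    CUDA is initialised, so call this before importing or using torch.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_synchronous_cuda_errors()
# Import torch (or start the workload) only after this point, so the
# setting is already in the environment when CUDA starts up.
```

With this set, re-running the same workflow should make the Python traceback line up with the failing attention call, at the cost of slower execution.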
Additional Context
Notes
I compiled SageAttention3 successfully after adding the -allow-unsupported-compiler flag to work around the Visual Studio version check, but the misaligned address error still occurs at runtime. This suggests the issue is in the Blackwell kernel code itself rather than in compilation.
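One way to narrow this down is to check whether the input tensors handed to the kernel start at suitably aligned addresses. The sketch below is not part of SageAttention3; the function names are hypothetical, and the 16-byte default is an assumption based on the usual requirement for 128-bit vectorised loads, not something confirmed from the kernel source.

```python
def is_aligned(address: int, alignment: int = 16) -> bool:
    """Return True if a raw address is a multiple of `alignment` bytes."""
    return address % alignment == 0


def misalignment_report(named_addresses: dict[str, int], alignment: int = 16) -> list[str]:
    """Return the names whose addresses are not aligned to `alignment` bytes.

    `named_addresses` maps a label (for example "q", "k", "v") to a raw
    device address. The 16-byte default is an assumption, see above.
    """
    return [
        name
        for name, address in named_addresses.items()
        if not is_aligned(address, alignment)
    ]
```

With PyTorch, the addresses could be collected just before the attention call, for example `misalignment_report({"q": q.data_ptr(), "k": k.data_ptr(), "v": v.data_ptr()})`. A non-empty result would point at the inputs (such as sliced or non-contiguous views) rather than at the kernel alone.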