Add optional mask & bias inputs with adaptive computation skipping #162
This PR implements optional `attn_mask` and `attn_bias` inputs with adaptive computation skipping to improve performance and reduce unnecessary memory operations in Flash Dynamic Mask Attention.

## Problem

The current implementation always assumes both `attn_mask` and `attn_bias` are active, causing unnecessary memory operations and computation even when these inputs are not needed.

## Solution
Added support for four explicit modes with conditional processing:

| Mode | `attn_mask` | `attn_bias` |
|------|-------------|-------------|
| 1    | None        | None        |
| 2    | Tensor      | None        |
| 3    | None        | Tensor      |
| 4    | Tensor      | Tensor      |
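A minimal sketch of how these modes could be resolved on the Python side before dispatching to the kernels. The helper below is illustrative rather than the PR's actual code; only the `use_mask`/`use_bias` flag names follow the changes listed under Key Changes:

```python
from typing import Optional, Tuple
import torch

def resolve_attention_mode(
    attn_mask: Optional[torch.Tensor],
    attn_bias: Optional[torch.Tensor],
) -> Tuple[bool, bool]:
    # Each flag tells the kernel whether the corresponding input exists,
    # so mask/bias work can be skipped entirely when the input is None.
    use_mask = attn_mask is not None
    use_bias = attn_bias is not None
    return use_mask, use_bias
```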
## Key Changes

### Python Interface

- `attn_mask` and `attn_bias` parameters now accept `Optional[Tensor] = None`
- `use_mask` and `use_bias` flags are passed to the CUDA kernels
- `dbias` is returned only when a bias is provided

### CUDA Kernels

- Bias-related computation and memory operations are skipped when `use_bias=False`
## Usage Example
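The original example is not preserved here; the following is a hedged sketch assuming a `flash_dmattn_func`-style entry point (the import path, function name, and tensor shapes are assumptions, not the confirmed API):

```python
import torch

# Hypothetical entry point; the actual name and signature may differ.
from flash_dmattn import flash_dmattn_func

q = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
k = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)
v = torch.randn(2, 1024, 8, 64, device="cuda", dtype=torch.bfloat16)

# Mode 1: no mask, no bias -- all mask/bias work is skipped.
out = flash_dmattn_func(q, k, v)

# Mode 4: both mask and bias supplied.
attn_mask = torch.ones(2, 8, 1024, 1024, device="cuda", dtype=torch.bool)
attn_bias = torch.zeros(2, 8, 1024, 1024, device="cuda", dtype=torch.bfloat16)
out = flash_dmattn_func(q, k, v, attn_mask=attn_mask, attn_bias=attn_bias)
```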
## Performance Benefits
## Backward Compatibility

The implementation is fully backward compatible: existing code continues to work unchanged, and the default parameter values preserve current behavior when the new arguments are not specified.
Fixes #161.