Tags: Dao-AILab/flash-attention

fa4-v4.0.0.beta14

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Enable split-kv for blocksparse tensors (#2536)

stack-info: PR: #2536, branch: drisspg/stack/38

fa4-v4.0.0.beta13

Verified

fix varlen w/ paging split kv bug (#2550)

fa4-v4.0.0.beta12

Verified

Deterministic backward for blocksparse impl (#2253)

fa4-v4.0.0.beta11

Verified

ci: use /tmp for apptainer tmpdir to fix xattrerror on VAST (#2511)

fa4-v4.0.0.beta10

Verified

fix (#2481)

Co-authored-by: wangziheng

fa4-v4.0.0.beta9

Verified

Handle linter for flash mla file (#2459)

* fix outstanding ruff check and exclude flash_fwd_mla_sm100.py from ci

* add fmt comments for ruff

fa4-v4.0.0.beta8

Verified

CI: extend FA4 test matrix with causal/non-causal correctness and fwd+bwd benchmark seqlen 1K-32K (#2428)

fa4-v4.0.0.beta7

Verified

Allow compact block sparse index tensors (#2417)

* Allow compact block sparse index tensors

Relax validation in block_sparsity.py to allow idx.shape[3] <= expected_n_blocks
instead of requiring exact equality.

FA4 only accesses indices 0..cnt-1 per query tile, so the index tensor's last
dimension does not need to be as large as ceil(seqlen_k / block_size_n). This
enables memory-efficient compact index tensors that avoid O(N^2) memory at long
sequence lengths (e.g., 1M+ tokens for sparse attention / NSA workloads).

Changes:
- _check_and_expand_block: accept compact n-block dimension and expand only the
  batch/head/m-block dimensions
- infer_block_sparse_expected_shapes: change strict equality check to upper-bound
  check (error only when n-blocks exceeds expected, not when smaller)

Backward compatible: existing code that passes full-sized tensors is unaffected.
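
The relaxed check described above can be sketched in a few lines. This is an illustration only, not the code in block_sparsity.py: the helper names `ceil_div` and `check_n_block_dim` are hypothetical, and the 4-D layout (batch, head, m-blocks, n-blocks) is assumed from the mention of idx.shape[3] and the batch/head/m-block dimensions.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-a // b)


def check_n_block_dim(idx_shape, seqlen_k, block_size_n):
    """Hypothetical helper: upper-bound check on the n-block dimension.

    Assumes idx_shape is (batch, head, m_blocks, n_blocks).
    Returns the n-block size when acceptable, raises ValueError otherwise.
    """
    if len(idx_shape) != 4:
        raise ValueError(f"expected a 4-D index shape, got {len(idx_shape)} dims")
    expected_n_blocks = ceil_div(seqlen_k, block_size_n)
    n_blocks = idx_shape[3]
    # Old behaviour (per the commit text): n_blocks had to equal expected_n_blocks.
    # Relaxed behaviour: only a larger-than-expected dimension is an error.
    if n_blocks > expected_n_blocks:
        raise ValueError(
            f"n-block dim {n_blocks} exceeds expected {expected_n_blocks}"
        )
    return n_blocks


# Full-sized tensor: still accepted (backward compatible).
full = check_n_block_dim((1, 8, 64, 64), seqlen_k=8192, block_size_n=128)
# Compact tensor: accepted under the relaxed check.
compact = check_n_block_dim((1, 8, 64, 16), seqlen_k=8192, block_size_n=128)
```

With seqlen_k = 8192 and block_size_n = 128, the full last dimension is 64 entries per query tile, while the compact tensor in this sketch stores only 16.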

* Add test for compact block sparse index tensors

Verify that truncating block sparse index tensors to idx.shape[3] = max(cnt)
(instead of the full ceil(seqlen_k / block_size_n)) produces bit-identical
output to full-sized tensors. This validates the relaxed validation from
the previous commit.
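
The reason truncation is safe can be shown with a toy example in plain Python. It does not run attention or reproduce the actual test; it only checks that the entries read per query tile (indices 0..cnt-1) are the same before and after truncating to max(cnt). The helper names and the -1 padding value are assumptions made for illustration.

```python
def truncate_to_max_cnt(idx_rows, cnt):
    """Keep only the first max(cnt) entries of each query tile's index row."""
    width = max(cnt) if cnt else 0
    return [row[:width] for row in idx_rows]


def accessed_blocks(idx_rows, cnt):
    """Entries 0..cnt-1 of each row, i.e. the only ones that get read."""
    return [row[:c] for row, c in zip(idx_rows, cnt)]


# One (batch, head) slice with 3 query tiles and 8 possible key blocks.
# -1 marks unused padding slots (an assumption for this sketch).
full_idx = [
    [0, 3, -1, -1, -1, -1, -1, -1],
    [1, 2, 5, -1, -1, -1, -1, -1],
    [7, -1, -1, -1, -1, -1, -1, -1],
]
cnt = [2, 3, 1]

compact_idx = truncate_to_max_cnt(full_idx, cnt)

# The compact rows are 3 wide instead of 8, yet the accessed entries match.
assert accessed_blocks(full_idx, cnt) == accessed_blocks(compact_idx, cnt)
```

Because every read falls inside the first cnt entries of a row, the padding beyond max(cnt) never influences the result, which is consistent with the bit-identical output reported above.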

fa4-v4.0.0.beta6

Verified

Update flow to enable beta weekly releases (#2378)

fa4-v4.0.0.beta5

Verified

Update flow to enable beta weekly releases (#2378)