
fa4-v4.0.0.beta14

Enable split-kv for blocksparse tensors (#2536)

stack-info: PR: #2536, branch: drisspg/stack/38

May 19, 2026
4178915

fa4-v4.0.0.beta13

fix varlen w/ paging split kv bug (#2550)

May 12, 2026
9bad4be

fa4-v4.0.0.beta12

Deterministic backward for blocksparse impl (#2253)

May 4, 2026
2e53092

fa4-v4.0.0.beta11

ci: use /tmp for apptainer tmpdir to fix xattrerror on VAST (#2511)

Apr 28, 2026
ba59def

fa4-v4.0.0.beta10

fix (#2481)

Co-authored-by: wangziheng

Apr 21, 2026
3a7694c

fa4-v4.0.0.beta9

Handle linter for flash mla file (#2459)

* fix outstanding ruff check and exclude flash_fwd_mla_sm100.py from ci

* add fmt comments for ruff

Apr 15, 2026
628452c

fa4-v4.0.0.beta8

CI: extend FA4 test matrix with causal/non-causal correctness and fwd+bwd benchmark seqlen 1K-32K (#2428)

Apr 4, 2026
15270e6

fa4-v4.0.0.beta7

Allow compact block sparse index tensors (#2417)

* Allow compact block sparse index tensors

Relax validation in block_sparsity.py to allow idx.shape[3] <= expected_n_blocks
instead of requiring exact equality.

FA4 only accesses indices 0..cnt-1 per query tile, so the index tensor's last
dimension does not need to be as large as ceil(seqlen_k / block_size_n). This
enables memory-efficient compact index tensors that avoid O(N^2) memory at long
sequence lengths (e.g., 1M+ tokens for sparse attention / NSA workloads).

Changes:
- _check_and_expand_block: accept compact n-block dimension and expand only the
  batch/head/m-block dimensions
- infer_block_sparse_expected_shapes: change strict equality check to upper-bound
  check (error only when n-blocks exceeds expected, not when smaller)

Backward compatible: existing code that passes full-sized tensors is unaffected.
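
As a rough illustration of the relaxed check described above, the sketch below contrasts a strict equality test with an upper-bound test on the n-block dimension. The helper names `expected_n_blocks` and `check_n_block_dim` and their arguments are hypothetical and are not taken from block_sparsity.py; only the condition idx.shape[3] <= expected_n_blocks comes from the notes.

```python
import math


def expected_n_blocks(seqlen_k: int, block_size_n: int) -> int:
    """Number of key blocks a full-sized index tensor would cover."""
    return math.ceil(seqlen_k / block_size_n)


def check_n_block_dim(idx_shape, seqlen_k, block_size_n, strict=False):
    """Hypothetical validator; idx_shape is (batch, heads, m_blocks, n_blocks)."""
    n_blocks = idx_shape[3]
    limit = expected_n_blocks(seqlen_k, block_size_n)
    if strict:
        ok = n_blocks == limit  # strict variant: exact equality required
    else:
        ok = n_blocks <= limit  # relaxed variant: upper bound only
    if not ok:
        raise ValueError(
            f"n-block dim {n_blocks} is incompatible with expected {limit}"
        )
    return n_blocks
```

Under these assumptions a compact tensor of shape (1, 2, 8, 3) passes the relaxed check for seqlen_k=1000 and block_size_n=128, while the strict variant rejects it and both variants reject a last dimension larger than the expected count.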

* Add test for compact block sparse index tensors

Verify that truncating block sparse index tensors to idx.shape[3] = max(cnt)
(instead of the full ceil(seqlen_k / block_size_n)) produces bit-identical
output to full-sized tensors. This validates the relaxed validation from
the previous commit.
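
The idea behind that test can be sketched with plain Python lists standing in for tensors. This is a toy model, not the repository's test: `visited_blocks` and `compact` are hypothetical helpers, and the sketch only assumes what the notes state, namely that entries 0..cnt-1 are the only ones read per query tile.

```python
def visited_blocks(cnt, idx):
    """Per query tile, the key-block indices that would be read (entries 0..cnt-1)."""
    return [row[:c] for c, row in zip(cnt, idx)]


def compact(cnt, idx):
    """Truncate every row to max(cnt) entries; the dropped tail is never read."""
    width = max(cnt)
    return [row[:width] for row in idx]


# Toy layout: 3 query tiles, 6 key blocks in the full-sized index (tail is padding).
cnt = [2, 1, 3]
full_idx = [
    [0, 4, 0, 0, 0, 0],
    [2, 0, 0, 0, 0, 0],
    [1, 3, 5, 0, 0, 0],
]
compact_idx = compact(cnt, full_idx)

# The compact index is narrower but selects exactly the same blocks.
assert len(compact_idx[0]) == max(cnt)
assert visited_blocks(cnt, compact_idx) == visited_blocks(cnt, full_idx)
```

Because the set of blocks read per tile is unchanged, any computation driven only by those blocks should be unaffected by the truncation, which is the property the real test checks on actual attention output.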

Apr 1, 2026
f6a16e1

fa4-v4.0.0.beta6

Update flow to enable beta weekly releases (#2378)

Mar 23, 2026
6362bd3

fa4-v4.0.0.beta5

Update flow to enable beta weekly releases (#2378)

Mar 23, 2026
6362bd3


Tags: Dao-AILab/flash-attention