Pull requests: vllm-project/flash-attention

Pull requests list

Reapply #122
#137 opened May 6, 2026 by MatthewBonanni Member
SM100 tile size 64
#132 opened Apr 9, 2026 by MatthewBonanni Member Draft
Add TopK mask utils
#127 opened Mar 23, 2026 by MatthewBonanni Member Draft
add support for newer CUDA archs (Spark/Thor)
#121 opened Feb 13, 2026 by askliar
Fix issues with async TP
#117 opened Feb 7, 2026 by LucasWilkinson Collaborator
Sync to upstream main 20260121
#114 opened Jan 22, 2026 by LucasWilkinson Collaborator
FA2 support head sizes 40, 72, and 80
#108 opened Nov 14, 2025 by MatthewBonanni Member Draft
Add DCP parameters
#92 opened Sep 16, 2025 by MatthewBonanni Member Draft
Vllm_flash_attn_with_attention_weights
#88 opened Sep 11, 2025 by SiriusPaul
WIP stream k scheduling
#67 opened Apr 29, 2025 by LucasWilkinson Collaborator Draft
fix: add "typename" prior to dependent type name
#54 opened Feb 28, 2025 by zhiweij1
AMD ROCm Build
#41 opened Jan 29, 2025 by ProExpertProg Draft
Add back flash_attn_func api (and support FA3) [Don't Merge Yet]
#40 opened Jan 26, 2025 by LucasWilkinson Collaborator
support KV-Compress paged KV cache
#27 opened Nov 27, 2024 by IsaacRe
Add CUDA 8.7 arch for Jetson Orin
#26 opened Nov 27, 2024 by conroy-cheers
Update torch to 2.5.1
#25 opened Nov 7, 2024 by ayakzob
Don't disable uneven k to support more headdims
#21 opened Sep 27, 2024 by njhill Member
Update .gitignore to ignore *env/ directories
#16 opened Aug 8, 2024 by wasertech