Hi,
Are there any plans to add support for DeepSeek Sparse Attention (DSA) used in DeepSeek V3.2?
DeepSeek V3.2 introduces a sparse attention mechanism that reduces KV computation during attention. In particular, it adds mechanisms such as:
Lightning Index for efficient key selection
Top-K token selection during attention
Reduced KV usage compared to full attention
This significantly reduces the number of key/value pairs involved in the attention computation while maintaining model quality.
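To illustrate the idea, here is a minimal NumPy sketch of top-k sparse attention for a single query. It assumes the per-key relevance scores from the lightweight indexer are given as an input (`index_scores`); the real Lightning Index is a learned component, and this is only a toy illustration of how selecting top-k keys shrinks the KV work, not DeepSeek's actual implementation.

```python
import numpy as np

def sparse_attention_topk(q, K, V, index_scores, k):
    """Toy sketch: attend over only the top-k keys chosen by an index.

    index_scores stands in for the output of a lightweight learned
    indexer (e.g. DSA's Lightning Index). Only k of the keys enter
    the softmax, so the KV cost scales with k, not sequence length.
    """
    # Select the k key positions the index scores as most relevant.
    topk = np.argsort(index_scores)[-k:]
    K_sel, V_sel = K[topk], V[topk]

    # Standard scaled dot-product attention over the selected subset.
    scores = K_sel @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_sel

# Example: 128 keys in the cache, but attention touches only 16.
rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal(d)
K = rng.standard_normal((128, d))
V = rng.standard_normal((128, d))
idx = K @ q  # hypothetical stand-in for learned index scores
out = sparse_attention_topk(q, K, V, idx, k=16)
```

The point of the separate index pass is that scoring keys is much cheaper than full attention, so the expensive softmax/value computation only ever sees k tokens.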
It would be great if this type of sparse attention could be supported natively, as sparse attention mechanisms like DSA are becoming increasingly important for scaling large models efficiently.
Thanks!