Hi,
Are there any plans to add support for DeepSeek Sparse Attention (DSA) used in DeepSeek V3.2?
DeepSeek V3.2 introduces a sparse attention mechanism that reduces KV computation during attention. In particular, it adds mechanisms such as:
Lightning Index for efficient key selection
Top-K token selection during attention
Reduced KV usage compared to full attention
This significantly reduces the number of key/value pairs involved in the attention computation while maintaining model quality.
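To illustrate the idea, here is a minimal NumPy sketch of top-k sparse attention for a single query. It assumes the per-key relevance scores from the lightweight indexer are given as an input (`index_scores`); the real Lightning Index is a learned component, and this is only a toy illustration of how selecting top-k keys shrinks the KV work, not DeepSeek's actual implementation.

```python
import numpy as np

def sparse_attention_topk(q, K, V, index_scores, k):
    """Toy sketch: attend over only the top-k keys chosen by an index.

    index_scores stands in for the output of a lightweight learned
    indexer (e.g. DSA's Lightning Index). Only k of the keys enter
    the softmax, so the KV cost scales with k, not sequence length.
    """
    # Select the k key positions the index scores as most relevant.
    topk = np.argsort(index_scores)[-k:]
    K_sel, V_sel = K[topk], V[topk]

    # Standard scaled dot-product attention over the selected subset.
    scores = K_sel @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V_sel

# Example: 128 keys in the cache, but attention touches only 16.
rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal(d)
K = rng.standard_normal((128, d))
V = rng.standard_normal((128, d))
idx = K @ q  # hypothetical stand-in for learned index scores
out = sparse_attention_topk(q, K, V, idx, k=16)
```

The point of the separate index pass is that scoring keys is much cheaper than full attention, so the expensive softmax/value computation only ever sees k tokens.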
It would be great if this type of sparse attention could be supported natively, as sparse attention mechanisms like DSA are becoming increasingly important for scaling large models efficiently.
Thanks!