LucasWilkinson · 2026-02-07T06:00:30Z

Having FA3 always use PDL (programmatic dependent launch) can cause a deadlock when combined with async TP, which also uses PDL (in PyTorch's symmetric memory).

Signed-off-by: Lucas Wilkinson <[email protected]>

…mputed

When scheduler metadata is computed separately (skip_scheduler_metadata_computation=true), there may be other PDL users (e.g., symmetric memory all-reduce for async TP) between the scheduler call and the attention call. These can interfere with FA3's PDL signaling chain, causing hangs.

This extends the previous fix (disabling prepare_varlen PDL) to also disable the main kernel -> combine kernel PDL when using pre-computed scheduler metadata.

Signed-off-by: Lucas Wilkinson <[email protected]>
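
For readers skimming the thread, the gating described in this commit message can be sketched roughly as below. This is only an illustration, not the actual FA3 code: `PdlConfig`, `choose_pdl`, and both field names are hypothetical, and the sketch assumes both PDL edges are gated on the same flag, which the commit message does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PdlConfig:
    """Hypothetical per-launch PDL switches (not real FA3 parameter names)."""
    prepare_varlen_pdl: bool   # prepare_varlen (scheduler) kernel -> main kernel
    main_to_combine_pdl: bool  # main kernel -> combine kernel


def choose_pdl(skip_scheduler_metadata_computation: bool) -> PdlConfig:
    """Decide which PDL edges to enable for one attention call.

    Assumption: when scheduler metadata was pre-computed in a separate call,
    other PDL users (e.g. a symmetric memory all-reduce) may have launched in
    between, so neither PDL edge is enabled.
    """
    chain_is_contiguous = not skip_scheduler_metadata_computation
    return PdlConfig(
        prepare_varlen_pdl=chain_is_contiguous,
        main_to_combine_pdl=chain_is_contiguous,
    )


if __name__ == "__main__":
    # Pre-computed metadata: both PDL edges off.
    print(choose_pdl(skip_scheduler_metadata_computation=True))
    # Metadata computed inline: both PDL edges on.
    print(choose_pdl(skip_scheduler_metadata_computation=False))
```

The point of the sketch is only that the decision depends on whether the scheduler call and the attention call are launched back to back; the real change lives in the FA3 kernel launch code.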

ProExpertProg

Impressive find!

LucasWilkinson mentioned this pull request Feb 7, 2026

Reapply [Attention][FA3] Update FA3 to include new swizzle optimization vllm-project/vllm#34043

Merged

LucasWilkinson added 2 commits February 9, 2026 07:41

fix async tp

7882e23

Signed-off-by: Lucas Wilkinson <[email protected]>

LucasWilkinson force-pushed the lwllkinson/fix-async-tp branch from 5f86b74 to c427cae Compare February 9, 2026 15:41

ProExpertProg approved these changes Feb 10, 2026

Fix issues with async TP #117

LucasWilkinson wants to merge 2 commits into main from lwllkinson/fix-async-tp